Syllabus: Strategies for Data Modeling with tidymodels

Objective

Develop a comprehensive understanding of the fundamental computing and data analysis concepts needed for data modeling and machine learning in R with the tidymodels ecosystem. Through a practical approach, you will learn to clean, manage, and analyze data, build reproducible research workflows, and apply preprocessing, modeling, and model evaluation strategies to real-world data science problems.

Course Requirements

  1. Strategies for Working with Data
  2. Introduction to Statistics for Health Researchers
  3. Methods for Developing Statistical Regression Models
  4. Familiarity with linear regression models and model evaluation methods such as k-fold cross-validation and mean squared error (MSE)

Course Content

Module 1: General Introduction and Objectives

  • Presentation of course objectives
  • Introduction to data modeling and machine learning
  • Overview of datasets used in the course

Module 2: Strategies for Creating Reproducible Workflows

  • Importance of reproducibility in research
  • Best practices for organizing projects in R
  • Tools to ensure reproducibility

Module 3: Data Modeling and Machine Learning with tidymodels

Introduction to the tidymodels Philosophy

  • Principles and advantages of the tidy approach for data modeling
  • Differences between tidymodels and other machine learning frameworks

Resampling Infrastructure and Model Evaluation

  • rsample: Resampling techniques for empirical model evaluation
  • Theory and application of cross-validation
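
As a small illustration of the resampling topic above, the sketch below creates 5 cross-validation folds with rsample's vfold_cv(). It assumes the built-in mtcars dataset, which is used here only for convenience and is not necessarily one of the course datasets; the seed and the number of folds are arbitrary choices.

```r
library(rsample)

set.seed(123)  # make the random fold assignment reproducible

# Split the 32 rows of mtcars into 5 folds (illustrative data only)
folds <- vfold_cv(mtcars, v = 5)

# Each split holds an analysis set (for fitting)
# and an assessment set (for evaluation)
first_split <- folds$splits[[1]]
n_analysis <- nrow(analysis(first_split))
n_assessment <- nrow(assessment(first_split))
```

Every row appears in exactly one assessment set across the 5 folds, which is what allows each observation to be used once for evaluation.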

Model Construction and Preprocessing

  • parsnip: Unified interface for creating models in R
  • recipes: Modern approach to feature engineering and preprocessing
  • Theory and application of data preprocessing

Workflows and Model Comparison

  • workflows: Integrating preprocessing and modeling in a single object
  • workflowsets: Running and evaluating multiple models simultaneously
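
To show how preprocessing and a model specification can be bundled into a single object, here is a minimal sketch combining recipes, parsnip, and workflows. The dataset (mtcars), the predictors, and the normalization step are assumptions chosen for illustration, not material taken from the course; the native pipe requires R 4.1 or later.

```r
library(recipes)
library(parsnip)
library(workflows)

# Preprocessing: center and scale the numeric predictors
rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Model specification: linear regression fitted with the "lm" engine
spec <- linear_reg() |>
  set_engine("lm")

# Bundle the recipe and the model into one workflow object
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(spec)

# Fitting estimates the recipe steps and the model together
wf_fit <- fit(wf, data = mtcars)

# Predictions come back as a tibble with a .pred column
preds <- predict(wf_fit, new_data = mtcars)
```

Because the recipe travels with the model, the same preprocessing is applied automatically when predicting on new data.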

Model Evaluation and Validation

  • yardstick: Tools for evaluating machine learning models
  • Methods for evaluating regression models
  • Methods for evaluating classification models
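
The following sketch illustrates one regression metric and one classification metric from yardstick. The observed and predicted values are made up for the example, and the column names truth and estimate are arbitrary.

```r
library(yardstick)

# Regression: hypothetical observed and predicted values
reg <- data.frame(truth = c(3, 5, 7), estimate = c(2, 5, 9))
reg_rmse <- rmse(reg, truth = truth, estimate = estimate)

# Classification: hypothetical labels, stored as factors with the same levels
cls <- data.frame(
  truth    = factor(c("yes", "no", "yes", "no"), levels = c("yes", "no")),
  estimate = factor(c("yes", "no", "no", "no"), levels = c("yes", "no"))
)
cls_acc <- accuracy(cls, truth = truth, estimate = estimate)
```

Both calls return a one-row tibble whose .estimate column holds the metric value, so results from different metrics can be stacked and compared.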

Interpretation and Presentation of Results

  • broom: Converting model results into clean and organized tables
  • Report generation and data visualization
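
As an example of turning model output into tables, the sketch below applies broom's tidy() and glance() to an ordinary lm() fit. The model and the mtcars dataset are assumptions made for illustration only.

```r
library(broom)

# Fit a simple linear model on a built-in dataset (illustrative only)
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row per model term: estimate, std.error, statistic, p.value
coef_table <- tidy(lm_fit)

# One row per model: R-squared, AIC, and other fit statistics
model_summary <- glance(lm_fit)
```

Since both results are data frames, they can be passed directly to table or plotting tools when preparing a report.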

Required Installations
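
The specific installations are not listed here. Assuming the course needs at least the packages named in Module 3, a minimal setup sketch could look like the following; the package list is an inference from the course content and may be incomplete.

```r
# Packages named in Module 3 (assumed list; the course may require others)
course_pkgs <- c("rsample", "parsnip", "recipes", "workflows",
                 "workflowsets", "yardstick", "broom")

# Identify which of them are not yet installed
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(course_pkgs, installed)

# Install only what is missing (requires a configured CRAN mirror)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```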

References