Syllabus: Strategies for Data Modeling with tidymodels

Objective

Develop a comprehensive understanding of the fundamental computing and data analysis concepts needed for data modeling and machine learning in the R environment with the tidymodels ecosystem. Through a practical approach, you will learn how to clean, manage, and analyze data, develop reproducible research workflows, and apply preprocessing, modeling, and model evaluation strategies to solve real-world data science problems.

Course Requirements

Strategies for Working with Data
Introduction to Statistics for Health Researchers
Methods for Developing Statistical Regression Models
Familiarity with linear regression models and model evaluation methods such as k-fold cross-validation and mean squared error (MSE)

Course Content

Module 1: General Introduction and Objectives

Presentation of course objectives
Introduction to data modeling and machine learning
Overview of datasets used in the course

Module 2: Strategies for Creating Reproducible Workflows

Importance of reproducibility in research
Best practices for organizing projects in R
Tools to ensure reproducibility

Module 3: Data Modeling and Machine Learning with tidymodels

Introduction to the tidymodels Philosophy

Principles and advantages of the tidy approach for data modeling
Differences between tidymodels and other machine learning frameworks

Resampling Infrastructure and Model Evaluation

rsample: Resampling techniques for empirical model evaluation
Theory and application of cross-validation
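As a preview of this unit, the following is a minimal sketch of 5-fold cross-validation with rsample. It assumes the built-in mtcars data as a stand-in for the course datasets; the object names (folds, train_data, test_data) are illustrative only.

```r
library(rsample)

set.seed(123)  # fix the random fold assignment so the split is reproducible

# mtcars ships with base R and stands in for a course dataset (assumption)
folds <- vfold_cv(mtcars, v = 5)

# Each split holds an analysis set (for fitting) and an assessment set (for evaluation)
first_split <- folds$splits[[1]]
train_data <- analysis(first_split)
test_data <- assessment(first_split)

# Together the two sets cover every row of the original data exactly once
nrow(train_data) + nrow(test_data) == nrow(mtcars)
```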

Model Construction and Preprocessing

parsnip: Unified interface for creating models in R
recipes: Modern approach to data engineering and preprocessing
Theory and application of data preprocessing
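A short sketch of how the two packages divide the work: recipes describes the preprocessing, and parsnip describes the model. The choice of predictors, the normalization step, and the "lm" engine are assumptions made for illustration, and the native pipe (|>) requires R 4.1 or later.

```r
library(recipes)
library(parsnip)

# Preprocessing: center and scale every numeric predictor
rec <- recipe(mpg ~ wt + hp + disp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Estimate the normalization statistics, then apply them to the training data
prepped <- prep(rec, training = mtcars)
baked <- bake(prepped, new_data = NULL)

# Model specification: ordinary linear regression fitted with stats::lm
lm_spec <- linear_reg() |>
  set_engine("lm")

# Fit the specification to the preprocessed data
lm_fit <- fit(lm_spec, mpg ~ ., data = baked)
```

Changing the engine in set_engine() swaps the underlying implementation while the rest of the code stays the same, which is the point of the unified interface.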

Workflows and Model Comparison

workflows: Integrating preprocessing and modeling in a single object
workflowsets: Running and evaluating multiple models simultaneously
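A minimal sketch of both ideas, assuming the tidymodels meta-package is installed and using mtcars as placeholder data. The formulas, the identifiers (small, large, lm), and the choice of 5 folds are illustrative assumptions, not course material.

```r
library(tidymodels)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

lm_spec <- linear_reg() |>
  set_engine("lm")

rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

# A workflow bundles the preprocessing and the model into one object
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(lm_spec)

wf_fit <- fit(wf, data = mtcars)
preds <- predict(wf_fit, new_data = mtcars)  # tibble with a .pred column

# A workflow set crosses several preprocessors with several models
wf_set <- workflow_set(
  preproc = list(small = mpg ~ wt, large = mpg ~ wt + hp + disp),
  models = list(lm = lm_spec)
)

# Evaluate every workflow in the set on the same resamples
wf_results <- workflow_map(wf_set, "fit_resamples", resamples = folds)
collect_metrics(wf_results)
```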

Model Evaluation and Validation

yardstick: Tools for evaluating machine learning models
Methods for evaluating regression models
Methods for evaluating classification models
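To illustrate the yardstick interface, here is a small sketch with hand-made values. The data frames and their column names (observed, predicted) are invented for this example; real course data will differ.

```r
library(yardstick)

# Regression: compare observed and predicted numeric values
reg_results <- data.frame(
  observed = c(3.0, 5.0, 2.5, 7.0),
  predicted = c(2.5, 5.0, 3.0, 8.0)
)
reg_rmse <- rmse(reg_results, truth = observed, estimate = predicted)

# Classification: both columns must be factors with the same levels
class_results <- data.frame(
  observed = factor(c("yes", "no", "yes", "yes", "no"), levels = c("yes", "no")),
  predicted = factor(c("yes", "no", "no", "yes", "yes"), levels = c("yes", "no"))
)
class_acc <- accuracy(class_results, truth = observed, estimate = predicted)

reg_rmse$.estimate   # square root of the mean squared error
class_acc$.estimate  # share of correct predictions
```

Every metric function returns a one-row tibble with the columns .metric, .estimator, and .estimate, so results from different metrics can be stacked and compared.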

Interpretation and Presentation of Results

broom: Converting model results into clean and organized tables
Report generation and data visualization
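A brief sketch of the three main broom verbs applied to a linear model. The model formula and the mtcars data are assumptions chosen only for illustration.

```r
library(broom)

# A plain linear model fitted with base R
model <- lm(mpg ~ wt + hp, data = mtcars)

coef_table <- tidy(model)     # one row per coefficient
model_stats <- glance(model)  # one row of model-level statistics
obs_table <- augment(model)   # original data plus fitted values and residuals

coef_table
```

Because each verb returns a regular data frame, the output can be passed directly to reporting or plotting code.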

Required Installations

R statistical computing environment
RStudio Integrated Development Environment
tidyverse meta-package
tidymodels meta-package
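Once R and RStudio are installed, the two meta-packages can be installed from CRAN. The sketch below assumes a working internet connection and a default CRAN mirror; it only installs what is missing.

```r
# Packages required for the course
pkgs <- c("tidyverse", "tidymodels")

# Install only the packages that are not yet present
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load both meta-packages to confirm the installation worked
library(tidyverse)
library(tidymodels)
```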

References

Kuhn, M., & Johnson, K. (2020). Feature Engineering and Selection: A Practical Approach for Predictive Models. CRC Press, Taylor & Francis Group. https://bookdown.org/max/FES/
Kuhn, M., & Silge, J. (2022). Tidy Modeling with R. O'Reilly Media. https://www.tmwr.org/
Posit. (n.d.). tidyr: Tidy messy data. https://tidyr.tidyverse.org/
Posit. (n.d.). dplyr: A grammar of data manipulation. https://dplyr.tidyverse.org/
Posit. (n.d.). tidymodels: Easily install and load the tidymodels packages. https://tidymodels.tidymodels.org/
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). O'Reilly Media. https://r4ds.had.co.nz/index.html