Syllabus: Strategies for Data Modeling with tidymodels
Objective
Develop a solid understanding of the fundamental computing and data analysis concepts needed for data modeling and machine learning in R, using the tidymodels ecosystem. Through hands-on practice, you will learn how to clean, manage, and analyze data; build reproducible research workflows; and apply preprocessing, modeling, and model evaluation strategies to real-world data science problems.
Course Requirements
- Strategies for Working with Data
- Introduction to Statistics for Health Researchers
- Methods for Developing Statistical Regression Models
- Familiarity with linear regression models and with model evaluation methods and metrics such as k-fold cross-validation and mean squared error (MSE)
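As a quick self-check on the last prerequisite, these are the standard definitions assumed here: the MSE of n predictions, and the k-fold cross-validation estimate obtained by averaging the MSE computed on each held-out fold j (after fitting the model on the other k - 1 folds).

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
\mathrm{CV}_{(k)} = \frac{1}{k} \sum_{j=1}^{k} \mathrm{MSE}_j
```

Here y_i is an observed outcome and \hat{y}_i the model's prediction for it.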
Course Content
Module 1: General Introduction and Objectives
- Presentation of course objectives
- Introduction to data modeling and machine learning
- Overview of datasets used in the course
Module 2: Strategies for Creating Reproducible Workflows
- Importance of reproducibility in research
- Best practices for organizing projects in R
- Tools to ensure reproducibility
Module 3: Data Modeling and Machine Learning with tidymodels
Introduction to the tidymodels Philosophy
- Principles and advantages of the tidy approach to data modeling
- Differences between tidymodels and other machine learning frameworks
Resampling Infrastructure and Model Evaluation
- rsample: Resampling techniques for empirical model evaluation
- Theory and application of cross-validation
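A minimal sketch of the rsample workflow for cross-validation, assuming R 4.1 or later (for the native pipe) and an installed rsample package. The built-in mtcars data is only a stand-in for the course datasets, and the object names are illustrative.

```r
library(rsample)

set.seed(123)  # make the random split and fold assignment reproducible

# Hold out roughly 25% of the rows as a final test set
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# 5-fold cross-validation, built on the training set only
car_folds <- vfold_cv(car_train, v = 5)
car_folds

# Each element of the splits column holds an analysis (fitting) set
# and an assessment (held-out) set
first_split <- car_folds$splits[[1]]
nrow(analysis(first_split))
nrow(assessment(first_split))
```

Keeping the test set out of vfold_cv() means the folds are used for model comparison while the test set stays untouched until the final evaluation.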
Model Construction and Preprocessing
- parsnip: Unified interface for specifying and fitting models in R
- recipes: Modern approach to feature engineering and data preprocessing
- Theory and application of data preprocessing
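A short sketch of how parsnip separates what model is fit from how (the engine), and how a recipe declares preprocessing steps that are estimated by prep() and applied by bake(). It assumes parsnip and recipes are installed; mtcars and the chosen predictors are illustrative only.

```r
library(parsnip)
library(recipes)

# parsnip: model type, computational engine, and mode are declared separately
lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")

# Fit the specification with a formula interface
lm_fit <- fit(lm_spec, mpg ~ wt + hp, data = mtcars)
lm_fit

# recipes: declare preprocessing; nothing is computed until prep()
car_rec <- recipe(mpg ~ wt + hp + disp, data = mtcars) |>
  step_normalize(all_numeric_predictors())  # center and scale predictors

car_prep  <- prep(car_rec, training = mtcars)  # estimate means and SDs
car_baked <- bake(car_prep, new_data = NULL)   # processed training data
head(car_baked)
```

Estimating the normalization statistics in prep() and reusing them in bake() is what prevents information from new data leaking into preprocessing.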
Workflows and Model Comparison
- workflows: Integrating preprocessing and modeling in a single object
- workflowsets: Running and evaluating multiple models simultaneously with workflow_set()
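The workflow and workflow set ideas above can be sketched as follows, assuming the tidymodels meta-package is installed (it attaches workflows, workflowsets, tune, and the other core packages). The two recipes and the single linear model are hypothetical choices made only to show the mechanics on mtcars.

```r
library(tidymodels)

set.seed(123)
car_folds <- vfold_cv(mtcars, v = 5)

lm_spec <- linear_reg() |> set_engine("lm")

base_rec <- recipe(mpg ~ wt + hp, data = mtcars)
full_rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# workflows: one recipe plus one model in a single object
car_wf <- workflow() |>
  add_recipe(base_rec) |>
  add_model(lm_spec)
wf_fit <- fit(car_wf, data = mtcars)

# workflowsets: cross every preprocessor with every model ...
car_set <- workflow_set(
  preproc = list(base = base_rec, full = full_rec),
  models  = list(lm = lm_spec)
)

# ... then evaluate every combination on the same resamples
car_res <- workflow_map(car_set, "fit_resamples", resamples = car_folds)
rank_results(car_res, rank_metric = "rmse")
```

Because every workflow in the set is evaluated on the same folds, their resampled metrics are directly comparable.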
Model Evaluation and Validation
- yardstick: Tools for evaluating machine learning models
- Methods for evaluating regression models
- Methods for evaluating classification models
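A minimal yardstick sketch covering both model types listed above. The observed and predicted values are made-up toy numbers, not course data; yardstick and tibble are assumed to be installed.

```r
library(yardstick)
library(tibble)

# Regression: numeric truth vs. numeric predictions (toy values)
reg_results <- tibble(
  truth    = c(3.0, 5.0, 2.5, 7.0),
  estimate = c(2.5, 5.0, 4.0, 8.0)
)
reg_rmse <- rmse(reg_results, truth = truth, estimate = estimate)
reg_rmse

# metric_set() bundles several metrics into one function
reg_metrics <- metric_set(rmse, rsq, mae)
reg_metrics(reg_results, truth = truth, estimate = estimate)

# Classification: factor truth vs. factor predictions with the same levels
cls_results <- tibble(
  truth    = factor(c("yes", "no", "yes", "yes", "no"), levels = c("yes", "no")),
  estimate = factor(c("yes", "no", "no", "yes", "yes"), levels = c("yes", "no"))
)
cls_acc <- accuracy(cls_results, truth = truth, estimate = estimate)
cls_acc
conf_mat(cls_results, truth = truth, estimate = estimate)
```

For these toy values the RMSE is sqrt(0.875), about 0.935, and the accuracy is 3 correct out of 5, or 0.6.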
Interpretation and Presentation of Results
- broom: Converting model results into clean and organized tables
- Report generation and data visualization
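A brief sketch of broom's three verbs on an ordinary lm() fit, followed by a residual plot built from the augmented data. The model formula and the use of mtcars are illustrative assumptions; broom and ggplot2 are assumed to be installed.

```r
library(broom)
library(ggplot2)

car_lm <- lm(mpg ~ wt + hp, data = mtcars)

tidy(car_lm)     # one row per coefficient: term, estimate, std.error, statistic, p.value
glance(car_lm)   # one row per model: r.squared, AIC, BIC, nobs, ...
augment(car_lm)  # one row per observation, adding .fitted, .resid, ...

# Tidy output plugs straight into ggplot2 for reporting
resid_plot <- ggplot(augment(car_lm), aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
```

Call print(resid_plot) to display the plot. Because each verb returns a data frame, the results can also be formatted as report tables directly.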
Required Installations
- R (statistical computing environment)
- RStudio integrated development environment (IDE)
- tidyverse meta-package
- tidymodels meta-package
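Once R and RStudio are installed, the two meta-packages can be set up from the R console. This is a sketch assuming a CRAN mirror is already configured; it installs only the packages that are missing.

```r
pkgs <- c("tidyverse", "tidymodels")

# Install only what is not already available
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) install.packages(to_install)

# Load at the start of each session
library(tidyverse)
library(tidymodels)

# Optional: resolve common function-name conflicts in favor of tidymodels
tidymodels_prefer()
```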
References
Kuhn, M., & Johnson, K. (2020). Feature Engineering and Selection: A Practical Approach for Predictive Models. CRC Press, Taylor & Francis Group. https://bookdown.org/max/FES/
Kuhn, M., & Silge, J. (2022). Tidy Modeling with R. https://www.tmwr.org/
Posit. (n.d.). tidyr: Tidy messy data. https://tidyr.tidyverse.org/
Posit. (n.d.). dplyr: A grammar of data manipulation. https://dplyr.tidyverse.org/
Posit. (n.d.). tidymodels: Easily install and load the tidymodels packages. https://tidymodels.tidymodels.org/
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science. https://r4ds.had.co.nz/index.html