Syllabus: Methods for Developing Statistical Regression Models
General Objective
Develop a solid understanding of statistical algorithms applied to regression analysis, covering methods for developing, evaluating, and selecting statistical models, with a particular focus on continuous response variables. Practical examples are implemented in the R environment, while the theoretical foundation applies to any statistical software package.
Course Requirement
- Install R and the RStudio integrated development environment (IDE) on your computer before the course starts.
Course Content
Module 1: Introduction to Statistical Algorithms and Regression Analysis
- Fundamental concepts of statistical algorithms
- Introduction to linear regression analysis: fundamentals and applications
Module 2: Simple Linear Regression
- Development and application of simple linear regression models
- Evaluation of linear model assumptions
- Verification and validation of fundamental assumptions
- Assessing the quality of the regression line
- Techniques to measure model accuracy and fit
- Correlation analysis
- Study of relationships between variables
Module 3: Analysis of Variance (ANOVA) and Model Quality
- Methods for assessing variance and its significance in regression models
- Strategies to improve the quality of the regression line
Module 4: Multiple Regression and Model Evaluation
- Development and evaluation of models with multiple predictor variables
- Interpretation of results in a linear regression model
- Validation and diagnostic techniques for regression models
Module 5: Extensions of the Linear Model
- Non-linearity in regression models
- Confounding in statistical models
- Heterogeneity of the effect measure (effect modification, interaction)
Module 6: Model Selection and Optimization
- Bias-variance trade-off
- Resampling methods for model validation and improvement
- Strategies and criteria for selecting the most appropriate model
Module 7: Non-Linear Models and Variable Transformation
- Introduction and application of non-linear regression models
- Methods for transforming continuous variables to improve model fit
References
Dalgaard, P. (2008). Introductory Statistics with R (2nd ed.). Springer. https://doi.org/10.1007/978-0-387-79054-1
Harrell, F. E. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. Springer.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023, June 21). An Introduction to Statistical Learning. Trevor Hastie. https://www.statlearning.com/
Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied Regression Analysis and Other Multivariable Methods (5th ed.). Cengage Learning.
Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. CRC Press.
Porta, M. S., Greenland, S., Hernán, M., Silva, I. d. S., & Last, J. M. (Eds.). (2014). A Dictionary of Epidemiology (6th ed.). Oxford University Press. https://doi.org/10.1093/acref/9780199976720.001.0001
R Core Team. (2013). R: A language and environment for statistical computing. https://apps.dtic.mil/sti/citations/AD1039033
Rosner, B. (2016). Fundamentals of Biostatistics (8th ed.). Cengage Learning. https://www.cengage.com/c/fundamentals-of-biostatistics-8e-rosner/9781305268920/