Syllabus: Methods for Developing Statistical Regression Models

General Objective

Develop a solid understanding of statistical algorithms for regression analysis through methods for developing, evaluating, and selecting statistical models, with particular emphasis on continuous response variables. Practical examples are implemented in the R environment, but the theoretical foundation applies to any statistical software package.

Course Requirement

  1. Install R and the RStudio integrated development environment (IDE) on your computer before the course begins to ensure a smooth learning experience.

Course Content

Module 1: Introduction to Statistical Algorithms and Regression Analysis

  • Fundamental concepts of statistical algorithms
  • Introduction to linear regression analysis: fundamentals and applications

Module 2: Simple Linear Regression

  • Development and application of simple linear regression models
  • Evaluation of linear model assumptions
    • Verification of the core assumptions: linearity, independence, constant variance (homoscedasticity), and normality of the residuals
  • Assessing the quality of the regression line
    • Techniques to measure model accuracy and fit
  • Correlation analysis
    • Study of relationships between variables

Module 3: Analysis of Variance (ANOVA) and Model Quality

  • Partitioning variance and testing its significance in regression models
  • Strategies to improve the quality of the regression line

Module 4: Multiple Regression and Model Evaluation

  • Development and evaluation of models with multiple predictor variables
  • Interpretation of results in a linear regression model
  • Validation and diagnostic techniques for regression models

Module 5: Extensions of the Linear Model

  • Non-linearity in regression models
  • Confounding in statistical models
  • Effect heterogeneity (interaction or effect modification)

Module 6: Model Selection and Optimization

  • Bias-variance trade-off
  • Resampling methods for model validation and improvement
  • Strategies and criteria for selecting the most appropriate model

Module 7: Non-Linear Models and Variable Transformation

  • Introduction and application of non-linear regression models
  • Methods for transforming continuous variables to improve model fit

References

  • Dalgaard, P. (2008). Introductory Statistics with R (2nd ed.). Springer. https://doi.org/10.1007/978-0-387-79054-1

  • Harrell, F. E. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis (Vol. 608). Springer.

  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023, June 21). An Introduction to Statistical Learning. Trevor Hastie. https://www.statlearning.com/

  • Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied Regression Analysis and Other Multivariable Methods (5th ed.). Cengage Learning.

  • Kuhn, M., & Johnson, K. (2019). Feature Engineering and Selection: A Practical Approach for Predictive Models. CRC Press.

  • Porta, M. S., Greenland, S., Hernán, M., Silva, I. d. S., & Last, J. M. (Eds.). (2014). A Dictionary of Epidemiology (6th ed.). Oxford University Press. https://doi.org/10.1093/acref/9780199976720.001.0001

  • R Core Team. (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/

  • Rosner, B. (2016). Fundamentals of Biostatistics (8th ed.). Cengage Learning. https://www.cengage.com/c/fundamentals-of-biostatistics-8e-rosner/9781305268920/