Syllabus: Data Management Strategies

Objective

Develop essential skills for manipulating and managing tabular data using the R programming language and its main tools for data preparation and transformation. Participants will learn to import, clean, and process data efficiently and reproducibly, so that the resulting workflows can be applied in their professional environments and analytical projects.

Course Content

Module 1: Introduction to Strategies for Working with Data

  • Introduction to the R programming language
  • Installing the software and user interfaces
    • R statistical computing environment
    • RStudio integrated development environment
    • Jamovi graphical user interface
  • Packages for additional functionalities
  • Overview of datasets used in the course

Module 2: Introduction to Variable Types and Data Structures

  • Identifying and classifying variables
  • Common data structures in tabular data
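
As a small illustration of the two topics above, the following sketch builds a toy data frame and inspects how R classifies each variable. The table name, column names, and values are invented for this example and are not taken from the course datasets.

```r
# Hypothetical toy data; names and values are illustrative only.
patients <- data.frame(
  id       = 1:4,                                  # integer identifier
  age      = c(34.5, 51.0, 27.2, 45.8),            # numeric (continuous)
  sex      = factor(c("F", "M", "F", "M")),        # factor (nominal)
  severity = factor(c("low", "high", "mid", "low"),
                    levels  = c("low", "mid", "high"),
                    ordered = TRUE),               # ordered factor (ordinal)
  smoker   = c(TRUE, FALSE, FALSE, TRUE),          # logical
  stringsAsFactors = FALSE
)

# First class label of each column, as a named character vector
var_classes <- sapply(patients, function(x) class(x)[1])
print(var_classes)

# Compact overview of the whole structure
str(patients)
```

A data frame stores one variable per column, and each column keeps its own type, which is why nominal, ordinal, numeric, and logical variables can sit side by side in the same table.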

Module 3: Strategies for Reproducible Work

  • Importance of reproducibility in research
  • Ethics and responsible conduct in data management

Module 4: Data Import and Preparation in R

  • Techniques for importing data from various sources
  • Strategies for cleaning and preparing data
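
The two topics above might look like the following minimal sketch in base R. The CSV content, the column names, the helper `cap_first()`, and the use of -99 as a missing-value code are all assumptions made for illustration. In practice the data would usually be read from a file path or URL.

```r
# Hypothetical raw CSV content, embedded as text so the example is self-contained.
raw_csv <- "id,age,city
1,34, Lima
2,NA,Quito
3,-99,bogota
4,51,Lima "

# Import: read.csv() accepts a `text` argument instead of a file path.
survey <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Clean 1: recode the assumed sentinel value -99 as missing.
# %in% returns FALSE (not NA) for values that are already missing.
survey$age[survey$age %in% -99] <- NA

# Clean 2: strip stray whitespace and standardize capitalization.
# cap_first() is a hypothetical helper defined here, not a library function.
cap_first <- function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}
survey$city <- cap_first(trimws(survey$city))

print(survey)
```

Keeping each cleaning step as a separate, commented line in a script (rather than editing the raw file by hand) is one way to keep the preparation reproducible.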

Module 7: Working with the tidyr Package

  • Concepts of structured data
  • Basic transformations for structuring data efficiently
  • Defining clean datasets
    • data.frame vs. tibble
    • Subsetting data
    • Reshaping data
    • Splitting and merging character columns
    • Handling missing data
    • Managing list-columns (nested data)
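
A few of the operations listed above can be sketched as follows, assuming the tidyr and tibble packages are installed. The table `wide` and its column names and values are hypothetical. `separate()` is used for compatibility with older tidyr versions; recent versions also offer `separate_wider_delim()`.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per site, one column per year.
wide <- tibble(
  site   = c("A-north", "B-south"),
  `2021` = c(10, NA),
  `2022` = c(12, 7)
)

# Reshape: move the year columns into rows (one row per site-year).
long <- pivot_longer(wide,
                     cols      = c(`2021`, `2022`),
                     names_to  = "year",
                     values_to = "count")

# Split a character column into two columns at the hyphen.
long <- separate(long, site, into = c("site_id", "region"), sep = "-")

# Handle missing data: keep only rows where `count` is observed.
complete_rows <- drop_na(long, count)

print(long)
print(complete_rows)
```

After these steps each variable has its own column and each observation its own row, which is the layout the rest of the tidyverse tools expect.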

Module 8: Data Manipulation Grammar with dplyr

  • Introduction to dplyr verbs
  • Manipulating cases (rows)
    • Summarizing case information
    • Grouping data
  • Manipulating variables (columns)
    • Vectorized functions
    • Summary functions
  • Combining multiple tables
  • Applying the grammar in different environments:
    • In-memory data
    • Database data
    • Apache Spark data
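
The verbs outlined in this module can be combined into a pipeline like the sketch below, which runs on in-memory data and assumes the dplyr package is installed. The tables `visits` and `patients`, their columns, and the cost thresholds are invented for illustration.

```r
library(dplyr)

# Hypothetical in-memory tables.
visits <- tibble(
  patient_id = c(1, 1, 2, 3, 3, 3),
  cost       = c(20, 35, 50, 10, 15, 5)
)
patients <- tibble(
  patient_id = c(1, 2, 3),
  clinic     = c("North", "South", "North")
)

per_clinic <- visits %>%
  filter(cost >= 10) %>%                                   # keep cases (rows)
  mutate(band = if_else(cost >= 30, "high", "low")) %>%    # vectorized function
  left_join(patients, by = "patient_id") %>%               # combine two tables
  group_by(clinic) %>%                                     # group cases
  summarise(n_visits  = n(),                               # summary functions
            mean_cost = mean(cost),
            .groups   = "drop")

print(per_clinic)
```

The same verbs can also be applied to database tables through the dbplyr package and to Spark data through sparklyr; those backends need a live connection and are not shown here.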

References

  • Posit. (n.d.). tidyr: Tidy messy data. https://tidyr.tidyverse.org/

  • Posit. (n.d.). dplyr: A grammar of data manipulation. https://dplyr.tidyverse.org/

  • Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science (2nd ed.). https://r4ds.hadley.nz/