Syllabus: Data Management Strategies
Objective
Develop essential skills for data manipulation and management in tabular structures using the R programming language and its main tools for data preparation and transformation. Participants will gain the ability to efficiently import, clean, and process data, ensuring reproducibility and applicability in their professional environments and analytical projects.
Course Content
Module 1: Introduction to Strategies for Working with Data
- Introduction to the R programming language
- Installing packages and user interface
  - R statistical package
  - RStudio integrated development environment
  - Jamovi user interface
- Packages for additional functionalities
- Overview of datasets used in the course
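As a minimal sketch of the package setup step, the snippet below checks which packages are already available before anything is installed. The package list is an assumption based on the modules that follow (tidyr and dplyr), and the installation call is left commented out so the check runs without network access.

```r
# Packages assumed for this course, based on Modules 7 and 8
course_packages <- c("tidyr", "dplyr")

# TRUE for each package that can be loaded on this machine
installed <- vapply(course_packages, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_packages <- course_packages[!installed]
print(installed)

# Uncomment to install the missing packages from CRAN
# install.packages(missing_packages)
```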
Module 2: Introduction to Variable Types and Data Structures
- Identifying and classifying variables
- Common data structures in tabular data
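The topics above can be illustrated with a small base R sketch. The `patients` table and its columns are hypothetical and serve only to show how variable types map to R classes inside a data frame.

```r
# Hypothetical example data: one column per variable type
patients <- data.frame(
  id     = 1:4,                                    # integer identifier
  weight = c(61.5, 72.0, 58.3, 80.1),              # numeric (continuous)
  sex    = factor(c("F", "M", "F", "M")),          # factor (nominal)
  stage  = factor(c("I", "III", "II", "I"),
                  levels = c("I", "II", "III"),
                  ordered = TRUE)                  # ordered factor (ordinal)
)

# First class label of each column
var_classes <- vapply(patients, function(col) class(col)[1], character(1))
print(var_classes)

# Compact overview of the whole structure
str(patients)
```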
Module 3: Strategies for Reproducible Work
- Importance of reproducibility in research
- Ethics and responsible conduct in data management
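One small, concrete habit that supports reproducibility is fixing the random seed and recording the session details. The sketch below assumes an arbitrary seed value chosen only for illustration; it is one possible practice, not a complete reproducibility workflow.

```r
# Fix the seed so that random draws can be repeated exactly
set.seed(2024)  # arbitrary value, chosen for illustration
first_draw <- sample(1:100, 5)

# Resetting the same seed reproduces the same draw
set.seed(2024)
second_draw <- sample(1:100, 5)
print(identical(first_draw, second_draw))

# Record the R version and loaded packages alongside the results
print(sessionInfo())
```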
Module 4: Data Import and Preparation in R
- Techniques for importing data from various sources
- Strategies for cleaning and preparing data
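The import and cleaning steps above might look like the following base R sketch. The CSV content is invented for illustration and is read from an inline string so the example runs without any external file.

```r
# Hypothetical raw data with a missing name, a missing score, and a duplicate row
raw_csv <- "id,name,score
1,Ana,8.5
2,Luis,NA
3,,7.0
3,,7.0"

# Import: treat both "NA" and empty fields as missing values
scores <- read.csv(text = raw_csv,
                   na.strings = c("NA", ""),
                   stringsAsFactors = FALSE)

# Clean: drop exact duplicate rows
scores_clean <- unique(scores)

# Inspect: count missing values per column
missing_per_column <- colSums(is.na(scores_clean))
print(scores_clean)
print(missing_per_column)
```

The same pattern applies to a file on disk by replacing the `text` argument with a file path.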
Module 5: Introduction to Variable Types and Data Structures
- Identification and classification of variables
- Common data structures in tabular datasets
Module 6: Strategies for Creating Reproducible Work
- Importance of reproducibility in research
- Ethics and responsible conduct in data management
- Techniques for importing data from various sources
- Strategies for data cleaning and preparation
Module 7: Working with the tidyr Package
- Concepts of structured data
- Basic transformations for structuring data efficiently
- Defining clean datasets
- data.frame vs. tibble
- Subsetting data
- Reshaping data
- Splitting and merging character columns
- Handling missing data
- Managing “list” type data
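Several of the topics in this module (reshaping, splitting character columns, handling missing data) can be sketched together with tidyr. The `visits` table and its column names are hypothetical, and the example assumes the tidyr and tibble packages are installed.

```r
library(tidyr)

# Hypothetical wide table: one row per patient, one column per visit
visits <- tibble::tibble(
  patient = c("ana_f", "luis_m"),
  visit_1 = c(5.1, 6.3),
  visit_2 = c(NA, 6.0)
)

# Reshape: one row per patient and visit
long <- pivot_longer(visits,
                     cols = c(visit_1, visit_2),
                     names_to = "visit",
                     names_prefix = "visit_",
                     values_to = "value")

# Split a character column into two columns at the underscore
split_cols <- separate(long, patient, into = c("name", "sex"), sep = "_")

# Handle missing data: keep only rows with a recorded value
complete_rows <- drop_na(split_cols, value)
print(complete_rows)
```

`separate()` is used here because it is available in older tidyr releases; recent versions also offer `separate_wider_delim()` for the same task.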
Module 8: Data Manipulation Grammar with dplyr
- Introduction to dplyr verbs
- Case handling
  - Summarizing case information
  - Grouping data
- Variable handling
  - Vectorized functions
  - Summary functions
- Combining multiple tables
- Applying the grammar in different environments:
  - In-memory data
  - Database data
  - Apache Spark data
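The main verbs of this module can be sketched on in-memory data as follows. The `orders` and `customers` tables are hypothetical, and the example assumes the dplyr and tibble packages are installed.

```r
library(dplyr)

# Hypothetical in-memory tables
orders <- tibble::tibble(
  customer_id = c(1, 1, 2, 3),
  amount      = c(20, 35, 15, 50)
)
customers <- tibble::tibble(
  customer_id = c(1, 2, 3),
  region      = c("north", "south", "north")
)

# Case handling: keep only orders of at least 20
large_orders <- filter(orders, amount >= 20)

# Variable handling: add a column with a vectorized function
orders_flagged <- mutate(orders, size = if_else(amount >= 30, "large", "small"))

# Combining tables, grouping, and summarizing
by_region <- orders %>%
  left_join(customers, by = "customer_id") %>%
  group_by(region) %>%
  summarise(total = sum(amount), n_orders = n())
print(by_region)
```

The same verbs can also be applied to database tables through the dbplyr package and to Apache Spark data through the sparklyr package; those backends need a live connection and are not shown here.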
References
Posit. (n.d.). Create tidy data. Tidy Messy Data • tidyr. https://tidyr.tidyverse.org/
Posit. (n.d.). dplyr. A Grammar of Data Manipulation • dplyr. https://dplyr.tidyverse.org/
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science. R for Data Science: Welcome. https://r4ds.hadley.nz/