1 Required skills ✎ Polishing
This book assumes some existing knowledge of data wrangling and visualization in R/tidyverse. Specifically, it assumes familiarity with RStudio, reproducible reports written in {quarto} (.qmd) documents, the ‘pipe’ (|> and %>%), the packages {dplyr}, {tidyr}, {forcats}, and {ggplot2}, and the concept of Tidy Data (Wickham, 2014).
If you need help with these skills, please see Ian’s other book “Reproducible Data Processing and Visualization in R and tidyverse”.
If you’re enrolled in our class but haven’t already taken Ian’s “Reproducible data processing and visualization” class (which is based on that book) or a comparable class, we encourage you to work through the book quickly. In previous years, students with little R experience have taken this simulation course when they already had some familiarity with other programming languages such as Python or MATLAB. Sometimes, students sign up for this seminar with low confidence in their R abilities. It is entirely possible to succeed in this course without strong existing R skills, but it unavoidably means more self-guided learning and practice for you.
1.1 Check your skills
Later content in this book relies on you having an understanding of ‘Tidy Data’; the workflow we define and use is built around this concept. Specifically, because most data analysis functions don’t return their results in a ‘Tidy’ format, we need to be able to extract those results into Tidy format ourselves. Importantly, when learners struggle or make errors while building simulations, it is very often because their workflow is not Tidy.
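To illustrate the point about analysis functions, here is a minimal sketch, assuming made-up data and illustrative object names (`control`, `intervention`, `fit`, `results`) that are not part of this book's workflow. `t.test()` returns a list-like object rather than a data frame, so the values of interest have to be pulled out and arranged as one row with one column per variable.

```r
library(tibble)

# Simulate two groups (hypothetical values, for illustration only)
set.seed(42)
control      <- rnorm(n = 50, mean = 0.0, sd = 1)
intervention <- rnorm(n = 50, mean = 0.5, sd = 1)

# Fit the test: the result is an object of class "htest", not a data frame
fit <- t.test(intervention, control)
class(fit)

# Extract the results into Tidy format: one row, one column per variable
results <- tibble(
  t        = unname(fit$statistic),
  df       = unname(fit$parameter),
  p        = fit$p.value,
  ci_lower = fit$conf.int[1],
  ci_upper = fit$conf.int[2]
)

results
```

Because `results` is an ordinary data frame, it can be passed on to {dplyr} and {ggplot2} functions like any other Tidy data.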
Tidy Data is a set of principles for how data should be structured, defined by Hadley Wickham, the main developer of {tidyverse} (Wickham, 2014):
- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.
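The three principles above can be sketched with a small example. The data frame `wide` below is hypothetical: it stores the variable "timepoint" inside its column names, so each row holds two observations. Assuming each observation is one participant at one timepoint, `pivot_longer()` from {tidyr} reshapes it so that each variable is a column and each observation is a row.

```r
library(tibble)
library(tidyr)

# Not Tidy: the variable "timepoint" is hidden in the column names
wide <- tibble(
  participant = c(1, 2, 3),
  score_pre   = c(10, 12, 9),
  score_post  = c(14, 13, 11)
)

# Tidy: one row per participant per timepoint
long <- wide |>
  pivot_longer(
    cols         = c(score_pre, score_post),
    names_to     = "timepoint",
    names_prefix = "score_",
    values_to    = "score"
  )

long
```

The reshaped `long` has three columns (`participant`, `timepoint`, `score`) and six rows, one for each combination of participant and timepoint.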
Ready to test your data wrangling skills? Download and complete the exercises for this chapter.