18 Exercises for ‘Required skills’ chapter ✎ Polishing

These exercises accompany the Required skills chapter.

Complete the following exercises in your local copy of this .qmd file to check your data tidying skills. Most of them involve extracting estimates from objects created by data simulation or statistical modelling functions. To get the file, either download a copy of the whole book from GitHub (see the introduction) or download this .qmd using the download button at the top right of the page.

If you need a refresher, see the chapter on tidy data and reshaping in Ian’s other book.

18.1 Data wrangling

18.1.1 Calculate mean

Use dplyr::summarize().
Use the data_intervention data set.
Return results in a tibble.
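
As a hint at the pattern rather than a solution, here is a minimal sketch on a made-up tibble. The object `example_data`, its `score` column, and the output name `mean_score` are assumptions chosen for illustration; they are not objects from this book.

```r
library(dplyr)
library(tibble)

# hypothetical data, made up for illustration only
example_data <- tibble(score = c(2, 4, 6, 8))

# summarize() collapses all rows into a one-row tibble
example_summary <- example_data |>
  summarize(mean_score = mean(score))

example_summary
```

The same call with `sd()` in place of `mean()` should give the standard deviation.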

18.1.2 Calculate SD

Use dplyr::summarize().
Use the data_intervention data set.
Return results in a tibble.

18.1.3 Calculate mean for each condition

Use dplyr::summarize(), group_by() and the pipe (%>% or |>).
Use the data_for_ttest data set.
Return results in a tibble.

18.1.4 Calculate mean and SD for each condition

Use dplyr::summarize(), group_by() and the pipe (%>% or |>).
Use the data_for_ttest data set.
Return results in a tibble.

18.1.5 Calculate mean and SD for each condition rounded to two decimal places

Use dplyr::summarize(), group_by() and the pipe (%>% or |>).
Use the data_for_ttest data set.
Return results in a tibble.
Round the means and SDs to two decimal places using the round-half-up method, e.g., via roundwork::round_up(). Ideally, use across() (or the superseded mutate_if()) to round multiple columns at once.
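
Base R's `round()` follows the IEC 60559 standard and rounds halves to the even digit, so `round(2.5)` returns 2, which is why a dedicated round-half-up function is needed. The sketch below shows the `across()` pattern on a made-up summary table. To stay free of extra dependencies it uses a hypothetical base-R helper, `round_half_up()`, as a stand-in for `roundwork::round_up()`. The helper is an assumption for illustration, is only intended for non-negative values, and does not guard against floating-point representation error.

```r
library(dplyr)
library(tibble)

# hypothetical stand-in for roundwork::round_up(); non-negative inputs only
round_half_up <- function(x, digits = 0) {
  floor(x * 10^digits + 0.5) / 10^digits
}

# hypothetical summary table, made up for illustration only
example_summary <- tibble(
  group = c("a", "b"),
  mean_score = c(0.125, 2.5),
  sd_score = c(1.375, 0.625)
)

# across() applies the same function to every numeric column
example_rounded <- example_summary |>
  mutate(across(where(is.numeric), \(x) round_half_up(x, digits = 2)))

example_rounded
```

Selecting columns with `where(is.numeric)` leaves the character column `group` untouched.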

18.2 Generate data

18.2.1 Normally distributed data in a tibble

The rnorm() function samples data from a normally distributed population with a given population mean ($\mu$) and population standard deviation ($\sigma$).

# set seed for reproducibility
set.seed(42)

rnorm(n = 10, 
      mean = 0, 
      sd = 1)

 [1]  1.37095845 -0.56469817  0.36312841  0.63286260  0.40426832 -0.10612452
 [7]  1.51152200 -0.09465904  2.01842371 -0.06271410

Make this tidier by returning these simulated values as the column score in a tibble. Assign the tibble to the object data_control.

Create a second object, data_intervention, where the observations are sampled from a population with a mean ($\mu$) of 0.4.

Create a new column called condition in each tibble using the appropriate {dplyr} function, setting it to “control” and “intervention” in the respective tibbles.

Create a new object, data_rct, from data_control and data_intervention by binding the two tibbles together using the appropriate {dplyr} bind_ function.
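
The overall workflow of simulating, labelling and stacking can be sketched on a different, made-up example. The object names (`data_a`, `data_b`, `data_both`), the `group` column, the sample size and the population values below are all assumptions for illustration and differ on purpose from those in the exercise.

```r
library(dplyr)
library(tibble)

# set seed for reproducibility
set.seed(123)

# hypothetical example: simulate scores for two groups and label each tibble
data_a <- tibble(score = rnorm(n = 20, mean = 100, sd = 15)) |>
  mutate(group = "a")

data_b <- tibble(score = rnorm(n = 20, mean = 105, sd = 15)) |>
  mutate(group = "b")

# stack the two tibbles on top of one another
data_both <- bind_rows(data_a, data_b)

count(data_both, group)
```

Because both tibbles have the same columns, `bind_rows()` simply appends the rows of the second to the first.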

18.3 Extract parameters

18.3.1 t-test’s p-value

Fit a Student’s t-test and extract its p value.
Use the data_for_ttest data set.
Return the p value as a column in a tibble.
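
A minimal sketch of the extraction pattern, using made-up data, is shown below. Note that `t.test()` runs Welch's test by default, and setting `var.equal = TRUE` requests Student's version. The names `example_data`, `group`, `fit` and `example_p` are hypothetical.

```r
library(tibble)

# set seed for reproducibility
set.seed(123)

# hypothetical data: two groups of 30 simulated scores
example_data <- tibble(
  group = rep(c("a", "b"), each = 30),
  score = c(rnorm(n = 30, mean = 0, sd = 1),
            rnorm(n = 30, mean = 0.5, sd = 1))
)

# var.equal = TRUE requests Student's t-test rather than Welch's
fit <- t.test(score ~ group, data = example_data, var.equal = TRUE)

# the fitted object is a list; the p value is stored in its p.value element
example_p <- tibble(p = fit$p.value)

example_p
```

Calling `names(fit)` lists the other elements that can be extracted in the same way.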

18.3.2 Cohen’s d and its 95% Confidence Intervals

Calculate Cohen’s d using effectsize::cohens_d() and extract the Cohen’s d estimate and its 95% CIs.
Use the data_for_ttest data set.
Return the Cohen’s d estimate and its 95% CIs in a tidy-format tibble with the columns d_estimate, d_ci_lower, and d_ci_upper.
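
The sketch below illustrates one possible way to tidy the output on made-up data. It assumes the {effectsize} package is installed and that the object returned by `cohens_d()` contains the columns `Cohens_d`, `CI_low` and `CI_high`. Check `names()` on your own result if your package version differs. The names `example_data`, `group`, `fit_d` and `example_d` are hypothetical.

```r
library(dplyr)
library(tibble)
library(effectsize)

# set seed for reproducibility
set.seed(123)

# hypothetical data: two groups of 30 simulated scores
example_data <- tibble(
  group = rep(c("a", "b"), each = 30),
  score = c(rnorm(n = 30, mean = 0.5, sd = 1),
            rnorm(n = 30, mean = 0, sd = 1))
)

# cohens_d() returns a data-frame-like table with the estimate and its CI
fit_d <- cohens_d(score ~ group, data = example_data)

# convert to a plain tibble, then keep and rename the columns of interest
example_d <- fit_d |>
  as.data.frame() |>
  as_tibble() |>
  select(d_estimate = Cohens_d,
         d_ci_lower = CI_low,
         d_ci_upper = CI_high)

example_d
```

The sign of the estimate depends on the order of the grouping levels, so it is worth checking which group is treated as the first.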

18.3.3 Extract parameters from a Pearson’s r correlation test

Fit a correlation test using cor.test() and extract the correlation estimate.

Use the data_for_correlation data set.
Return results in a tibble.

Extract the p value from the correlation test.

Use the data_for_correlation data set.
Return results in a tibble.

Extract both the correlation and the p value.

Use the data_for_correlation data set.
Return results in a tibble.
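
As with the t-test, the object returned by `cor.test()` is a list whose elements can be pulled out by name. The sketch below uses made-up data. The names `example_data`, `fit_r` and `example_r`, and the way the two variables are simulated, are assumptions for illustration only.

```r
library(tibble)

# set seed for reproducibility
set.seed(123)

# hypothetical data: Y is partly determined by X, so the two are correlated
x <- rnorm(n = 50, mean = 0, sd = 1)
example_data <- tibble(X = x,
                       Y = 0.5 * x + rnorm(n = 50, mean = 0, sd = 1))

fit_r <- cor.test(example_data$X, example_data$Y, method = "pearson")

# estimate is a named numeric vector, so unname() keeps the column clean
example_r <- tibble(r = unname(fit_r$estimate),
                    p = fit_r$p.value)

example_r
```

Putting both values into one `tibble()` call returns them side by side in a single row.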
