21  Exercises for ‘Required skills’ chapter ✎ Polishing

These exercises accompany the Required skills chapter.

Complete the following exercises in your local copy of this .qmd file to check your data tidying skills. Most of them involve extracting estimates from objects created by data simulation or statistical modelling functions. To get a local copy, either download the whole book from GitHub (see the introduction) or download this .qmd file using the download button at the top right of the page.

If you need a refresher, see the chapter on tidy data and reshaping in Ian’s other book.

21.1 Data wrangling

21.1.1 Calculate mean

  • Use dplyr::summarize().
  • Use the data_intervention data set.
  • Return results in a tibble.

21.1.2 Calculate SD

  • Use dplyr::summarize().
  • Use the data_intervention data set.
  • Return results in a tibble.

21.1.3 Calculate mean for each condition

  • Use dplyr::summarize(), group_by() and the pipe (%>% or |>).
  • Use the data_for_ttest data set.
  • Return results in a tibble.

21.1.4 Calculate mean and SD for each condition

  • Use dplyr::summarize(), group_by() and the pipe (%>% or |>).
  • Use the data_for_ttest data set.
  • Return results in a tibble.
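As a reminder of the general pattern, here is a minimal sketch of a grouped summary. It uses a small made-up data set rather than data_for_ttest; the object name toy_data and the column names condition and score are assumptions for illustration only.

```r
library(dplyr)
library(tibble)

# hypothetical toy data; object and column names are assumptions
toy_data <- tibble(
  condition = rep(c("control", "intervention"), each = 4),
  score     = c(1, 2, 3, 4, 2, 4, 6, 8)
)

# one row per condition, one column per summary statistic
toy_summary <- toy_data |>
  group_by(condition) |>
  summarize(mean_score = mean(score),
            sd_score   = sd(score))

toy_summary
```

Because summarize() returns a tibble, no extra conversion step is needed when the input is a tibble.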

21.1.5 Calculate mean and SD for each condition rounded to two decimal places

  • Use dplyr::summarize(), group_by() and the pipe (%>% or |>).
  • Use the data_for_ttest data set.
  • Return results in a tibble.
  • Round the means and SDs to two decimal places using the round-half-up method, e.g., via roundwork::round_up(). Ideally, use mutate_if() or across() to round multiple columns at once.
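The sketch below illustrates why the rounding method matters and how across() applies one function to several columns. To keep it dependency-light, it uses a hand-written helper, round_half_up(), which is hypothetical and not from any package; it is a naive version intended for non-negative values and can be tripped up by floating-point representation, which a dedicated function such as roundwork::round_up() is designed to handle more carefully. The toy values are made up.

```r
library(dplyr)
library(tibble)

# base R's round() sends exact halves to the even neighbour, not upwards
round(c(0.5, 2.5))  # returns 0 2

# hypothetical helper (an assumption, not a package function):
# naive round-half-up for non-negative values
round_half_up <- function(x, digits = 0) {
  floor(x * 10^digits + 0.5) / 10^digits
}

# made-up summary values for illustration
toy_summary <- tibble(
  condition  = c("control", "intervention"),
  mean_score = c(2.5, 5.125),
  sd_score   = c(1.290994, 2.581989)
)

# round every numeric column in one step
toy_rounded <- toy_summary |>
  mutate(across(where(is.numeric), \(x) round_half_up(x, digits = 2)))

toy_rounded
```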

21.2 Generate data

21.2.1 Normally distributed data in a tibble

The rnorm() function samples data from a normally distributed population with a given population mean (\(\mu\)) and population standard deviation (\(\sigma\)).

# set seed for reproducibility
set.seed(42)

rnorm(n = 10, 
      mean = 0, 
      sd = 1)
 [1]  1.37095845 -0.56469817  0.36312841  0.63286260  0.40426832 -0.10612452
 [7]  1.51152200 -0.09465904  2.01842371 -0.06271410

Make this tidier by returning these simulated values as the column score in a tibble. Assign the tibble to the object data_control.

Create a second object, data_intervention, where the observations are sampled from a population with a mean (\(\mu\)) of 0.4.

Create a new column called condition in each tibble using the appropriate {dplyr} function, setting it to “control” and “intervention” in the respective tibbles.

Create a new object, data_rct, from data_control and data_intervention by binding the two tibbles together using the appropriate {dplyr} bind_*() function.
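If the overall workflow is unfamiliar, the following sketch shows the same kind of steps on a different example. The object names (data_a, data_b, data_ab), the column name group, its labels, and the population mean of 1 are all assumptions chosen to differ from the exercise.

```r
library(dplyr)
library(tibble)

# set seed for reproducibility
set.seed(42)

# simulate scores directly into a tibble column, then label the group
data_a <- tibble(score = rnorm(n = 10, mean = 0, sd = 1)) |>
  mutate(group = "a")

data_b <- tibble(score = rnorm(n = 10, mean = 1, sd = 1)) |>
  mutate(group = "b")

# stack the two tibbles row-wise; columns are matched by name
data_ab <- bind_rows(data_a, data_b)

data_ab
```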

21.3 Extract parameters

21.3.1 t-test’s p-value

  • Fit a Student’s t-test and extract its p value.
  • Use the data_for_ttest data set.
  • Return the p value as a column in a tibble.
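A minimal sketch of the extraction pattern, using simulated toy data rather than data_for_ttest. The object names and the column names condition and score are assumptions. t.test() returns a list of class "htest", and its p.value element holds the p value.

```r
library(tibble)

# set seed for reproducibility
set.seed(42)

# hypothetical toy data; object and column names are assumptions
toy_data <- tibble(
  condition = rep(c("control", "intervention"), each = 20),
  score     = c(rnorm(n = 20, mean = 0,   sd = 1),
                rnorm(n = 20, mean = 0.5, sd = 1))
)

# var.equal = TRUE requests Student's t-test;
# the default (var.equal = FALSE) gives Welch's t-test instead
fit <- t.test(score ~ condition, data = toy_data, var.equal = TRUE)

# wrap the extracted value in a one-row tibble
toy_p <- tibble(p = fit$p.value)

toy_p
```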

21.3.2 Cohen’s d and its 95% Confidence Intervals

  • Calculate Cohen’s d using effectsize::cohens_d() and extract the Cohen’s d estimate and its 95% CIs.
  • Use the data_for_ttest data set.
  • Return the Cohen’s d estimate and its 95% CIs in a tidy-format tibble with the columns d_estimate, d_ci_lower, and d_ci_upper.
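The sketch below shows one way to reshape the output of effectsize::cohens_d() on simulated toy data. The toy data and object names are assumptions. It also assumes the result contains the columns Cohens_d, CI_low, and CI_high, as in recent versions of {effectsize}; check names() on your own result if your version differs.

```r
library(dplyr)
library(tibble)
library(effectsize)

# set seed for reproducibility
set.seed(42)

# hypothetical toy data; object and column names are assumptions
toy_data <- tibble(
  condition = rep(c("control", "intervention"), each = 20),
  score     = c(rnorm(n = 20, mean = 0,   sd = 1),
                rnorm(n = 20, mean = 0.5, sd = 1))
)

# cohens_d() returns a data frame; 95% CIs are the default
toy_d <- cohens_d(score ~ condition, data = toy_data) |>
  as_tibble() |>
  # select() can rename while it picks columns
  select(d_estimate = Cohens_d,
         d_ci_lower = CI_low,
         d_ci_upper = CI_high)

toy_d
```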

21.3.3 Extract parameters from a Pearson’s r correlation test

Fit a correlation test using cor.test() and extract the correlation estimate.

  • Use the data_for_correlation data set.
  • Return results in a tibble.

Extract the p value from the correlation test.

  • Use the data_for_correlation data set.
  • Return results in a tibble.

Extract both the correlation and the p value.

  • Use the data_for_correlation data set.
  • Return results in a tibble.
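As with the t-test, cor.test() returns an "htest" list, so the same extraction pattern applies. The sketch below uses simulated toy data instead of data_for_correlation; the object names and the column names x and y are assumptions.

```r
library(tibble)

# set seed for reproducibility
set.seed(42)

# hypothetical toy data with a built-in positive association
x <- rnorm(n = 50)
toy_cor_data <- tibble(x = x,
                       y = 0.5 * x + rnorm(n = 50))

fit <- cor.test(toy_cor_data$x, toy_cor_data$y, method = "pearson")

# fit$estimate is a named vector; unname() drops the "cor" label
toy_cor <- tibble(r = unname(fit$estimate),
                  p = fit$p.value)

toy_cor
```

Dropping the name with unname() keeps the tibble column a plain numeric vector, which is easier to bind and summarize later.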