24  Exercises for ‘Analysis functions’ chapter ✎ Polishing

These exercises accompany the analysis functions chapter.

You can complete these exercises in your local version of the .qmd file. Either download a copy of the whole book from GitHub (see the introduction), or download this .qmd using the download button at the top right of the page.

Write functions for each of the following. Remember that you can write pseudocode first if it helps.

24.1 Estimation and tests of means

24.1.1 Independent t-test

Using the dat_experiment tibble, write a function to extract the t, df, p, and mean_diff from a Student’s t-test. The only input is the tibble containing the data; the function can assume the formula score ~ group.
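One possible shape for such a function is sketched below in base R. The data frame dat_example is a simulated, hypothetical stand-in for dat_experiment (whose real contents are not shown here), and the function name extract_t_test is likewise only illustrative.

```r
# Hypothetical stand-in for dat_experiment: one row per participant
set.seed(42)
dat_example <- data.frame(
  group = rep(c("control", "intervention"), each = 50),
  score = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

# Fit a Student's t-test of score ~ group and return the key results as one row
extract_t_test <- function(data) {
  fit <- t.test(score ~ group, data = data, var.equal = TRUE)
  data.frame(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    # t.test() reports both group means; the difference is first minus second
    mean_diff = unname(fit$estimate[1] - fit$estimate[2])
  )
}

res_t <- extract_t_test(dat_example)
res_t
```

The result is a plain data frame; wrapping it in tibble::as_tibble() would give a tibble instead.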

24.1.1.1 Write more flexible functions

Instead of forcing the use of a Student’s t-test via var.equal = TRUE, a) make this an option the user can specify when calling the function and b) make TRUE the default value. Can’t remember how to do this? Go back to the Writing Functions chapter.

Add an argument to specify t.test()’s alternative argument, i.e., whether the hypothesis test is two-sided or directional.

Add an argument to specify t.test()’s mu argument, i.e., whether the population mean difference being tested by the p value is zero (default) or some other value. We will use this in later chapters.

Use the curly-curly operator ({{ column }}) to specify the t.test()’s IV and DV in its formula.
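The sketch below shows one way these options could fit together; it is not the only possible solution. Because {{ }} only works inside tidyverse data-masking or tidy-select functions, this version uses it in dplyr::select() to rename the chosen columns to fixed names, so that the formula passed to t.test() can stay constant. The names flexible_t_test, dat_example, outcome, and condition are hypothetical.

```r
library(dplyr)

# Hypothetical data with non-default column names
set.seed(42)
dat_example <- tibble(
  condition = rep(c("control", "intervention"), each = 50),
  outcome = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

flexible_t_test <- function(data, dv, iv, var_equal = TRUE,
                            alternative = "two.sided", mu = 0) {
  # {{ }} forwards the unquoted column names to select(), which renames them
  # to the fixed names `dv` and `iv` used in the formula below
  dat <- select(data, dv = {{ dv }}, iv = {{ iv }})
  fit <- t.test(dv ~ iv, data = dat, var.equal = var_equal,
                alternative = alternative, mu = mu)
  tibble(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    mean_diff = unname(fit$estimate[1] - fit$estimate[2])
  )
}

# Default: Student's t-test, two-sided, mu = 0
res_student <- flexible_t_test(dat_example, dv = outcome, iv = condition)
# Welch's t-test
res_welch <- flexible_t_test(dat_example, outcome, condition, var_equal = FALSE)
# Directional test
res_less <- flexible_t_test(dat_example, outcome, condition, alternative = "less")
```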

24.1.1.2 Extracting Cohen’s d effect sizes

Use the {effectsize} package where possible over alternatives, as it is written by the same group who write {parameters} and plays nicely with it and the tidyverse.

Use effectsize::cohens_d() to extract Cohen’s d and its 95% CIs and return these as a tibble. In between-groups designs, this version of Cohen’s d is referred to as Cohen’s \(d_s\).

Write a function that extracts the t, df, p, mean_diff and its CIs, and Cohen’s d and its CIs, using {parameters} and {effectsize}.
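As a starting point, the sketch below wraps effectsize::cohens_d() and keeps only the estimate and its interval. It assumes the {effectsize} and {tibble} packages are installed; dat_example is simulated, hypothetical data standing in for dat_experiment, and extract_cohens_d is an illustrative name.

```r
# Hypothetical stand-in for dat_experiment
set.seed(42)
dat_example <- data.frame(
  group = rep(c("control", "intervention"), each = 50),
  score = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

extract_cohens_d <- function(data) {
  # Pooled-SD Cohen's d (d_s) with a 95% confidence interval
  res <- effectsize::cohens_d(score ~ group, data = data, ci = 0.95)
  tibble::tibble(
    cohens_d = res$Cohens_d,
    ci_lower = res$CI_low,
    ci_upper = res$CI_high
  )
}

res_d <- extract_cohens_d(dat_example)
res_d
```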

24.1.2 Dependent t-test and Cohen’s \(d_{rm}\)

Use the data_pre_post tibble.

Use t.test() with paired = TRUE and effectsize::repeated_measures_d().
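A minimal sketch of the t-test part follows. It assumes the data are in wide format with one column per time point; the column names pre and post, the simulated dat_example, and the function name extract_paired_t_test are all hypothetical, since the structure of data_pre_post is not shown here. The two columns are passed as vectors rather than through a formula, which works across R versions.

```r
# Hypothetical wide-format stand-in for data_pre_post
set.seed(42)
pre <- rnorm(40, mean = 10, sd = 2)
dat_example <- data.frame(
  id = 1:40,
  pre = pre,
  post = pre - 1 + rnorm(40, sd = 1)
)

extract_paired_t_test <- function(data) {
  # Dependent (paired) t-test on post minus pre
  fit <- t.test(data$post, data$pre, paired = TRUE)
  data.frame(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    mean_diff = unname(fit$estimate),
    ci_lower = fit$conf.int[1],
    ci_upper = fit$conf.int[2]
  )
}

res_paired <- extract_paired_t_test(dat_example)
res_paired
```

The effect size from effectsize::repeated_measures_d() could then be added as further columns of the same one-row result.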

24.1.3 Non-parametric alternative measures

Use the dat_experiment tibble.

Use wilcox.test(), the non-parametric alternative to t.test(), to test differences in ranks rather than differences in means.
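The pattern is the same as for t.test(), as the base R sketch below suggests. The data are simulated and hypothetical, as is the function name extract_wilcox.

```r
# Hypothetical stand-in for dat_experiment
set.seed(42)
dat_example <- data.frame(
  group = rep(c("control", "intervention"), each = 50),
  score = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

extract_wilcox <- function(data) {
  # Wilcoxon rank-sum (Mann-Whitney) test of score ~ group
  fit <- wilcox.test(score ~ group, data = data)
  data.frame(
    w = unname(fit$statistic),
    p = fit$p.value
  )
}

res_w <- extract_wilcox(dat_example)
res_w
```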

24.1.4 (RM-)ANOVA F-tests

Estimate and test differences in means in within, between, or mixed within-between factorial designs with more than two cells using (RM-)ANOVA.

Use the dat_mixed_within_between tibble.

afex::aov_ez() is the gold standard for fitting (RM-)ANOVAs in R using Type III sums of squares, while avoiding the weird things that can happen with base R’s aov(), such as its default sequential (Type I) sums of squares.

library(dplyr) # provides %>% and as_tibble()

# fit
fit_rm_anova <- afex::aov_ez(
  id = "id",
  dv = "score",
  data = dat_depression_rct_mixed,
  between = "condition",
  within = "time"
)

# extract parameters
## p values and df
parameters::model_parameters(fit_rm_anova) %>%
  as_tibble() %>%
  janitor::clean_names() # use snake_case

parameter       sum_squares  sum_squares_error  df  df_error  mean_square       f         p  method
condition             38.12              43.04   1        38         1.13   33.66  1.10e-06  ANOVA estimation for factorial designs using ‘afex’
time                 244.81              44.89   1        38         1.18  207.23  0.00e+00  ANOVA estimation for factorial designs using ‘afex’
condition:time        24.55              44.89   1        38         1.18   20.79  5.21e-05  ANOVA estimation for factorial designs using ‘afex’

## partial Eta-Squared (more common but flawed metric)
effectsize::eta_squared(fit_rm_anova, partial = TRUE) %>%
  as_tibble() %>%
  janitor::clean_names() # use snake_case

parameter       eta2_partial    ci  ci_low  ci_high
condition               0.47  0.95    0.28        1
time                    0.85  0.95    0.77        1
condition:time          0.35  0.95    0.16        1

## generalized Eta-Squared (recommended; ideally both)
effectsize::eta_squared(fit_rm_anova, generalized = TRUE) %>%
  as_tibble() %>%
  janitor::clean_names() # use snake_case

parameter       eta2_generalized    ci  ci_low  ci_high
condition                   0.30  0.95    0.11        1
time                        0.74  0.95    0.61        1
condition:time              0.22  0.95    0.05        1

Write a function that provides tidy results of an RM-ANOVA for a 2×2 pre/post, intervention/control design: its F values, p values, dfs, and partial eta-squared and generalized eta-squared effect sizes.

24.2 Estimation and tests of correlation/covariance

24.2.1 Bivariate correlations

Use the data_crosssectional tibble.

Estimate and test the correlation between x and y with cor.test().
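A base R sketch of what such a function might return is given below. The simulated dat_example is a hypothetical stand-in for data_crosssectional, assumed here to contain columns named x and y; extract_correlation is an illustrative name.

```r
# Hypothetical stand-in for data_crosssectional
set.seed(42)
x <- rnorm(100)
dat_example <- data.frame(x = x, y = 0.5 * x + rnorm(100))

extract_correlation <- function(data) {
  # Pearson correlation with its test statistic and 95% CI
  fit <- cor.test(~ x + y, data = data)
  data.frame(
    r = unname(fit$estimate),
    ci_lower = fit$conf.int[1],
    ci_upper = fit$conf.int[2],
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value
  )
}

res_r <- extract_correlation(dat_example)
res_r
```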

24.2.2 Regressions

Use the data_crosssectional tibble.

Estimating and testing regression slopes with lm(). Predict y based on x.
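The sketch below pulls the slope for x out of a fitted lm() object using only base R. As before, dat_example is simulated, hypothetical data, and extract_slope is an illustrative name.

```r
# Hypothetical stand-in for data_crosssectional
set.seed(42)
x <- rnorm(100)
dat_example <- data.frame(x = x, y = 0.5 * x + rnorm(100))

extract_slope <- function(data) {
  fit <- lm(y ~ x, data = data)
  coefs <- summary(fit)$coefficients # matrix: one row per model term
  ci <- confint(fit)                 # 95% CIs by default
  data.frame(
    term = "x",
    estimate = coefs["x", "Estimate"],
    se = coefs["x", "Std. Error"],
    t = coefs["x", "t value"],
    p = coefs["x", "Pr(>|t|)"],
    ci_lower = ci["x", 1],
    ci_upper = ci["x", 2]
  )
}

res_lm <- extract_slope(dat_example)
res_lm
```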

24.3 Estimation and tests of other parameters (aka assumption tests)

24.3.1 Testing differences in variances

Use Levene’s test via rstatix::levene_test().

Use the dat_experiment tibble.
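A thin wrapper might look like the sketch below, which assumes the {rstatix} package is installed. The grouping variable is stored as a factor here to avoid coercion warnings. The data are simulated and hypothetical, and test_variances is an illustrative name.

```r
# Hypothetical stand-in for dat_experiment, with unequal group SDs
set.seed(42)
dat_example <- data.frame(
  group = factor(rep(c("control", "intervention"), each = 50)),
  score = c(rnorm(50, mean = 0, sd = 1), rnorm(50, mean = 0.5, sd = 2))
)

test_variances <- function(data) {
  # Levene's test of equal variances of score across groups;
  # rstatix returns a one-row tibble of results
  rstatix::levene_test(data, score ~ group)
}

res_levene <- test_variances(dat_example)
res_levene
```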

24.3.2 Testing normality

Use the Shapiro-Wilk test via shapiro.test(). See also the Anderson-Darling test, nortest::ad.test().

Use the dat_experiment tibble.
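The sketch below runs shapiro.test() separately within each group, which is an assumption about what the exercise intends; testing the pooled scores or the model residuals would be alternatives. The data and the function name extract_shapiro are hypothetical.

```r
# Hypothetical stand-in for dat_experiment
set.seed(42)
dat_example <- data.frame(
  group = rep(c("control", "intervention"), each = 50),
  score = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

extract_shapiro <- function(data) {
  # Split scores by group, test each subset, and stack the results
  by_group <- split(data$score, data$group)
  rows <- lapply(names(by_group), function(g) {
    fit <- shapiro.test(by_group[[g]])
    data.frame(group = g, w = unname(fit$statistic), p = fit$p.value)
  })
  do.call(rbind, rows)
}

res_shapiro <- extract_shapiro(dat_example)
res_shapiro
```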

24.4 Pre-analysis functions [TODO]

24.4.1 Data transformation

Log transform a variable with log().

Use the dat_experiment tibble.
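A small sketch of a transformation function follows. It adds a new column rather than overwriting score, and it stops if any score is zero or negative, since log() is not defined for those values. The simulated data (drawn from a log-normal distribution so that all scores are positive) and the names add_log_score and score_log are hypothetical.

```r
# Hypothetical positive-valued stand-in for dat_experiment
set.seed(42)
dat_example <- data.frame(
  group = rep(c("control", "intervention"), each = 50),
  score = rlnorm(100, meanlog = 3, sdlog = 0.5)
)

add_log_score <- function(data) {
  # The natural log is only defined for strictly positive values
  stopifnot(all(data$score > 0))
  data$score_log <- log(data$score)
  data
}

dat_logged <- add_log_score(dat_example)
head(dat_logged)
```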

24.4.2 Exclusions

Exclude outliers that are more than three standard deviations from the mean.

Use the dat_experiment tibble, even though this is unlikely to contain outliers.
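One way to express this rule is sketched below in base R. To make the behaviour visible, the hypothetical example data include a single deliberately extreme score. Note that the mean and SD are computed on the full sample, including any outliers, which is how this rule is usually applied but also one of its known weaknesses. The name exclude_outliers and the n_sd argument are illustrative.

```r
# Hypothetical data: 50 ordinary scores plus one extreme value
dat_example <- data.frame(
  id = 1:51,
  score = c(seq(40, 60, length.out = 50), 500)
)

exclude_outliers <- function(data, n_sd = 3) {
  # Standardize scores using the full-sample mean and SD
  z <- (data$score - mean(data$score)) / sd(data$score)
  # Keep only rows within n_sd standard deviations of the mean
  data[abs(z) <= n_sd, , drop = FALSE]
}

dat_trimmed <- exclude_outliers(dat_example)
nrow(dat_trimmed)
```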