21 Exercises for ‘Analysis functions’ chapter ✎ Polishing

These exercises accompany the analysis functions chapter.

You can complete these exercises in your local version of the .qmd file. Either download a copy of the whole book from GitHub (see introduction), or download this .qmd using the download button on the top right of the page.

Write functions for each of the following. Remember that you can write pseudocode first if it helps.

21.1 Estimation and tests of means

21.1.1 Independent t-test

Using the dat_experiment tibble, write a function to extract t, df, p, and mean_diff from a Student’s t-test. The only input is the tibble containing the data; the function can assume the model score ~ condition.
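One possible shape for such a function is sketched below. It is not the only solution: `dat_example` is a simulated stand-in for `dat_experiment` (assumed to have `condition` and `score` columns), `tidy_t_test()` is a hypothetical name, and the result is a base data.frame to keep the sketch dependency-free (wrap it in `tibble::as_tibble()` to match the chapter's conventions).

```r
# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: Student's t-test, hard-coded to score ~ condition
tidy_t_test <- function(data) {
  fit <- t.test(score ~ condition, data = data, var.equal = TRUE)
  data.frame(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    # t.test() returns both group means; the difference is
    # first factor level minus second factor level
    mean_diff = unname(fit$estimate[1] - fit$estimate[2])
  )
}

tidy_t_test(dat_example)
```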

21.1.1.1 Write more flexible functions

Instead of forcing the use of a Student’s t-test via var.equal = TRUE, make this a) an option the user can specify when calling the function and b) make TRUE the default value. Can’t remember how to do this? Go back to the Writing Functions chapter.

Add arguments to specify t.test()’s alternative argument, i.e., whether the hypothesis test is two-sided or directional.

Add arguments to specify t.test()’s mu argument, i.e., whether the population mean difference being tested by the p value is zero (default) or some other value. We will use this in later chapters.
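The three additions above (`var.equal`, `alternative`, and `mu`) can be combined in one function by passing arguments with default values through to `t.test()`. The following is a minimal sketch under the same assumptions as before: simulated stand-in data, and the hypothetical names `tidy_t_test_flexible()` and `var_equal`.

```r
# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: defaults reproduce a two-sided Student's t-test of mu = 0
tidy_t_test_flexible <- function(data,
                                 var_equal = TRUE,
                                 alternative = "two.sided",
                                 mu = 0) {
  fit <- t.test(score ~ condition, data = data,
                var.equal = var_equal,
                alternative = alternative,
                mu = mu)
  data.frame(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    mean_diff = unname(fit$estimate[1] - fit$estimate[2])
  )
}

tidy_t_test_flexible(dat_example)                          # Student's t-test
tidy_t_test_flexible(dat_example, var_equal = FALSE)       # Welch's t-test
tidy_t_test_flexible(dat_example, alternative = "greater") # directional test
tidy_t_test_flexible(dat_example, mu = 0.5)                # non-zero null value
```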

Use the curly-curly operator ({{ column }}) to specify the t.test()’s IV and DV in its formula.
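Curly-curly is a tidy-evaluation operator, so it works inside data-masking functions such as `dplyr::select()` but not directly inside a base R formula. One possible workaround, sketched here, is to use `{{ }}` to select and rename the user's columns to fixed names and then use those fixed names in the formula. The function name `tidy_t_test_cols()` and the stand-in data are assumptions for illustration.

```r
library(dplyr)

# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: the user passes unquoted column names for dv and iv
tidy_t_test_cols <- function(data, dv, iv, var_equal = TRUE) {
  # {{ }} forwards the unquoted column names to select(), which renames them
  dat_renamed <- data %>%
    select(dv = {{ dv }}, iv = {{ iv }})
  # the formula can now refer to the fixed names dv and iv
  fit <- t.test(dv ~ iv, data = dat_renamed, var.equal = var_equal)
  data.frame(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    mean_diff = unname(fit$estimate[1] - fit$estimate[2])
  )
}

tidy_t_test_cols(dat_example, dv = score, iv = condition)
```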

21.1.1.2 Extracting Cohen’s d effect sizes

Use the {effectsize} package where possible over alternatives, as it’s written by the same group who write {parameters} and works well with both it and the tidyverse.

Use effectsize::cohens_d() to extract Cohen’s d and its 95% CIs and return these as a tibble. In between-groups designs, this version of Cohen’s d is referred to as Cohen’s $d_s$.

Write a function that extracts the t, df, p, mean_diff and its CIs, and Cohen’s d and its CIs, using {parameters} and {effectsize}.
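As a rough sketch of how the pieces fit together, the function below combines a t-test with `effectsize::cohens_d()`. It deliberately deviates from the exercise in one respect: the test statistics and the CI of the mean difference are taken from the base `t.test()` object rather than from {parameters}, so swapping in `parameters::model_parameters()` is left to the reader. `dat_example` and `tidy_t_and_d()` are hypothetical stand-ins.

```r
library(effectsize)

# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: t-test results plus Cohen's d_s with 95% CIs
tidy_t_and_d <- function(data) {
  fit <- t.test(score ~ condition, data = data, var.equal = TRUE)
  # pooled_sd = TRUE gives the between-groups d_s variant
  d <- as.data.frame(
    effectsize::cohens_d(score ~ condition, data = data,
                         pooled_sd = TRUE, ci = 0.95)
  )
  data.frame(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    mean_diff = unname(fit$estimate[1] - fit$estimate[2]),
    mean_diff_ci_low = fit$conf.int[1],
    mean_diff_ci_high = fit$conf.int[2],
    cohens_d = d$Cohens_d,
    cohens_d_ci_low = d$CI_low,
    cohens_d_ci_high = d$CI_high
  )
}

tidy_t_and_d(dat_example)
```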

21.1.2 Dependent t-test and Cohen’s $d_{rm}$

Use the data_pre_post tibble.

Use t.test() with paired = TRUE and effectsize::repeated_measures_d().
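A minimal sketch of one way to approach this follows. It assumes an {effectsize} version that provides `repeated_measures_d()`, uses simulated stand-in data with the assumed columns `score_pre` and `score_post`, and the helper name `tidy_paired_t_test()` is hypothetical. The point estimate is taken from the first column of the effect size output rather than by name, since that column's name depends on the method requested.

```r
library(effectsize)

# Simulated stand-in for data_pre_post (assumed columns: score_pre, score_post)
set.seed(43)
score_pre <- rnorm(70, mean = 0, sd = 1)
dat_pre_post_example <- data.frame(
  score_pre = score_pre,
  # post scores correlate roughly 0.7 with pre scores and are shifted by 0.5
  score_post = 0.5 + 0.7 * score_pre + sqrt(1 - 0.7^2) * rnorm(70)
)

# Hypothetical helper: paired t-test plus Cohen's d_rm
tidy_paired_t_test <- function(data) {
  fit <- t.test(data$score_post, data$score_pre, paired = TRUE)
  d <- as.data.frame(
    effectsize::repeated_measures_d(data$score_post, data$score_pre,
                                    method = "rm")
  )
  data.frame(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    mean_diff = unname(fit$estimate),  # mean of (post - pre)
    d_rm = d[[1]],                     # first column holds the point estimate
    d_rm_ci_low = d$CI_low,
    d_rm_ci_high = d$CI_high
  )
}

tidy_paired_t_test(dat_pre_post_example)
```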

21.1.3 Non-parametric alternative measures

Use the dat_experiment tibble.

Use wilcox.test() as a non-parametric alternative to t.test(), testing differences in ranks rather than differences in means.
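The same extract-and-return pattern used for `t.test()` carries over to `wilcox.test()`, as the following sketch suggests. The stand-in data and the name `tidy_wilcox_test()` are assumptions for illustration.

```r
# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: Wilcoxon rank-sum (Mann-Whitney) test for two groups
tidy_wilcox_test <- function(data) {
  fit <- wilcox.test(score ~ condition, data = data)
  data.frame(
    w = unname(fit$statistic),  # rank-sum statistic W
    p = fit$p.value
  )
}

tidy_wilcox_test(dat_example)
```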

21.1.4 (RM-)ANOVA F-tests

Estimate and test differences in means in within, between, or mixed within-between factorial designs with more than two cells using (RM-)ANOVA.

Use the dat_mixed_within_between tibble.

afex::aov_ez() is the gold standard for fitting (RM-)ANOVAs in R using type III sums of squares, while avoiding the unexpected behavior that can occur with base R’s aov().

# fit 
fit_rm_anova <- afex::aov_ez(
  id = "id", 
  dv = "score", 
  data = dat_depression_rct_mixed,
  between = "condition", 
  within = "time"
)

# extract parameters
## p values and df
parameters::model_parameters(fit_rm_anova) %>%
  as_tibble() %>%
  janitor::clean_names() # use snake_case

parameter	sum_squares	sum_squares_error	df	df_error	mean_square	f	p	method
condition	38.12	43.04	1	38	1.13	33.66	1.10e-06	ANOVA estimation for factorial designs using ‘afex’
time	244.81	44.89	1	38	1.18	207.23	0.00e+00	ANOVA estimation for factorial designs using ‘afex’
condition:time	24.55	44.89	1	38	1.18	20.79	5.21e-05	ANOVA estimation for factorial designs using ‘afex’

## partial Eta-Squared (more common but flawed metric)
effectsize::eta_squared(fit_rm_anova, partial = TRUE) %>%
  as_tibble() %>%
  janitor::clean_names() # use snake_case

parameter	eta2_partial	ci	ci_low	ci_high
condition	0.47	0.95	0.28	1
time	0.85	0.95	0.77	1
condition:time	0.35	0.95	0.16	1

## generalized Eta-Squared (recommended; ideally both)
effectsize::eta_squared(fit_rm_anova, generalized = TRUE) %>%
  as_tibble() %>%
  janitor::clean_names() # use snake_case

parameter	eta2_generalized	ci	ci_low	ci_high
condition	0.30	0.95	0.11	1
time	0.74	0.95	0.61	1
condition:time	0.22	0.95	0.05	1

Write a function that returns tidy results from an RM-ANOVA for a 2×2 (pre/post by intervention/control) design: its F values, p values, dfs, and both partial and generalized eta-squared effect sizes.
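One way to assemble such a function is to run the three extraction steps shown above and join their results by effect. The sketch below assumes the cleaned column names match those in the printed output above (`parameter`, `f`, `df`, `df_error`, `p`, `eta2_partial`, `eta2_generalized`), which may vary across package versions. `dat_rct_example` is a simulated stand-in and `tidy_mixed_anova()` is a hypothetical name.

```r
library(dplyr)

# Simulated stand-in: 40 participants, 2 conditions, 2 timepoints (long format)
set.seed(42)
n_subjects <- 40
dat_rct_example <- data.frame(
  id = factor(rep(1:n_subjects, times = 2)),
  condition = rep(rep(c("treatment", "control"), each = n_subjects / 2), times = 2),
  time = factor(rep(c("baseline", "post"), each = n_subjects))
)
dat_rct_example$score <- rnorm(nrow(dat_rct_example), mean = 10, sd = 1) +
  (dat_rct_example$time == "post") * 2 +
  (dat_rct_example$condition == "treatment" & dat_rct_example$time == "post") * 3

# Hypothetical helper: one row per effect, with F-test and both effect sizes
tidy_mixed_anova <- function(data) {
  fit <- afex::aov_ez(id = "id", dv = "score", data = data,
                      between = "condition", within = "time")

  res_f <- parameters::model_parameters(fit) %>%
    tibble::as_tibble() %>%
    janitor::clean_names() %>%
    select(parameter, f, df, df_error, p)

  res_eta_p <- effectsize::eta_squared(fit, partial = TRUE) %>%
    tibble::as_tibble() %>%
    janitor::clean_names() %>%
    select(parameter, eta2_partial)

  res_eta_g <- effectsize::eta_squared(fit, generalized = TRUE) %>%
    tibble::as_tibble() %>%
    janitor::clean_names() %>%
    select(parameter, eta2_generalized)

  res_f %>%
    left_join(res_eta_p, by = "parameter") %>%
    left_join(res_eta_g, by = "parameter")
}

tidy_mixed_anova(dat_rct_example)
```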

21.2 Estimation and tests of correlation/covariance

21.2.1 Bivariate correlations

Use the data_crosssectional tibble.

Estimate and test the correlation between x and y with cor.test().
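As an illustration of what a tidy wrapper around `cor.test()` could look like, here is a base R sketch. `dat_cor_example` is a simulated stand-in for `data_crosssectional` (assumed columns `x` and `y`), and `tidy_cor_test()` is a hypothetical name.

```r
# Simulated stand-in for data_crosssectional (assumed columns: x, y)
set.seed(43)
x <- rnorm(600)
dat_cor_example <- data.frame(
  x = x,
  # y correlates roughly 0.35 with x and has a mean near 0.5
  y = 0.5 + 0.35 * x + sqrt(1 - 0.35^2) * rnorm(600)
)

# Hypothetical helper: Pearson's r with its 95% CI and significance test
tidy_cor_test <- function(data) {
  fit <- cor.test(data$x, data$y, method = "pearson")
  data.frame(
    r = unname(fit$estimate),
    ci_low = fit$conf.int[1],
    ci_high = fit$conf.int[2],
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value
  )
}

tidy_cor_test(dat_cor_example)
```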

21.2.2 Regressions

Use the data_crosssectional tibble.

Estimate and test regression slopes with lm(). Predict y based on x.
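A base R sketch of a tidy regression summary is given below, as one possible approach among several ({parameters} offers a more compact route). The stand-in data and the name `tidy_lm()` are assumptions for illustration.

```r
# Simulated stand-in for data_crosssectional (assumed columns: x, y)
set.seed(43)
x <- rnorm(600)
dat_cor_example <- data.frame(
  x = x,
  y = 0.5 + 0.35 * x + sqrt(1 - 0.35^2) * rnorm(600)
)

# Hypothetical helper: one row per coefficient of y ~ x
tidy_lm <- function(data) {
  fit <- lm(y ~ x, data = data)
  coefs <- summary(fit)$coefficients  # matrix of estimates and tests
  ci <- confint(fit, level = 0.95)    # matrix of 95% CIs
  data.frame(
    term = rownames(coefs),
    estimate = coefs[, "Estimate"],
    se = coefs[, "Std. Error"],
    t = coefs[, "t value"],
    p = coefs[, "Pr(>|t|)"],
    ci_low = ci[, 1],
    ci_high = ci[, 2],
    row.names = NULL
  )
}

tidy_lm(dat_cor_example)
```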

21.3 Estimation and tests of other parameters (aka assumption tests)

21.3.1 Testing differences in variances

Test differences in variances between groups with Levene’s test, using rstatix::levene_test().

Use the dat_experiment tibble.
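A short sketch of how this might look is shown below, assuming {rstatix} is installed and that its output includes `df1`, `df2`, `statistic`, and `p` columns. The stand-in data and the name `tidy_levene_test()` are hypothetical.

```r
# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: Levene's test of equal variances across conditions
tidy_levene_test <- function(data) {
  fit <- rstatix::levene_test(data, score ~ condition)
  data.frame(
    f = fit$statistic,
    df1 = fit$df1,
    df2 = fit$df2,
    p = fit$p
  )
}

tidy_levene_test(dat_example)
```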

21.3.2 Testing normality

Test normality with the Shapiro-Wilk test, using shapiro.test(). See also the Anderson-Darling test, nortest::ad.test().

Use the dat_experiment tibble.
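The sketch below runs `shapiro.test()` separately within each condition, which is one common choice; testing the pooled scores or model residuals instead would be equally defensible. The stand-in data and the name `tidy_shapiro_test()` are assumptions for illustration.

```r
# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: Shapiro-Wilk test per condition, one row per group
tidy_shapiro_test <- function(data) {
  scores_by_group <- split(data$score, data$condition)
  results <- lapply(names(scores_by_group), function(group) {
    fit <- shapiro.test(scores_by_group[[group]])
    data.frame(
      condition = group,
      w = unname(fit$statistic),
      p = fit$p.value
    )
  })
  do.call(rbind, results)
}

tidy_shapiro_test(dat_example)
```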

21.4 Pre-analysis functions [TODO]

21.4.1 Data transformation

Log transform a variable with log().

Use the dat_experiment tibble.
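One thing to watch for: `log()` is undefined for zero and negative values, and scores centred near zero will contain negatives. The sketch below handles this by adding a user-supplied constant before transforming. That shift is purely an illustrative assumption and not a recommendation for real data, and the names `add_log_score()` and `shift` are hypothetical.

```r
# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: add a log-transformed copy of score
add_log_score <- function(data, shift = 0) {
  shifted <- data$score + shift
  # fail loudly rather than silently producing NaN or -Inf
  if (any(shifted <= 0)) {
    stop("log() needs strictly positive values; increase `shift`.")
  }
  data$score_log <- log(shifted)
  data
}

# Illustrative shift that moves the smallest score to 1 before logging
shift_example <- 1 - min(dat_example$score)
dat_logged <- add_log_score(dat_example, shift = shift_example)
head(dat_logged)
```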

21.4.2 Exclusions

Exclude outliers that are more than three standard deviations from the mean.

Use the dat_experiment tibble, even though this is unlikely to contain outliers.
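The exclusion rule can be expressed by standardizing the scores and keeping rows whose absolute z-score is within the cutoff, as in the sketch below. To make the exclusion visible, one artificial outlier is appended to the simulated stand-in data; that row, the cutoff argument name `sd_cutoff`, and the function name `exclude_outliers()` are all assumptions for illustration.

```r
# Simulated stand-in for dat_experiment (assumed columns: condition, score)
set.seed(43)
dat_example <- data.frame(
  condition = factor(rep(c("intervention", "control"), each = 40),
                     levels = c("intervention", "control")),
  score = c(rnorm(40, mean = 0.45, sd = 1), rnorm(40, mean = 0, sd = 1))
)

# Hypothetical helper: drop rows more than sd_cutoff SDs from the mean
exclude_outliers <- function(data, sd_cutoff = 3) {
  z <- (data$score - mean(data$score)) / sd(data$score)
  data[abs(z) <= sd_cutoff, , drop = FALSE]
}

# Append one artificial extreme value so that something gets excluded
dat_with_outlier <- rbind(
  dat_example,
  data.frame(condition = factor("control", levels = levels(dat_example$condition)),
             score = 50)
)

dat_trimmed <- exclude_outliers(dat_with_outlier)
nrow(dat_with_outlier) - nrow(dat_trimmed)  # number of rows excluded
```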

