# Exercises for 'Analysis functions' chapter <span class="badge badge-draft3">✎ Polishing</span>
```{r}
#| include: false
# if it is available, run the setup script that tells quarto to round all df/tibble outputs to three decimal places
if(file.exists("../_setup.R")){source("../_setup.R")}
```
These exercises accompany the [analysis functions chapter](../chapters/6_analysis_functions.qmd).
You can complete these exercises in your local version of the .qmd file. Either download a copy of the whole book from GitHub (see the introduction), or download this .qmd using the download button at the top right of the page.
Write functions for each of the following. Remember that you can write pseudocode first if it helps.
```{r}
#| include: false
# dependencies
library(tibble)
library(dplyr)
library(tidyr)
library(parameters)
library(janitor)
library(effectsize)
library(afex)
library(faux)
library(forcats)
# create datasets to be used
set.seed(43)
data_crosssectional <- faux::rnorm_multi(
n = 600,
vars = 2,
varnames = c("x", "y"),
mu = c(0, 0.5),
sd = 1,
r = 0.35
) %>%
as_tibble() # faux returns a data.frame; we convert to tibble for consistency
dat_experiment <-
bind_rows(
tibble(condition = rep("intervention", 40),
score = rnorm(n = 40, mean = 0.45, sd = 1)),
tibble(condition = rep("control", 40),
score = rnorm(n = 40, mean = 0, sd = 1))
) %>%
mutate(condition = fct_relevel(condition, "intervention", "control"),
condition_numeric = case_when(condition == "intervention" ~ 1,
condition == "control" ~ 0))
data_pre_post <- faux::rnorm_multi(
n = 70,
vars = 2,
varnames = c("score_pre", "score_post"),
mu = c(0, 0.5),
sd = 1,
r = 0.7
) %>%
as_tibble() # faux returns a data.frame; we convert to tibble for consistency
# mixed within-between RCT: treatment vs control at baseline vs post intervention
n_per_group <- 30
r_within <- 0.6 # pre-post correlation
dat_mixed_within_between <- rbind(
# Control group: slight reduction in scores from pre to post due to regression to the mean
rnorm_multi(
n = n_per_group,
vars = 2,
mu = c(pre = 28, post = 25),
sd = c(9, 9),
r = r_within
) %>%
mutate(condition = "control",
id = row_number()),
# Treatment group: improvement at post
rnorm_multi(
n = n_per_group,
vars = 2,
mu = c(pre = 5, post = 6),
sd = c(1, 1),
r = r_within
) %>%
mutate(condition = "treatment",
id = row_number() + n_per_group)
) %>%
# Pivot to long format for analysis
pivot_longer(cols = c(pre, post),
names_to = "timepoint",
values_to = "score") %>%
mutate(
id = factor(id),
condition = factor(condition),
time = factor(timepoint, levels = c("pre", "post")),
score = round(score, 0) # round score to make it more like a sum-score of the BDI-II
)
```
## Estimation and tests of means
### Independent *t*-test
Using the `dat_experiment` tibble, write a function to extract *t*, df, *p*, and the mean difference (`mean_diff`) from a Student's *t*-test. The only input is the tibble containing the data; the function can assume the formula `score ~ condition`.
```{r}
```
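One possible shape for such a function, as a minimal sketch rather than a model answer. The function name `tidy_t_test()` and the toy dataset `dat_toy` are hypothetical and only mimic the column names of `dat_experiment`.

```{r}
library(tibble)

# hypothetical helper: tidy summary of a Student's t-test of score ~ condition
tidy_t_test <- function(data) {
  fit <- t.test(score ~ condition, data = data, var.equal = TRUE)
  tibble(
    t = unname(fit$statistic),
    df = unname(fit$parameter),
    p = fit$p.value,
    # fit$estimate holds the two group means, in factor-level order
    mean_diff = unname(fit$estimate[1] - fit$estimate[2])
  )
}

# toy data (hypothetical), with the same column names as dat_experiment
set.seed(1)
dat_toy <- tibble(
  condition = rep(c("intervention", "control"), each = 20),
  score = rnorm(40, mean = rep(c(0.5, 0), each = 20))
)

res_t <- tidy_t_test(dat_toy)
res_t
```

The sign of `mean_diff` depends on the order of the factor levels of `condition`, so it is worth checking which group comes first.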
#### Write more flexible functions
Instead of forcing the use of a Student's *t*-test via `var.equal = TRUE`, a) make this an option the user can specify when calling the function and b) make `TRUE` the default value. Can't remember how to do this? Go back to the Writing Functions chapter.
```{r}
```
Add an argument to specify `t.test()`'s `alternative` argument, i.e., whether the hypothesis test is two-sided or directional.
```{r}
```
Add an argument to specify `t.test()`'s `mu` argument, i.e., whether the population mean difference being tested by the *p* value is zero (the default) or some other value. We will use this in later chapters.
```{r}
```
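The three extensions above (`var.equal`, `alternative`, and `mu`) could be combined along these lines. This is a sketch under assumptions: the function name `tidy_t_test_flex()` and the toy data `dat_toy_flex` are hypothetical, and the defaults simply mirror those of `t.test()` except for `var.equal = TRUE`.

```{r}
library(tibble)

# hypothetical helper: each t.test() option is exposed as an argument with a default
tidy_t_test_flex <- function(data,
                             var.equal = TRUE,
                             alternative = "two.sided",
                             mu = 0) {
  fit <- t.test(score ~ condition,
                data = data,
                var.equal = var.equal,
                alternative = alternative,
                mu = mu)
  tibble(t = unname(fit$statistic),
         df = unname(fit$parameter),
         p = fit$p.value)
}

# toy data (hypothetical), with the same column names as dat_experiment
set.seed(2)
dat_toy_flex <- tibble(
  condition = rep(c("intervention", "control"), each = 20),
  score = rnorm(40, mean = rep(c(0.5, 0), each = 20))
)

res_student <- tidy_t_test_flex(dat_toy_flex)                     # defaults
res_welch   <- tidy_t_test_flex(dat_toy_flex, var.equal = FALSE)  # Welch's t-test
res_greater <- tidy_t_test_flex(dat_toy_flex, alternative = "greater")
res_less    <- tidy_t_test_flex(dat_toy_flex, alternative = "less")
```

Because every argument has a default, existing calls such as `tidy_t_test_flex(dat_toy_flex)` keep working when new options are added.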
Use the curly-curly operator (`{{ column }}`) to specify the `t.test()`'s IV and DV in its formula.
```{r}
```
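Curly-curly works inside tidyverse verbs rather than inside a base-R formula, so one way to approach this (an assumption about how to bridge the two, not the only option) is to rename the user's columns with `select()` and then use a fixed formula. The names `tidy_t_test_cols()`, `outcome`, `group`, and the toy columns `rating` and `arm` are hypothetical.

```{r}
library(dplyr)
library(tibble)

# hypothetical helper: the user passes bare column names for the DV and IV
tidy_t_test_cols <- function(data, dv, iv, var.equal = TRUE) {
  # {{ }} forwards the user's column names into select(), renaming on the way
  data_renamed <- data %>%
    select(outcome = {{ dv }}, group = {{ iv }})
  fit <- t.test(outcome ~ group, data = data_renamed, var.equal = var.equal)
  tibble(t = unname(fit$statistic),
         df = unname(fit$parameter),
         p = fit$p.value)
}

# toy data (hypothetical) with column names that differ from dat_experiment
set.seed(3)
dat_toy_cols <- tibble(
  arm = rep(c("a", "b"), each = 20),
  rating = rnorm(40)
)

res_cols <- tidy_t_test_cols(dat_toy_cols, dv = rating, iv = arm)
res_cols
```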
#### Extracting Cohen's *d* effect sizes
Use the {effectsize} package where possible over alternatives, as it's written by the same group that writes {parameters} and plays nicely with it and the tidyverse.
Use `effectsize::cohens_d()` to extract Cohen's *d* and its 95% CIs and return these as a tibble. In between-groups designs, this version of Cohen's *d* is referred to as Cohen's $d_s$.
```{r}
```
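A minimal sketch of the extraction step, following the same `as_tibble()` and `janitor::clean_names()` pattern used elsewhere in this file. The toy dataset `dat_toy_d` is hypothetical.

```{r}
library(dplyr)
library(tibble)
library(janitor)
library(effectsize)

# toy data (hypothetical), with the same column names as dat_experiment
set.seed(4)
dat_toy_d <- tibble(
  condition = rep(c("intervention", "control"), each = 25),
  score = rnorm(50, mean = rep(c(0.4, 0), each = 25))
)

# Cohen's d using the pooled SD, with 95% CIs
res_d <- effectsize::cohens_d(score ~ condition,
                              data = dat_toy_d,
                              pooled_sd = TRUE,
                              ci = 0.95) %>%
  as_tibble() %>%
  janitor::clean_names() # use snake_case

res_d
```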
Write a function that extracts *t*, df, *p*, the mean difference and its CIs, and Cohen's *d* and its CIs, using {parameters} and {effectsize}.
```{r}
```
### Dependent *t*-test and Cohen's $d_{rm}$
Use the `data_pre_post` tibble.
Use `t.test()` with `paired = TRUE` and `effectsize::repeated_measures_d()`.
```{r}
```
### Non-parametric alternative measures
Use the `dat_experiment` tibble.
Use `wilcox.test()`, a non-parametric alternative to `t.test()` that tests differences in ranks rather than differences in means.
```{r}
```
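As a sketch, the same extract-and-tidy pattern used for `t.test()` carries over to `wilcox.test()`. The function name `tidy_wilcox()` and the toy data `dat_toy_w` are hypothetical.

```{r}
library(tibble)

# hypothetical helper: tidy summary of a Wilcoxon rank-sum test of score ~ condition
tidy_wilcox <- function(data) {
  fit <- wilcox.test(score ~ condition, data = data)
  tibble(w = unname(fit$statistic),
         p = fit$p.value)
}

# toy data (hypothetical), with the same column names as dat_experiment
set.seed(5)
dat_toy_w <- tibble(
  condition = rep(c("intervention", "control"), each = 20),
  score = rnorm(40, mean = rep(c(0.5, 0), each = 20))
)

res_w <- tidy_wilcox(dat_toy_w)
res_w
```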
### (RM-)ANOVA F-tests
Estimating and testing differences in means in within, between, or mixed within-between factorial designs with more than two cells using (RM-)ANOVA.
Use the `dat_mixed_within_between` tibble.
`afex::aov_ez()` is the gold standard for fitting (RM-)ANOVAs in R using type III sums of squares, while avoiding the weird things that can happen with base R's `aov()`.
```{r}
#| include: false
# generate mixed within-between RCT data
set.seed(42)
n_subjects <- 40
dat_depression_rct_mixed <- tibble(
id = factor(1:n_subjects),
condition = rep(c("treatment", "control"), each = n_subjects/2)
) %>%
group_by(id, condition) %>%
reframe(time = factor(c("baseline", "post"))) %>%
mutate(
score = rnorm(n(), mean = 10, sd = 1) +
(time == "post") * 2 +
(condition == "treatment" & time == "post") * 3
)
```
```{r}
# fit
fit_rm_anova <- afex::aov_ez(
id = "id",
dv = "score",
data = dat_depression_rct_mixed,
between = "condition",
within = "time"
)
# extract parameters
## p values and df
parameters::model_parameters(fit_rm_anova) %>%
as_tibble() %>%
janitor::clean_names() # use snake_case
## partial Eta-Squared (more common but flawed metric)
effectsize::eta_squared(fit_rm_anova, partial = TRUE) %>%
as_tibble() %>%
janitor::clean_names() # use snake_case
## generalized Eta-Squared (recommended; ideally both)
effectsize::eta_squared(fit_rm_anova, generalized = TRUE) %>%
as_tibble() %>%
janitor::clean_names() # use snake_case
```
Write a function that returns tidy results of an RM-ANOVA for a 2×2 (pre/post × intervention/control) design: its *F* values, *p* values, dfs, and partial and generalized eta-squared effect sizes.
```{r}
```
## Estimation and tests of correlation/covariance
### Bivariate correlations
Use the `data_crosssectional` tibble.
Estimate and test the correlation between x and y with `cor.test()`.
```{r}
```
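One way this might look, as a sketch: `cor.test()` returns an `htest` object, so the extraction mirrors the *t*-test case. The function name `tidy_cor_test()` and the toy data `dat_toy_cor` are hypothetical.

```{r}
library(tibble)

# hypothetical helper: tidy summary of a Pearson correlation between x and y
tidy_cor_test <- function(data) {
  fit <- cor.test(data$x, data$y) # Pearson's r is the default method
  tibble(r = unname(fit$estimate),
         ci_lower = fit$conf.int[1],
         ci_upper = fit$conf.int[2],
         t = unname(fit$statistic),
         df = unname(fit$parameter),
         p = fit$p.value)
}

# toy data (hypothetical), with the same column names as data_crosssectional
set.seed(6)
x_toy <- rnorm(50)
dat_toy_cor <- tibble(x = x_toy,
                      y = 0.5 * x_toy + rnorm(50))

res_cor <- tidy_cor_test(dat_toy_cor)
res_cor
```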
### Regressions
Use the `data_crosssectional` tibble.
Estimate and test the regression slope with `lm()`, predicting y from x.
```{r}
```
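A sketch of how the slope could be pulled out with {parameters}, keeping only the row for `x`. The function name `tidy_lm_slope()` and the toy data `dat_toy_lm` are hypothetical.

```{r}
library(dplyr)
library(tibble)
library(janitor)
library(parameters)

# hypothetical helper: fit y ~ x and return the tidy row for the slope
tidy_lm_slope <- function(data) {
  fit <- lm(y ~ x, data = data)
  parameters::model_parameters(fit) %>%
    as_tibble() %>%
    janitor::clean_names() %>% # use snake_case
    filter(parameter == "x")   # drop the intercept row
}

# toy data (hypothetical), with the same column names as data_crosssectional
set.seed(7)
dat_toy_lm <- tibble(x = rnorm(60)) %>%
  mutate(y = 0.4 * x + rnorm(60))

res_lm <- tidy_lm_slope(dat_toy_lm)
res_lm
```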
## Estimation and tests of other parameters (aka assumption tests)
### Testing differences in variances
Test for differences in variances between conditions with Levene's test, `rstatix::levene_test()`.
Use the `dat_experiment` tibble.
```{r}
```
### Testing normality
Test normality with the Shapiro-Wilk test, `shapiro.test()`. See also the Anderson-Darling test, `nortest::ad.test()`.
Use the `dat_experiment` tibble.
```{r}
```
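Since normality is usually assessed within each condition, one possible sketch wraps `shapiro.test()` in a small helper and applies it per group. The function name `tidy_shapiro()` and the toy data `dat_toy_sw` are hypothetical.

```{r}
library(dplyr)
library(tibble)

# hypothetical helper: tidy summary of a Shapiro-Wilk test on a numeric vector
tidy_shapiro <- function(x) {
  fit <- shapiro.test(x)
  tibble(w = unname(fit$statistic),
         p = fit$p.value)
}

# toy data (hypothetical), with the same column names as dat_experiment
set.seed(8)
dat_toy_sw <- tibble(
  condition = rep(c("intervention", "control"), each = 30),
  score = rnorm(60)
)

# one row of results per condition
res_sw <- dat_toy_sw %>%
  group_by(condition) %>%
  reframe(tidy_shapiro(score))

res_sw
```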
## Pre-analysis functions [TODO]
### Data transformation
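Log transform a variable with `log()`. Note that `log()` returns `NaN` (with a warning) for negative values, which a simulated standardized `score` will typically contain.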
Use the `dat_experiment` tibble.
```{r}
```
### Exclusions
Exclude outliers that are more than three standard deviations from the mean.
Use the `dat_experiment` tibble, even though this is unlikely to contain outliers.
```{r}
```
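The exclusion rule above can be sketched as a filter on standardized scores. The function name `exclude_outliers()`, its `sd_cutoff` argument, and the toy data `dat_toy_out` (which includes one deliberately extreme value) are hypothetical.

```{r}
library(dplyr)
library(tibble)

# hypothetical helper: drop rows whose score is more than sd_cutoff SDs from the mean
exclude_outliers <- function(data, sd_cutoff = 3) {
  data %>%
    mutate(z_score = (score - mean(score)) / sd(score)) %>%
    filter(abs(z_score) <= sd_cutoff) %>%
    select(-z_score)
}

# toy data (hypothetical): 30 ordinary values plus one extreme value
dat_toy_out <- tibble(score = c(seq(-1, 1, length.out = 30), 50))

dat_toy_clean <- exclude_outliers(dat_toy_out)
nrow(dat_toy_out)
nrow(dat_toy_clean)
```

Note that the mean and SD used here are themselves computed with the outlier included, which is one reason this rule can be a blunt tool in small samples.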