22 Exercises for ‘Creating simulation experiments’ chapter ✎ Very rough draft

22.1 TO-DO:

22.1.1 Make explicit not to write functions

22.1.2 How to make an R chunk; add the chunks into the exercises

22.1.3 Make exercise 3 wording clearer

22.1.4 Make variables named

22.2 Exercises

# dependencies
library(dplyr)
library(tidyr)

22.2.1 Exercise 1: Basic parameter grid

Use expand_grid() to create a parameter grid for a simulation where:

n_per_condition is 20, 50, or 100
mean_intervention is 0.2 or 0.8
sd is 1
correlation_between_conditions is 0.3 or 0.5

How many rows does the resulting grid have? Check the count with nrow(), and use distinct() to confirm that no combination appears more than once.
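The nrow() and distinct() check can be sketched on a smaller grid. The parameter names and values here (n_items, reliability) are hypothetical and chosen only for illustration; they are not the values the exercise asks for.

```r
library(dplyr)
library(tidyr)

# hypothetical parameters, for illustration only
toy_grid <- expand_grid(
  n_items = c(10, 20),
  reliability = c(0.7, 0.8, 0.9)
)

# a fully-crossed grid has one row per combination: 2 * 3 = 6
n_rows <- nrow(toy_grid)

# distinct() drops duplicated rows, so equal counts mean no combination is repeated
n_unique_rows <- nrow(distinct(toy_grid))

n_rows == n_unique_rows
```

If the two counts differ, at least one combination appears more than once, which usually points to a repeated value in one of the input vectors.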

22.2.2 Exercise 2: Filtering implausible combinations

Starting from the grid below, use filter() to remove the rows where both conditions hold at once: n_per_condition is less than 50 and mean_intervention is less than 0.3. Rows that meet only one of the two conditions stay in the grid. How many rows remain?

expand_grid(
  n_per_condition = c(20, 50, 100, 200),
  mean_intervention = c(0.1, 0.3, 0.5),
  sd = c(0.5, 1)
)

n_per_condition	mean_intervention	sd
20	0.1	0.5
20	0.1	1.0
20	0.3	0.5
20	0.3	1.0
20	0.5	0.5
20	0.5	1.0
50	0.1	0.5
50	0.1	1.0
50	0.3	0.5
50	0.3	1.0
50	0.5	0.5
50	0.5	1.0
100	0.1	0.5
100	0.1	1.0
100	0.3	0.5
100	0.3	1.0
100	0.5	0.5
100	0.5	1.0
200	0.1	0.5
200	0.1	1.0
200	0.3	0.5
200	0.3	1.0
200	0.5	0.5
200	0.5	1.0
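Because filter() keeps the rows that match its condition, removing rows means negating the condition that describes them. The sketch below shows one way to do this on a hypothetical toy grid (n_items and reliability are illustrative names, not part of the exercise).

```r
library(dplyr)
library(tidyr)

# hypothetical parameters, for illustration only
toy_grid <- expand_grid(
  n_items = c(10, 20, 40),
  reliability = c(0.6, 0.8)
)

# remove rows where BOTH conditions hold by negating the combined condition
toy_filtered <- toy_grid %>%
  filter(!(n_items < 20 & reliability < 0.7))

# an equivalent way to write the same rule: keep rows where at least one condition fails
toy_filtered_alt <- toy_grid %>%
  filter(n_items >= 20 | reliability >= 0.7)

nrow(toy_grid)
nrow(toy_filtered)
```

Only the combination of n_items = 10 with reliability = 0.6 meets both conditions, so a single row is dropped.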

22.2.3 Exercise 3: Dependent parameters with mutate()

You are simulating a reading comprehension study. Your parameters are:

n_passages: the number of passages participants read (5, 10, or 20)
passage_difficulty: “easy” or “hard”

The total time allowed (time_limit_minutes) is not varied independently; it is determined by the two parameters above. Participants get 2 minutes per easy passage and 4 minutes per hard passage, so the limit is n_passages multiplied by the minutes allowed per passage. Use expand_grid() and mutate() to create the parameter grid, with time_limit_minutes derived from the other two columns.

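# build the fully-crossed grid, then derive the time limit from both columns
example_data <- expand_grid(
  n_passages = c(5, 10, 20),
  passage_difficulty = c("easy", "hard")
) %>%
  mutate(
    # 2 minutes per easy passage, 4 minutes per hard passage
    time_limit_minutes = case_when(
      passage_difficulty == "easy" ~ n_passages * 2,
      passage_difficulty == "hard" ~ n_passages * 4
    )
  )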

22.2.4 Exercise 4: Choosing the right design

For each scenario below, determine whether you would use (a) a fully-crossed design, (b) a non-fully-crossed design with filter(), or (c) a design with dependencies using mutate().

1. You vary sample size (50, 100, 200) and effect size (0.2, 0.5, 0.8). All combinations are of interest, so none are dropped.
2. You vary the number of predictors (2, 5, 10) and the number of observations (20, 50, 100, 500), but you want to exclude cases where the number of observations is smaller than 10 times the number of predictors.
3. You vary the number of items on a test (10, 20, 40) and want the total test time to always equal 2 minutes per item.
4. You are simulating a clinical trial and vary the dropout rate (5%, 15%, 30%) and treatment effect size (0.3, 0.5, 0.7). All combinations are realistic because patients drop out for many reasons unrelated to efficacy.
5. You are simulating ecological data on bird species counts across habitats. You vary habitat type (forest, wetland, urban) and survey area size (1 km², 5 km², 25 km²), but want to exclude large survey areas for urban habitats because contiguous urban green spaces larger than 5 km² are unrealistic in your study region.
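As a reminder of what the three options look like in code, here is a minimal sketch on a single toy grid. The parameters (n_trials, noise_sd, session_minutes) and the rule of 0.5 minutes per trial are hypothetical, chosen so that the sketch does not correspond to any scenario above.

```r
library(dplyr)
library(tidyr)

# hypothetical parameters, for illustration only

# (a) fully-crossed design: every combination is kept
grid_crossed <- expand_grid(
  n_trials = c(10, 50),
  noise_sd = c(0.5, 1, 2)
)

# (b) non-fully-crossed design: a combination judged implausible is dropped
grid_filtered <- grid_crossed %>%
  filter(!(n_trials == 10 & noise_sd == 2))

# (c) design with a dependency: a column is computed from another, not varied freely
grid_dependent <- grid_crossed %>%
  mutate(session_minutes = n_trials * 0.5)
```

The filtered grid has fewer rows than the crossed one, whereas the dependent grid has the same rows plus one derived column.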

22.2.5 Exercise 5: Build your own simulation design

Think of a research question from an area of research you find interesting. Define at least three parameters you would want to vary, create a parameter grid using expand_grid(), and include at least one dependency or filter. Use nrow() and distinct() to sanity-check your grid. Write a brief comment (2–3 sentences) explaining why you chose each parameter value and why you included the dependency or filter.
