25  Exercises for ‘Creating simulation experiments’ chapter ✎ Very rough draft

25.1 TO-DO:

25.1.1 Make explicit not to write functions

25.1.2 How to make an R chunk; add the chunks into the exercises

25.1.3 Make exercise 3 wording clearer

25.1.4 Make variables named

25.2 Exercises

# dependencies
library(dplyr)
library(tidyr)

25.2.1 Exercise 1: Basic parameter grid

Use expand_grid() to create a parameter grid for a simulation where:

  • n_per_condition is 20, 50, or 100

  • mean_intervention is 0.2 or 0.8

  • sd is 1

  • correlation_between_conditions is 0.3 or 0.5.

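How many rows does the resulting grid have? Work out the expected number by hand first, then verify it using nrow(), and use distinct() to confirm that no combination appears more than once.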
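The checking pattern asked for above can be sketched on a small, unrelated grid. This is only an illustration: the object toy_grid and the columns n_items and effect are hypothetical and are not the parameters of this exercise. The idea is that a fully crossed grid should have as many rows as the product of the number of levels of each parameter, and that distinct() should not remove any rows.

```r
library(dplyr)
library(tidyr)

# toy grid with hypothetical parameters (not the exercise's parameters)
toy_grid <- expand_grid(
  n_items = c(10, 20, 40),
  effect  = c(0, 0.5)
)

# expected number of rows: product of the number of levels per parameter
expected_rows <- 3 * 2

# actual number of rows, and number of rows after dropping exact duplicates
n_rows        <- nrow(toy_grid)
n_unique_rows <- nrow(distinct(toy_grid))

# both comparisons should be TRUE for a correctly built grid
n_rows == expected_rows
n_unique_rows == n_rows
```

If the second comparison were FALSE, the grid would contain duplicated combinations, which usually points to a repeated value in one of the parameter vectors.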

25.2.2 Exercise 2: Filtering implausible combinations

Starting from the grid below, use filter() to remove any rows where n_per_condition is less than 50 and, at the same time, mean_intervention is less than 0.3. Rows that meet only one of the two conditions should be kept. How many rows remain?

expand_grid(
  n_per_condition = c(20, 50, 100, 200),
  mean_intervention = c(0.1, 0.3, 0.5),
  sd = c(0.5, 1)
)
n_per_condition  mean_intervention  sd
20               0.1                0.5
20               0.1                1.0
20               0.3                0.5
20               0.3                1.0
20               0.5                0.5
20               0.5                1.0
50               0.1                0.5
50               0.1                1.0
50               0.3                0.5
50               0.3                1.0
50               0.5                0.5
50               0.5                1.0
100              0.1                0.5
100              0.1                1.0
100              0.3                0.5
100              0.3                1.0
100              0.5                0.5
100              0.5                1.0
200              0.1                0.5
200              0.1                1.0
200              0.3                0.5
200              0.3                1.0
200              0.5                0.5
200              0.5                1.0
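One way to drop rows that meet two conditions at once is to negate the combined condition inside filter(). The sketch below shows that pattern on a toy grid, not on the exercise grid: toy_grid, toy_filtered, and the columns x and y are hypothetical names chosen for illustration.

```r
library(dplyr)
library(tidyr)

# toy grid with hypothetical columns (not the exercise grid)
toy_grid <- expand_grid(
  x = c(1, 2, 3),
  y = c(10, 20)
)

# drop rows where x < 2 AND y < 20;
# rows that meet only one of the two conditions are kept
toy_filtered <- toy_grid %>%
  filter(!(x < 2 & y < 20))

nrow(toy_grid)      # 6 rows before filtering
nrow(toy_filtered)  # 5 rows after filtering: only (x = 1, y = 10) is removed
```

Writing the rule as a negated condition keeps the code close to the wording "remove rows where ...", which can be easier to check than rewriting it as a condition for the rows to keep.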

25.2.3 Exercise 3: Dependent parameters with mutate()

You are simulating a reading comprehension study. Your parameters are:

  • n_passages: the number of passages participants read (5, 10, or 20)

  • passage_difficulty: “easy” or “hard”

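The total time allowed (time_limit_minutes) is not varied independently. It is determined by the two parameters above: participants get 2 minutes per easy passage and 4 minutes per hard passage, so the time limit is n_passages multiplied by the minutes allowed per passage. Use expand_grid() to cross n_passages and passage_difficulty, then use mutate() to add time_limit_minutes as a column derived from them.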

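# cross the two varied parameters, then derive the time limit from them
example_data <- expand_grid(
  n_passages = c(5, 10, 20),
  passage_difficulty = c("easy", "hard")
) %>%
  mutate(
    # 2 minutes per easy passage, 4 minutes per hard passage
    time_limit_minutes = case_when(
      passage_difficulty == "easy" ~ n_passages * 2,
      passage_difficulty == "hard" ~ n_passages * 4
    )
  )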

25.2.4 Exercise 4: Choosing the right design

For each scenario below, determine whether you would use (a) a fully-crossed design, (b) a non-fully-crossed design with filter(), or (c) a design with dependencies using mutate().

  1. You vary sample size (50, 100, 200) and effect size (0.2, 0.5, 0.8), and all combinations are of interest and kept.

  2. You vary the number of predictors (2, 5, 10) and the number of observations (20, 50, 100, 500), but you want to exclude cases where the number of observations is smaller than 10 times the number of predictors.

  3. You vary the number of items on a test (10, 20, 40) and want the total test time to always equal 2 minutes per item.

  4. You are simulating a clinical trial and vary the dropout rate (5%, 15%, 30%) and treatment effect size (0.3, 0.5, 0.7). All combinations are realistic because patients drop out for many reasons unrelated to efficacy.

  5. You are simulating ecological data on bird species counts across habitats. You vary habitat type (forest, wetland, urban) and survey area size (1 km², 5 km², 25 km²), but want to exclude large survey areas for urban habitats because contiguous urban green spaces larger than 5 km² are unrealistic in your study region.
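As a reminder of what the three options (a), (b), and (c) look like in code, here is a minimal sketch using deliberately generic, hypothetical names (factor_a, factor_b, derived, and the design_* objects). It is not a solution to any of the scenarios above, and the specific values and rules are assumptions made only for illustration.

```r
library(dplyr)
library(tidyr)

# (a) fully crossed: every combination of the parameters is kept
design_crossed <- expand_grid(
  factor_a = c(1, 2, 3),
  factor_b = c("low", "high")
)

# (b) not fully crossed: generate all combinations, then drop
#     the ones that are not wanted with filter()
design_filtered <- design_crossed %>%
  filter(!(factor_a == 3 & factor_b == "high"))

# (c) dependency: one column is computed from the others with mutate()
#     rather than being varied independently
design_dependent <- design_crossed %>%
  mutate(derived = factor_a * 10)
```

A useful question when classifying a scenario is whether a quantity is varied on its own (a), varied but restricted to some combinations (b), or fully determined by other parameters (c).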

25.2.5 Exercise 5: Build your own simulation design

Think of a research question from an area of research you find interesting. Define at least three parameters you would want to vary, create a parameter grid using expand_grid(), and include at least one dependency or filter. Use nrow() and distinct() to sanity-check your grid. Write a brief comment (2–3 sentences) explaining why you chose each parameter value and why you included the dependency or filter.