7  Creating simulation experiments

Recall the five steps of Monte Carlo simulation studies:

  1. Make an experiment
  2. Generate data
  3. Analyse data
  4. Repeat this many times
  5. Summarise results

In our previous chapters, we skipped over Step 1 and went straight to Steps 2 and 3. Now, we’re going all the way back to Step 1: making the experiment in the first place.

In earlier chapters, you saw examples of generating data and of specifying different parameter values when doing so. This is at the core of simulation: generating data under different parameter values and examining how the results change across those values. But how do we actually create all of these combinations of parameter values? In other words: how do we design simulation experiments?

7.1 Varying experiment parameters manually

The ideal way to design simulation experiments is with a parameter grid. This is, essentially, a dataframe or tibble that specifies every variation of the simulation we want to run. For example, imagine we want to use simulation to analyse the difference in means between two groups (and get the p-value). We first write generate_data() and analyse_data() functions, as we have done in previous lessons. Specifically:

# dependencies
library(tidyr)
library(dplyr)

# our data generation function
generate_data <- function(n_per_condition,
                          mean_control,
                          mean_intervention,
                          sd) {

  data_control <-
    tibble(condition = "control",
           score = rnorm(n = n_per_condition, mean = mean_control, sd = sd))

  data_intervention <-
    tibble(condition = "intervention",
           score = rnorm(n = n_per_condition, mean = mean_intervention, sd = sd))

  data_combined <- bind_rows(data_control, data_intervention)

  return(data_combined)
}

# our analysis function
analyse_data <- function(data) {

  # run a t-test
  res_t_test <- t.test(formula = score ~ condition,
                       data = data,
                       var.equal = TRUE,
                       alternative = "two.sided")

  # get the p-value, put it in a tibble
  res <- tibble(p = res_t_test$p.value)

  return(res)
}

Let’s suppose we want to generate data for when n_per_condition is either 30 or 50, and where mean_intervention is either 0.3 or 0.5 (in all cases, we want mean_control = 0 and sd = 1). We could do this by just typing out each simulation, like:

# n = 30, mean = .3
generate_data(
  n_per_condition = 30,
  mean_control = 0,
  mean_intervention = .3,
  sd = 1
) |>
  analyse_data()
p
0.3205063
# n = 50, mean = .3
generate_data(
  n_per_condition = 50,
  mean_control = 0,
  mean_intervention = .3,
  sd = 1
) |>
  analyse_data()
p
0.0007945
# n = 30, mean = .5
generate_data(
  n_per_condition = 30,
  mean_control = 0,
  mean_intervention = .5,
  sd = 1
) |>
  analyse_data()
p
0.2668199
# n = 50, mean = .5
generate_data(
  n_per_condition = 50,
  mean_control = 0,
  mean_intervention = .5,
  sd = 1
) |>
  analyse_data()
p
0.0305893

Clearly this works. But the problem is pretty obvious: if we want to repeat these processes lots of times, or look at lots of different parameter combinations, then this will involve a lot of copy-pasting! This is not ideal. We will cover how to more efficiently “repeat the process lots of times” (i.e., Step 4 of Monte Carlo simulation) in the next lesson. But for now, let’s focus on how to look at lots of different parameter combinations more efficiently.

7.2 Varying experiment parameters using parameter grids

Instead of copy-pasting each setting, we can instead create a parameter grid, which specifies each combination of parameters we’re interested in. For example, suppose we want to create a grid that specifies the parameters we used above (mean_intervention either 0.3 or 0.5, n_per_condition either 30 or 50, sd = 1, mean_control = 0). We could specify a parameter grid by repeating each level of each variable as a single row in a tibble, like this:

parameter_grid <- tibble(
  mean_control = rep(0, 4), # repeat 0 four times - once for each combination
  sd = 1, # a single value is recycled across all 4 rows
  mean_intervention = rep(c(0.3, 0.5), each = 2),
  n_per_condition = rep(c(30, 50), times = 2)
)

parameter_grid
mean_control sd mean_intervention n_per_condition
0 1 0.3 30
0 1 0.3 50
0 1 0.5 30
0 1 0.5 50

As you can see above, this worked: we now have a grid that specifies each combination of parameters we’re interested in! We could then run an analysis on each of these, doing something like:

# generate data based on first row of parameter_grid
generate_data(
  n_per_condition = parameter_grid$n_per_condition[1],
  mean_intervention = parameter_grid$mean_intervention[1],
  mean_control = parameter_grid$mean_control[1],
  sd = parameter_grid$sd[1]
) |>
  analyse_data()
p
0.2272094
# generate data based on second row of parameter_grid
generate_data(
  n_per_condition = parameter_grid$n_per_condition[2],
  mean_intervention = parameter_grid$mean_intervention[2],
  mean_control = parameter_grid$mean_control[2],
  sd = parameter_grid$sd[2]
) |>
  analyse_data()
p
0.3253083

And so on.
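One rough way to avoid typing out each row by hand is a simple for loop over the rows of the grid. This is only a sketch; the next lesson covers a cleaner approach:

results <- vector("list", nrow(parameter_grid))

for (i in seq_len(nrow(parameter_grid))) {
  results[[i]] <-
    generate_data(
      n_per_condition = parameter_grid$n_per_condition[i],
      mean_intervention = parameter_grid$mean_intervention[i],
      mean_control = parameter_grid$mean_control[i],
      sd = parameter_grid$sd[i]
    ) |>
    analyse_data()
}

# combine the per-row results into a single tibble of p-values
bind_rows(results)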

Again, we will cover how to do this mapping of parameter grids to simulations more efficiently in the next lesson. But for now, there is one other inefficiency we should handle: creating this parameter grid more smoothly.

7.3 Creating parameter grids more efficiently: expand_grid()

In the example above, we constructed our parameter grid like this:

parameter_grid <- tibble(
  mean_control = rep(0, 4), # repeat 0 four times - once for each combination
  sd = 1, # a single value is recycled across all 4 rows
  mean_intervention = rep(c(0.3, 0.5), each = 2),
  n_per_condition = rep(c(30, 50), times = 2)
)

However, this is also not an ideal approach. You can see we are using the rep() function, which repeats a value a number of times. This is error-prone: we have to count how many times each value must appear, and make sure that, between them, the rep() calls cover every combination exactly once. If we ever move to more complex simulations (e.g., with many more parameters or levels), it would become quite difficult to ensure that all combinations are accurately covered.

Instead, we can use the handy function expand_grid(). With this function, we give each desired value of each parameter once, and then expand_grid() maps this into a tibble with all combinations of all parameters.

For our example above, this looks like:

experiment_parameters <- expand_grid(
  n_per_condition = c(30, 50),
  mean_intervention = c(0.3, .5),
  mean_control = 0,
  sd = 1
)

expand_grid() greatly simplifies the process of specifying parameters for our simulations, and figures out all of the combinations for us.
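Printing the grid shows the same four combinations as before; expand_grid() varies the later arguments fastest, so the row order differs from our manual grid, but the combinations are identical:

experiment_parameters
n_per_condition mean_intervention mean_control sd
30 0.3 0 1
30 0.5 0 1
50 0.3 0 1
50 0.5 0 1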

7.4 Sanity checking your parameter grid

It can be good practice to sanity-check your parameter grid - specifically, to make sure that the number of rows in the grid is consistent with what you would expect. For example, in our example above, we would expect experiment_parameters to have 4 rows: one for each combination of mean_intervention and n_per_condition. Let’s double-check this by counting the number of rows:

experiment_parameters |> 
  nrow()
[1] 4

Success! But of course, we could have made a mistake: for example, maybe we did something wrong in our expand_grid() call that meant we haven’t got full coverage of the parameter space we wanted. We can check this in more depth by combining this with a call to the distinct() function, which will keep only those rows which have distinct combinations of variables:

experiment_parameters |> 
  distinct() |>
  nrow()
[1] 4

Looks good!
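As one further check, dplyr's count() tallies the rows per parameter level. In a fully-crossed grid, each level should appear equally often; here, each n_per_condition value appears twice, once per mean_intervention value:

experiment_parameters |>
  count(n_per_condition)
n_per_condition n
30 2
50 2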

7.5 Types of Simulation Designs

There are different types of experiment designs we might use in our simulation studies, usually depending on how many total combinations are possible, which of those combinations we actually find interesting, and how much time we are willing to let the simulation run for. Let’s start with the simplest design first: a fully-crossed design.

7.5.1 Fully crossed designs

A fully-crossed design is what expand_grid() gives you by default: every level of every parameter is combined with every level of every other parameter. This is especially suitable for small simulation designs (like ours above), or whenever we care about every combination of parameters.

Fully-crossed designs are also easy to interpret: we can look at every combination of every parameter, summarise over them, and look at the whole trend. For much of this course, we will be using fully-crossed designs most often for this reason.

Let’s suppose, for example, we want to expand our experiment above to also have two different standard deviation possibilities (0.5 or 1) and two different mean_control values (0 and 0.1). A fully-crossed version of this would be:

experiment_parameters <- expand_grid(
  n_per_condition = c(30, 50),
  mean_intervention = c(0.3, 0.5),
  sd = c(0.5, 1),
  mean_control = c(0, 0.1)
)

experiment_parameters |> nrow()
[1] 16
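Because a fully-crossed grid contains every combination, its size is simply the product of the number of levels of each parameter (here, 2 * 2 * 2 * 2), which gives a quick way to predict what nrow() should return:

2 * 2 * 2 * 2
[1] 16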

Fully-crossed designs are ideal when you want to understand how performance changes across multiple factors independently. This means that each parameter should plausibly be able to take each value regardless of the others (e.g., a mean_intervention of 0.5 should be possible under any of the conditions of the other parameters).

However, sometimes parameter combinations are not possible. For example, imagine we’re simulating human heights across different age groups. A mean height of 180 cm might be perfectly plausible when the mean age is 18, but implausible when the mean age is 12. Likewise, a mean height of 140 cm might be possible among 12-year-olds, but not among adults.

If we simulated this as a fully-crossed design, this would look something like:

height_parameters <- expand_grid(
  mean_height = c(140, 160, 180),
  mean_age = c(12, 18, 24)
)

height_parameters
mean_height mean_age
140 12
140 18
140 24
160 12
160 18
160 24
180 12
180 18
180 24

Clearly, some of these combinations are unrealistic, and probably would end up just being a waste of time to compute in our simulation. This brings us to designs that do not consider all combinations of all values: non-fully-crossed designs.

7.5.2 Non-fully-crossed designs

In non-fully-crossed designs, we remove combinations that we consider impossible, implausible, or simply not of interest. For our height_parameters specification above, for example, this means excluding the possibility of implausibly tall children.

The best way to achieve this is to start with a fully-crossed design, and then use filter() to remove combinations we think are implausible. For example:

height_parameters <- expand_grid(
  mean_height = c(140, 160, 180),
  mean_age = c(12, 18, 24)
) |>
  filter(!(mean_height %in% c(160, 180) & mean_age == 12),
         !(mean_height == 140 & mean_age > 12))

height_parameters
mean_height mean_age
140 12
160 18
160 24
180 18
180 24

Let’s go back to our previous example with experiment_parameters:

experiment_parameters <- expand_grid(
  n_per_condition = c(30, 50),
  mean_intervention = c(0.3, 0.5),
  sd = c(0.5, 1),
  mean_control = c(0, 0.1)
)

Suppose we want to make a more elaborate experiment, with n_per_condition ranging from 50 to 100 in steps of 10 participants, and with mean_intervention ranging from .1 to .5 in steps of .1 (we can do this using seq(), which creates a sequence of numbers).
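As a quick illustration of what seq() produces:

seq(from = 50, to = 100, by = 10)
[1]  50  60  70  80  90 100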

experiment_parameters <- expand_grid(
  n_per_condition = seq(from = 50, to = 100, by = 10),
  mean_intervention = seq(from = 0.1, to = 0.5, by = 0.1),
  sd = c(0.5, 1),
  mean_control = c(0, 0.1)
)

experiment_parameters 
n_per_condition mean_intervention sd mean_control
50 0.1 0.5 0.0
50 0.1 0.5 0.1
50 0.1 1.0 0.0
50 0.1 1.0 0.1
50 0.2 0.5 0.0
50 0.2 0.5 0.1
50 0.2 1.0 0.0
50 0.2 1.0 0.1
50 0.3 0.5 0.0
50 0.3 0.5 0.1
# … with 110 more rows

That’s 120 combinations! But suppose we think intervention effect sizes smaller than 0.3 are only interesting when our sample size is at least 70 per condition. We could then filter these combinations out as before:

experiment_parameters <- expand_grid(
  n_per_condition = seq(from = 50, to = 100, by = 10),
  mean_intervention = seq(from = 0.1, to = 0.5, by = 0.1),
  sd = c(0.5, 1),
  mean_control = c(0, 0.1)
) |>
  filter(!(n_per_condition < 70 & mean_intervention < 0.3))

experiment_parameters |>
  nrow()
[1] 104

16 rows have been removed, and we can verify that no combinations remain where n_per_condition is less than 70 and mean_intervention is less than 0.3, by using filter() to keep only the rows that would meet these criteria:

experiment_parameters |>
  filter(n_per_condition < 70 & mean_intervention < .3)
n_per_condition mean_intervention sd mean_control

Empty!

7.6 Designs with dependencies

In fully-crossed designs, the parameters are treated as independent of each other: every level of a given parameter is assumed to be possible across all levels of all other parameters.

In non-fully-crossed designs, we recognise that not all variable combinations are possible, and remove those impossible/implausible combinations. In other words, there are dependencies between variables: the value of one variable depends on the value of other(s).

We can actually take these kinds of dependencies a step further. Instead of just creating a fully-crossed design and filtering out the ones that are impossible or implausible, we can instead define variables in the first place in relation to others.

Let’s start with a simple example. Imagine we want to simulate all combinations of the width, height, and area of rectangles, where width is between 10 and 20 and height is between 15 and 25. We know that area depends on height and width (since area = height * width). If we used the “simulate fully-crossed and then filter” method, we’d do something like:

rectangle_parameters <- expand_grid(
  width = 10:20,
  height = 15:25,
  area = 150:500 # smallest area = 10*15 = 150, largest area = 20*25 = 500
)

rectangle_parameters |>
  nrow()
[1] 42471

This gives 42,471 possibilities. Then we could filter, like:

rectangle_parameters <- expand_grid(
  width = 10:20,
  height = 15:25,
  area = 150:500 # smallest area = 10*15 = 150, largest area = 20*25 = 500
) |>
  filter(area == width * height)

rectangle_parameters |>
  nrow()
[1] 121

Which gives us the 121 actual possibilities. But why bother generating all of those unnecessary values in the first place, when we could instead calculate area directly from each height and width value? This would be much more efficient. And we can do exactly this using mutate(), since the output of expand_grid() is already a tibble:

rectangle_parameters <- expand_grid(
  width = 10:20,
  height = 15:25
  ) |>
  mutate(area = width * height)

rectangle_parameters |>
  nrow()
[1] 121

When we simulate data, there will be many cases where such “functional relationships” exist, i.e., where one variable is defined by its relationship to other variables. The most efficient approach is then to (i) generate the “independent” variables, and (ii) use mutate() to create the variable that depends on them.

Here’s another example. Imagine we’re simulating exam results for students, and we want to vary:

  • how many questions the exam has (n_questions: 10, 20, 40, or 60)

  • the average difficulty of questions (“easy” or “hard”)

  • the time limit for the exam (time_limit_minutes: between 10 and 120 minutes)

A fully-crossed design would treat these as independent: any time limit could go with any number of questions. But in reality, if we simulate a 10-question quiz with easy questions, we might reasonably only consider a 10- or 20-minute time limit, whereas a 60-question exam with hard questions might need 60–120 minutes. So the time limits we want to consider in our simulation depend on the number of questions and the difficulty.

If we use the “fully-crossed then filter” method, we might do:

exam_parameters <- expand_grid(
  n_questions = c(10, 20, 40, 60),
  difficulty = c("easy", "hard"),
  time_limit_minutes = seq(10, 120, by = 10)
)

exam_parameters |> nrow()
[1] 96

This again gives us many combinations we do not care about. We could filter() them:

exam_parameters <- expand_grid(
  n_questions = c(10, 20, 40, 60),
  difficulty = c("easy", "hard"),
  time_limit_minutes = seq(10, 120, by = 10)
) |>
  filter(
    (n_questions == 10 & time_limit_minutes %in% c(10, 20)) |
    (n_questions == 20 & time_limit_minutes %in% c(20, 30)) |
    (n_questions == 40 & time_limit_minutes %in% c(40, 60)) |
    (n_questions == 60 & time_limit_minutes %in% c(60, 120))
  )

exam_parameters 
n_questions difficulty time_limit_minutes
10 easy 10
10 easy 20
10 hard 10
10 hard 20
20 easy 20
20 easy 30
20 hard 20
20 hard 30
40 easy 40
40 easy 60
40 hard 40
40 hard 60
60 easy 60
60 easy 120
60 hard 60
60 hard 120

But this gets messy quickly. Looking above, we only filtered by n_questions, not difficulty, and it’s already a little hard to read. Plus, we are generating many rows just to immediately discard them.

Instead, we can define time_limit_minutes from the start, based on n_questions and difficulty. Here, we’ll generate n_questions and difficulty, and then derive a plausible time limit with mutate().

We could, for example, assume:

  • Easy questions take 1 minute on average, and

  • Hard questions take twice as long as easy questions on average.

exam_parameters <- expand_grid(
  n_questions = c(10, 20, 40, 60),
  difficulty = c("easy", "hard")
) |>
  mutate(
    # baseline: 1 minute per question
    base_time = n_questions * 1,
    # hard exams take twice as long per question
    time_limit_minutes = if_else(difficulty == "hard",
                                 base_time * 2,
                                 base_time)
  ) |>
  select(-base_time)

exam_parameters
n_questions difficulty time_limit_minutes
10 easy 10
10 hard 20
20 easy 20
20 hard 40
40 easy 40
40 hard 80
60 easy 60
60 hard 120
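If we later added more difficulty levels, a two-branch if_else() would no longer suffice. As a sketch of the same idea with dplyr’s case_when(), assuming a hypothetical third “medium” level at 1.5 minutes per question:

exam_parameters <- expand_grid(
  n_questions = c(10, 20, 40, 60),
  difficulty = c("easy", "medium", "hard")
) |>
  mutate(
    # minutes per question for each difficulty level
    # ("medium" and its 1.5 minutes are illustrative assumptions)
    minutes_per_question = case_when(
      difficulty == "easy"   ~ 1,
      difficulty == "medium" ~ 1.5,
      difficulty == "hard"   ~ 2
    ),
    time_limit_minutes = n_questions * minutes_per_question
  ) |>
  select(-minutes_per_question)

Each difficulty level gets one line, so adding further levels stays readable.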

7.7 Summary

In this chapter, we covered how to design simulation experiments by specifying the parameter values we want to vary. Key takeaways:

  • Parameter grids are tibbles where each row represents one combination of simulation parameters. They replace error-prone copy-pasting with a structured, inspectable object.

  • expand_grid() is the workhorse for creating parameter grids. You supply each parameter’s desired values once, and it returns all combinations automatically.

  • Sanity checking your grid with nrow() and distinct() helps catch mistakes early.

  • Fully-crossed designs (the expand_grid() default) include every combination of every parameter level. These are simple and interpretable, but can become large quickly and may include implausible combinations.

  • Non-fully-crossed designs start fully crossed and then use filter() to remove implausible or uninteresting combinations.

  • Designs with dependencies use mutate() to derive one parameter from others (e.g., area = width * height), avoiding the need to enumerate and then filter impossible values. This is both more efficient and more readable than the fully-crossed-then-filter approach.

In the next chapter, we will learn how to run simulations across these parameter grids efficiently using the map() family of functions.

Tip: Practice writing analysis functions

Ready to practice? Download and complete the exercises for this chapter.