12 changes: 7 additions & 5 deletions 015-Case-study-ANOVA.Rmd
@@ -232,7 +232,7 @@ For instance, instead of using `rep()` to do it all at once, we could generate o
Do not worry about trying to write code the "best" way---especially when you are initially putting a simulation together.
If you can find a way to accomplish your task at all, then that's often enough (and you should feel good about it!).

### Now make a function
### Now make a function {#Welch-DGP}

Because we will need to generate datasets over and over, we will wrap our code in a function.
The inputs to the function will be the parameters of the model that we specified at the very beginning: the set of population means `mu`, the population variances `sigma_sq`, and sample sizes `sample_size`. We make these quantities arguments of the data-generating function so that we can make datasets of different sizes and shapes:
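The body of the function is folded in this diff, but a minimal sketch of how such a data-generating function might be organized (not necessarily the book's exact code, and with illustrative parameter values) is:

```{r}
generate_data <- function(mu, sigma_sq, sample_size) {
  N <- sum(sample_size)                        # total number of observations
  G <- length(sample_size)                     # number of groups
  group <- factor(rep(LETTERS[1:G], times = sample_size))
  mu_long <- rep(mu, times = sample_size)      # each observation's group mean
  sd_long <- rep(sqrt(sigma_sq), times = sample_size)
  x <- rnorm(N, mean = mu_long, sd = sd_long)  # draw all observations at once
  data.frame(group = group, x = x)
}

sim_data <- generate_data(mu = c(1, 2, 5, 6),
                          sigma_sq = c(3, 2, 5, 1),
                          sample_size = c(3, 6, 2, 4))
```

Because `rnorm()` is vectorized over `mean` and `sd`, a single call generates all groups at once; changing `mu`, `sigma_sq`, or `sample_size` then yields datasets of different shapes with no other edits.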
@@ -470,9 +470,9 @@ ANOVA_Welch_F_long <- function(data) {
ANOVA_Welch_F_long(sim_data)
```

Modify `ANOVA_Welch_F()` to return output in this format, update your simulation code, and then use `group_by()` plus `summarise()` to calculate rejection rates of both tests.
`group_by()` is a method for dividing your data into distinct groups and conducting an operation on each.
The classic form of this would be something like the following:
Modify `ANOVA_Welch_F()` to return output in the long format, update your simulation code, and then use `group_by()` plus `summarise()` to calculate rejection rates of both tests.
The `group_by()` method divides your data into distinct groups so you can easily conduct an operation on each.
The classic form of this would be something like the following:

```{r, eval=FALSE}
sres <-
@@ -483,7 +483,9 @@ sres <-
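The body of that chunk is folded in this diff. As a self-contained toy version of the pattern (with hypothetical columns `method` and `pval` standing in for the long-format simulation results), computing rejection rates might look like:

```{r}
library(dplyr)

# Toy long-format results: one row per replication per test,
# holding that test's p-value (hypothetical structure).
results_toy <- tibble(
  method = rep(c("ANOVA", "Welch"), times = 500),
  pval = runif(1000)
)

# group_by() splits the rows by test; summarise() then computes
# the rejection rate separately within each group.
results_toy %>%
  group_by(method) %>%
  summarise(rej_rate = mean(pval < .05))
```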

### Other tests {#BF-other-tests}

The `onewaytests` package in R includes functions for calculating Brown and Forsythe's $F^*$ test and James' test for differences in population means. Modify the data analysis function `ANOVA_Welch_F` (or, better yet, `ANOVA_Welch_F_long` from Exercise \@ref(BF-wide-long)) to also include results from these hypothesis tests. Re-run the simulation to estimate the type-I error rate of all four tests under Scenarios A and B of Table \@ref(tab:BF-Scenarios).
The `onewaytests` package in R includes functions for calculating Brown and Forsythe's $F^*$ test and James' test for differences in population means.
Modify the data analysis function `ANOVA_Welch_F` (or, better yet, `ANOVA_Welch_F_long` from Exercise \@ref(BF-wide-long)) to also include results from these hypothesis tests.
Re-run the simulation to estimate the type-I error rate of all four tests under Scenarios A and B of Table \@ref(tab:BF-Scenarios).
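As a starting point, a hedged sketch of calling the package directly (this is not the book's code; it assumes `bf.test()` and `james.test()` take a formula plus a data frame with a factor grouping variable, and that `bf.test()` returns a list with a `p.value` component---check `str()` of the result before relying on this):

```{r}
library(onewaytests)

# Hypothetical dataset with outcome `x` and factor `group`.
sim_dat <- data.frame(
  group = factor(rep(LETTERS[1:3], each = 20)),
  x = rnorm(60)
)

BF_res <- bf.test(x ~ group, data = sim_dat, verbose = FALSE)
BF_res$p.value

# james.test() has the same formula interface:
James_res <- james.test(x ~ group, data = sim_dat, verbose = FALSE)
```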

### Methodological extensions

178 changes: 121 additions & 57 deletions 020-Data-generating-models.Rmd

Large diffs are not rendered by default.

98 changes: 58 additions & 40 deletions 030-Estimation-procedures.Rmd

Large diffs are not rendered by default.

193 changes: 139 additions & 54 deletions 035-running-simulation.Rmd

Large diffs are not rendered by default.

898 changes: 592 additions & 306 deletions 040-Performance-criteria.Rmd

Large diffs are not rendered by default.

38 changes: 36 additions & 2 deletions 070-experimental-design.Rmd
@@ -201,7 +201,6 @@ saveRDS( res, file = "results/simulation_CRT.rds" )

```{r secret_run_full_CRT, include=FALSE}
if ( !file.exists( "results/simulation_CRT.rds" ) ) {

source( here::here( "case_study_code/clustered_data_simulation_runner.R" ) )
} else {
res = readRDS("results/simulation_CRT.rds")
@@ -266,9 +265,44 @@ sres <-
glimpse( sres )
```

Either way, we now have a lot of results to wade through.
After we calculate our performance measures, we have 810 rows of results across 270 different scenarios.
We can simply aggregate across everything to get overall comparisons:

```{r}
sres %>%
group_by( method ) %>%
summarise( mean_bias = mean( bias ),
mean_SE = mean( SE ),
mean_rmse = mean( rmse ) )
```

Even with this massive simplification, we see tantalizing hints of differences between the methods.
Linear Regression does seem biased, on average, across the scenarios considered.
But so does multilevel modeling---we did not see bias before, but apparently for some scenarios it is biased as well.
The overall SE and RMSE seem roughly the same across the two methods, and it is clear that uncertainty, on average, trumps bias.
This could be due, however, to those scenarios with few clusters that are small in size.
In order to understand the complete story, we need to dig further into the results.
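As a toy illustration of that next step (the factor name `n_clusters` and the toy values below are hypothetical, not taken from the actual results), we can break performance out by method and one design factor at a time:

```{r}
library(dplyr)

# Toy stand-in for `sres`; the real object comes from the simulation above,
# and `n_clusters` is a hypothetical name for one design factor.
sres_toy <- tibble(
  method = rep(c("LR", "MLM"), each = 4),
  n_clusters = rep(c(10, 10, 40, 40), times = 2),
  bias = rnorm(8, mean = 0, sd = 0.05),
  rmse = runif(8, min = 0.1, max = 0.3)
)

# Aggregate within method-by-factor cells rather than across everything:
sres_toy %>%
  group_by(method, n_clusters) %>%
  summarise(mean_bias = mean(bias),
            mean_rmse = mean(rmse),
            .groups = "drop")
```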




<!-- JEP: Do we need a brief conclusion to the section? -->
<!-- LWM: How is the above? -->

## Conclusions

In this chapter, we have moved from isolated simulation scenarios to fully specified multifactor simulation designs.
By systematically varying multiple features of a data-generating process while holding others fixed, we can create a structured way to explore how estimator performance depends on the context in which the estimators are used.
By treating simulations as designed experiments, we gain a principled framework for choosing parameters and curating our set of scenarios to explore.
The result is a rich collection of simulation output that can capture bias, variability, uncertainty estimation, or testing behavior across a wide range of plausible conditions.

However, the volume and complexity of these results also create a new challenge: interpretation.
With many factors, levels, and performance measures, it is no longer possible to understand results by inspecting tables alone.
The next step is therefore to summarize, aggregate, and visualize these outcomes in ways that reveal patterns and trade-offs.
The next few chapters turn to the problem of analyzing and presenting such results, with an emphasis on graphical approaches that clarify how performance varies across conditions.


<!-- JEP: Do we need a conclusions section for the overall chapter? -->


