What effect size do you intend to use? There are quite a few effect sizes available for regression models, the coefficients themselves are among them. To simulate a regression model, you probably need to assume quite a few things such as the distribution of the predictors, the coefficients and the residual variance. Once you have decided on those assumptions, the simulations themselves are not that hard.
Another possibility is using G*Power. The drawback is that it is quite restrictive concerning effect sizes.
To give a specific example of how you could use simulations to assess the power in a regression model, let's assume that the true model is as follows:
$$
\texttt{y_post}_i = 10 + 0.85\times\texttt{y_pre}_i -0.5\times\texttt{x}_{1,i} + 0.6\times\texttt{x}_{2,i} + 0.1\times\texttt{x}_{1,i}\times\texttt{x}_{2,i} + \epsilon_i
$$
Furthermore, I'm assuming the following distributions for the involved variables:
$$
\begin{array}{l|l}
\text{Variable} & \text{Distribution} \\
\hline
\texttt{y_post} & \operatorname{N}(100, 15^2) \\
\texttt{y_pre} & \operatorname{N}(115, 14^2) \\
\texttt{x1} & \operatorname{U}(10, 50) \\
\texttt{x2} & \operatorname{N}(15, 7.5^2) \\
\epsilon & \operatorname{N}(0, 25^2)
\end{array}
$$
Finally, I'm assessing the power for a sample size of $100$ using $1000$ replications.
The R
package simglm
makes it easy to set up the simulations:
library(tidyverse)
library(simglm)
library(future)
# The simulation setup
sim_args <- list(
formula = y_post ~ 1 + y_pre + x1*x2 # The formula
, fixed = list( # Regression variables
y_post = list(var_type = "continuous", dist = "rnorm", mean = 100, sd = 15)
, y_pre = list(var_type = "continuous", dist = "rnorm", mean = 115, sd = 14)
, x1 = list(var_type = "continuous", dist = "runif", min = 10, max = 50)
, x2 = list(var_type = "continuous", dist = "rnorm", mean = 15, sd = 7.5)
)
, sample_size = 100
, error = list(variance = 25^2) # Residual variance
, reg_weights = c( # The coefficients
10 # Intercept
, 0.85 # y_pre
, -0.5 # x1
, 0.6 # x2
, 0.1 # x1:x2
)
, replications = 1000 # Number of replications
, extract_coefficients = TRUE
, model_fit = list(formula = y_post ~ 1 + y_pre + x1*x2, model_function = "lm")
, power = list( # Hypothesis test for the coefficients
dist = "qt"
, alpha = 0.05
, opts = list(df = 100 - 5)
)
)
Let's perform the simulations and inspect the power:
plan(multisession, workers = 10)
res <- replicate_simulation(sim_args, future.seed = 142857) %>%
compute_statistics(sim_args, type_1_error = FALSE, precision = FALSE)
res
term avg_estimate power avg_test_stat crit_value_power replications
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 7.57 0.072 0.118 1.99 1000
2 x1 -0.481 0.175 -0.932 1.99 1000
3 x1:x2 0.0994 1 6.18 1.99 1000
4 x2 0.663 0.072 0.360 1.99 1000
5 y_pre 0.854 0.999 5.03 1.99 1000
Under these assumptions, the power for the interaction is $1$ (third column).
By using the argument vary_arguments
in the setup, we could vary the sample size (see the vignette "Simulation Argument Details for 'simglm'" of the package).