
Following on from the recent questions we had here.

I was hoping to find out whether anyone has come across, or can share, R code for performing a custom simulation-based power analysis for a linear model.

Later I would obviously like to extend it to more complex models, but lm seems the right place to start.

Tal Galili

3 Answers


I'm not sure you need simulation for a simple regression model. For example, see the paper "Portable Power" by Robert E. Wheeler (Technometrics, May 1974, Vol. 16, No. 2). For more complex models, specifically mixed effects, the pamm package in R performs power analyses through simulation. Also see Todd Jobe's post, which has R code for simulation.
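
As a concrete illustration of the analytic route (an addition here, not from the Wheeler paper; it assumes the pwr package and Cohen's f2 effect-size convention), the sample size needed to detect a single predictor in a linear model can be computed without any simulation:

library(pwr)

# Cohen's f2 for a single predictor: f2 = R^2 / (1 - R^2)
# (assumed value: the predictor explains 10% of the variance)
f2 <- 0.10 / (1 - 0.10)

# u = numerator df (1 predictor); solve for v = denominator df at 80% power
res <- pwr.f2.test(u = 1, f2 = f2, sig.level = 0.05, power = 0.80)
ceiling(res$v) + 1 + 1  # required n = v + u + 1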

ars

Here are a few sources of simulation code in R. I'm not sure if any specifically address linear models, but perhaps they provide enough of an example to get the gist:

There are a couple more examples of simulation at the following sites:
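
In case those links are not enough, here is a minimal, self-contained sketch of the basic recipe (the parameter values below are made up for illustration): simulate responses from the assumed model, refit lm() each time, and take the rejection rate of the slope test as the estimated power.

set.seed(1)
n     <- 50    # candidate sample size
beta0 <- 1     # assumed intercept
beta1 <- 0.5   # slope we want to be able to detect
sigma <- 4     # assumed residual standard deviation
nsim  <- 1000  # number of simulated data sets

x <- runif(n, 0, 10)
pvals <- replicate(nsim, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
})
mean(pvals < 0.05)  # estimated power at this sample size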

Jeromy Anglim

Adapted from Bolker (2008), Ecological Models and Data in R. You need to declare the strength of the trend (i.e. the slope) you wish to detect. Intuitively, a strong trend with low variability will require a small sample size, while a weak trend with large variability will require a large sample size.

a = 2  #intercept
b = 1  #slope (the trend strength we want to detect)
sd = 20  #estimated variability, defined by the residual standard deviation
nsim = 400  #400 simulations
pval = numeric(nsim)  #placeholder for the second for loop output
Nvec = seq(25, 100, by = 1)  #vector for the range of sample sizes to be tested
power.N = numeric(length(Nvec))   #create placeholder for first for loop output
for (j in 1:length(Nvec)) {
  N = Nvec[j]  
  x = seq(1, 20, length = Nvec[j])  #length of x must match the current sample size Nvec[j]
  for (i in 1:nsim) {   #for this value of N, create random error 400 times
    y_det = a + b * x
    y = rnorm(N, mean = y_det, sd = sd)
    m = lm(y ~ x)
    pval[i] = coef(summary(m))["x", "Pr(>|t|)"]  #all the p values for 400 sims
  }  #end of the nsim loop for this value of N
  power.N[j] = sum(pval < 0.05)/nsim  #proportion of significant slopes, i.e. the power
}
power.N
plot(Nvec, power.N)  #need about 90 - 100 samples for 80% power
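
As a convenience (a small addition to the code above), the smallest simulated sample size reaching 80% power can be read off power.N directly rather than from the plot:

Nvec[which(power.N >= 0.80)[1]]  #smallest N with estimated power >= 0.80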

You can also simulate to find the smallest trend (slope) you could detect for a given sample size, as shown in the book:

# reuses a, sd, nsim, pval, and the final N and x (so N = 100) from the block above
bvec = seq(-2, 2, by = 0.1)  #range of slopes to test
power.b = numeric(length(bvec))
for (j in 1:length(bvec)) {
  b = bvec[j]
  for (i in 1:nsim) {
    y_det = a + b * x
    y = rnorm(N, mean = y_det, sd = sd)
    m = lm(y ~ x)
    pval[i] = coef(summary(m))["x", "Pr(>|t|)"]
  }
  power.b[j] = sum(pval < 0.05)/nsim
}
power.b
plot(bvec, power.b)
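
Likewise, a small addition (not in the original code) to read off the smallest slope magnitude detectable with 80% power at this sample size:

min(abs(bvec[power.b >= 0.80]))  #smallest |slope| with estimated power >= 0.80
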
Tom