What method is appropriate for determining the sample size for a one-sample t-test? How can I conduct the power analysis in this case? I am looking for an analytical formula that would allow me to compute the required N by hand.
1 Answer
This is a fairly conventional problem. You just need to decide on the effect size you want to be able to differentiate from $0$. That is, how many standard deviations separate the mean of your data from the null (reference) value in your alternative hypothesis? Once you have figured that out, you can use standard methods. The free G*Power software will be very convenient to use.
Below, I determine the required $N$ to differentiate a mean of $145$, with a posited standard deviation of $20$, from a null value of $135$ (i.e., $d=.5$). I am using $\alpha=.05$ with $80\%$ power and intend to run a two-tailed test, all of which is very conventional.
I will need $34$ data points.
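If you don't have G*Power handy, the same calculation can be cross-checked in R with the base function power.t.test(), which solves for $n$ using the exact noncentral-$t$ power computation:

power.t.test(delta=10, sd=20, sig.level=.05, power=.80,
             type="one.sample", alternative="two.sided")
# gives n of roughly 33.4, which rounds up to N = 34, matching G*Power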
Update: I have not found a formula for the power of a t-test that you could use to calculate $N$ by hand. The formula for a one-sample $z$-test (i.e., the standard deviation is known without error) is:
$$
N = \left(\frac{z_\alpha - z_\beta}{\delta}\right)^2
$$
where:
- $z_\alpha$ is the critical value of $z$ for significance (for a two-tailed test with $\alpha = .05$, this is the quantile $\Phi^{-1}(.975) = 1.960$)
- $z_\beta$ is the normal quantile for $1 - \rm power$ (for $80\%$ power, $\Phi^{-1}(.2) = -0.842$)
- $\delta$ is the difference between the mean under the alternative hypothesis and the null value, divided by the standard deviation
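For concreteness, plugging this example's values into the formula:
$$
N = \left(\frac{1.960 - (-0.842)}{0.5}\right)^2 \approx 31.4
$$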
This cannot be done with paper and pencil alone, but it can be done by hand if you have access to a $z$-table. Using your values, the formula yields $N \approx 31.4$ (see below). If the suggested number of data were large enough, the normal approximation would be fine; here we are on the border, so we want to take the uncertainty in the standard deviation into account in the resulting test. That is, you will use a $t$-test in the end, not a $z$-test, so you will have a little less power than the $z$-based formula implies.

We can plug the same values in using quantiles of the $t$-distribution instead of the $z$-distribution, but then we need degrees of freedom for the $t$-distribution. We can try $df = 30$, since $N \approx 31$ falls out of the calculation above. That yields $N \approx 33.5$; taking $N = 33$ implies $df = 32$, not $30$, so we didn't use the correct $t$-distribution. Iterating again with $df = 32$ yields $N \approx 33.4$, so the calculation is stable. This is an iterative process, but doable by hand (with a sufficiently detailed $t$-table), and it agrees well with the value from G*Power above. Simulation (below) suggests $N = 33$ is just slightly below $80\%$ power, while $N = 34$ is just slightly above.
# z-based (known-SD) approximation:
za = qnorm(.975); za      # [1] 1.959964
zb = qnorm(.2);   zb      # [1] -0.8416212
n = ( (za-zb)/.5 )^2; n   # [1] 31.39552
# first t-based pass: df = 30, since N ~ 31 fell out of the z step
ta = qt(.975, df=30); ta  # [1] 2.042272
tb = qt(.2, df=30); tb    # [1] -0.8537673
n = ( (ta-tb)/.5 )^2; n   # [1] 33.54818
# second pass: N = 33 implies df = 32
ta = qt(.975, df=32); ta  # [1] 2.036933
tb = qt(.2, df=32); tb    # [1] -0.8529985
n = ( (ta-tb)/.5 )^2; n   # [1] 33.40682
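The iteration above can also be automated; here is a minimal sketch (same inputs, rounding $n$ up at each step), which converges to $N = 34$ and matches G*Power. Rounding down instead stabilizes at $N = 33$, which the simulation below shows falls just short of $80\%$ power.

# minimal sketch: iterate the t-based formula until N stops changing
d = .5                             # effect size (Cohen's d)
n = ( (qnorm(.975)-qnorm(.2))/d )^2  # z-based starting value
N = ceiling(n); N.old = 0
while(N != N.old){
  N.old = N
  n = ( (qt(.975, df=N-1)-qt(.2, df=N-1))/d )^2
  N = ceiling(n)                   # round up to a whole sample size
}
N  # [1] 34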
library(binom)  # for binom.confint(), to get a CI on the simulated power
# simulate power for N = 33:
set.seed(4116)
p = vector(length=100000)
for(i in 1:100000){
  d    = rnorm(33, mean=145, sd=20)  # sample under the alternative
  p[i] = t.test(d, mu=135)$p.value   # one-sample t-test against the null
}
mean(p<.05)  # [1] 0.79573
binom.confint(79573, 100000, methods="exact")
#   method     x     n    mean     lower     upper
#    exact 79573 1e+05 0.79573 0.7932176 0.7982252
# simulate power for N = 34 (reusing p from above):
set.seed(4116)
for(i in 1:100000){
  d    = rnorm(34, mean=145, sd=20)
  p[i] = t.test(d, mu=135)$p.value
}
mean(p<.05)  # [1] 0.8093
binom.confint(80930, 100000, methods="exact")
#   method     x     n   mean     lower     upper
#    exact 80930 1e+05 0.8093 0.8068512 0.8117309
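As a final cross-check, the exact noncentral-$t$ power from power.t.test() agrees with both simulations:

power.t.test(n=33, delta=10, sd=20, sig.level=.05, type="one.sample")$power
# approximately 0.795
power.t.test(n=34, delta=10, sd=20, sig.level=.05, type="one.sample")$power
# approximately 0.809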
