Sample size formula for an F-test?

Question

I am wondering if there is a sample size formula like Lehr's formula that applies to an F-test? Lehr's formula for t-tests is $n = 16 / \Delta^2$ per group (for a two-sided test at $\alpha = 0.05$ with 80% power), where $\Delta$ is the effect size (e.g. $\Delta = (\mu_1 - \mu_2) / \sigma$). This can be generalized to $n = c / \Delta^2$, where $c$ is a constant that depends on the type I rate, the desired power, and whether one is performing a one-sided or two-sided test.
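One common way to see where the constant comes from is a sketch that assumes a two-sample comparison of means with a common, known $\sigma$ and uses the normal approximation. Under those assumptions, $n_{\text{per group}} = 2(z_{1-\alpha/2} + z_{1-\beta})^2 / \Delta^2$, with $z_{1-\alpha}$ in place of $z_{1-\alpha/2}$ for a one-sided test. The helper name `lehr_c` is hypothetical:

```r
# Hypothetical helper: the constant c in n = c / Delta^2 (n per group),
# from the normal approximation for a two-sample comparison of means
# with a common, known standard deviation.
lehr_c <- function(alpha = 0.05, power = 0.80, two_sided = TRUE) {
  z_alpha <- if (two_sided) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta <- qnorm(power)  # power = 1 - beta
  2 * (z_alpha + z_beta)^2
}

lehr_c()                   # about 15.7, which Lehr rounds to 16
lehr_c(power = 0.90)       # about 21.0
lehr_c(two_sided = FALSE)  # about 12.4
```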

I am looking for a similar formula for an F-test. My test statistic is distributed, under the alternative, as a non-central F with $k,n$ degrees of freedom and non-centrality parameter $n \lambda$, where $\lambda$ depends only on population parameters, which are unknown but posited to take some value. The parameter $k$ is fixed by the experiment, and $n$ is the sample size. Ideally I am looking for a (preferably well-known) formula of the form $$n = \frac{c}{g(k,\lambda)}$$ where $c$ depends only on the type I rate and the power.

The sample size should satisfy $$ F(F^{-1}(1-\alpha;k,n,0);k,n,n\lambda) = \beta,$$ where $F(x;k,n,\delta)$ is the CDF of a non-central F with $k,n$ d.o.f. and non-centrality parameter $\delta$, and $\alpha, \beta$ are the type I and type II rates. We can assume $k \ll n$, i.e. $n$ need be 'sufficiently large.'
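Under the stated assumption $k \ll n$, one possible rough rule can be sketched from a standard limit. It is not a named formula from any reference. As the denominator degrees of freedom grow, $kF$ (with $F$ non-central on $k, n$ d.o.f. and non-centrality $\delta$) approaches a non-central $\chi^2_k(\delta)$. The power condition then pins down only the non-centrality, $\delta^* = \delta^*(k, \alpha, \beta)$, which gives $n \approx \delta^*(k,\alpha,\beta)/\lambda$. Note that $\delta^*$ depends on $k$ jointly with $\alpha$ and $\beta$, so this is not exactly of the form $c/g(k,\lambda)$. The helper names `delta_star` and `n_approx` are hypothetical:

```r
# Hypothetical helper: non-centrality delta* at which a chi-square test with
# k d.o.f. (the k << n limit of k * F(k, n)) has type II rate beta.
delta_star <- function(k, alpha = 0.05, beta = 0.20) {
  crit <- qchisq(1 - alpha, df = k)                       # central critical value
  f <- function(d) pchisq(crit, df = k, ncp = d) - beta  # type II rate minus beta
  uniroot(f, lower = 0, upper = 1000, tol = 1e-10)$root
}

# Approximate sample size n = delta* / lambda (large-n approximation only)
n_approx <- function(k, lambda, alpha = 0.05, beta = 0.20) {
  delta_star(k, alpha, beta) / lambda
}

delta_star(1)              # about 7.85, close to (qnorm(0.975) + qnorm(0.8))^2
sapply(1:5, delta_star)    # grows with k
n_approx(k = 3, lambda = 0.05)
```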

My attempts at fiddling with this in R have not been fruitful. I have seen $g(k,\lambda) = \lambda / \sqrt{k+1}$ suggested but the fits have not looked very good.

edit: I originally stated, vaguely, that the non-centrality parameter 'depends' on the sample size. On second thought, I found that too confusing, so I have made the relationship explicit.

Also, I can compute the value of $n$ exactly by solving the implicit equation via a root finder (e.g. Brent's method). I am looking for an equation to guide my intuition and for use as a rule of thumb.
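The exact computation can be sketched in R under these assumptions. `qf()` and `pf()` (whose `ncp` argument gives the non-central F) evaluate the equation above, and `uniroot()`, a Brent-type root finder, solves it. The helper name `n_exact` is hypothetical. It first brackets the root by doubling $n$, relying on the type II rate decreasing in $n$:

```r
# Hypothetical helper: smallest n with type II rate <= beta when the statistic
# is non-central F(k, n) with non-centrality n * lambda under the alternative.
# Solves F(F^{-1}(1 - alpha; k, n, 0); k, n, n * lambda) = beta.
n_exact <- function(k, lambda, alpha = 0.05, beta = 0.20) {
  type2 <- function(n) {
    crit <- qf(1 - alpha, df1 = k, df2 = n)       # central F critical value
    pf(crit, df1 = k, df2 = n, ncp = n * lambda)  # P(fail to reject | alternative)
  }
  if (type2(1) <= beta) return(1)
  upper <- 2
  while (type2(upper) > beta) upper <- 2 * upper  # expand bracket by doubling
  root <- uniroot(function(n) type2(n) - beta,
                  lower = upper / 2, upper = upper, tol = 1e-8)$root
  ceiling(root)  # round up to a whole sample size
}

n_exact(k = 1, lambda = 0.1)
n_exact(k = 3, lambda = 0.1)
```

Evaluating `n_exact()` over a grid of $k$ and $\lambda$ is one way to check how well a candidate rule such as $g(k,\lambda) = \lambda/\sqrt{k+1}$ fits.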

To clarify, is it correct that you are already able to get the required $n$, but you're looking for a general formula? I would be very surprised if there is a useful general formula. — mark999, May 22 '11 at 21:44

score 3 · Answer 1 · answered Jul 09 '18 at 04:28

I am wondering if there is a sample size formula like Lehr's formula that applies to an F-test?

The webpage "Power Tools for Epidemiologists" explains:

Difference Between Two Means (Lehr):

Say, for example, you want to demonstrate a 10 point difference in IQ between two groups, one of which is exposed to a potential toxin, the other of which is not. Using a mean population IQ of 100, and a standard deviation of 20:

$$n_{group}=\frac{16}{\left((100-90)/20\right)^2}$$

$$n_{group}=\frac{16}{(.5)^2}=64$$
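As a cross-check of Lehr's approximation, which is not part of the quoted page, base R's `power.t.test()` solves the exact two-sample $t$-test power equation for the same IQ example (two-sided by default):

```r
# Exact per-group n for a two-sided, two-sample t-test:
# difference of 10 IQ points, SD 20, alpha = 0.05, power = 0.80
res <- power.t.test(delta = 10, sd = 20, sig.level = 0.05, power = 0.80)
res$n             # about 63.8 per group
ceiling(res$n)    # 64, matching Lehr's rule
16 / (10 / 20)^2  # Lehr's rule: 64
```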
Percentage Change in Means

Clinical researchers may be more comfortable thinking in terms of percentage changes rather than differences in means and variability. For example, someone might be interested in a 20% difference between two groups in data with about 30% variability. Professor van Belle presents a neat approach to these kinds of numbers that uses the coefficient of variation (c.v.) and translates the percentage change into a ratio of means.

The variance on the log scale (see chapter 5 in van Belle) is approximately equal to the square of the coefficient of variation on the original scale, so Lehr’s formula can be translated into a version that uses the c.v.:
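One way to see how good that approximation is, assuming lognormal data (an assumption not made on the quoted page): if $\ln X \sim N(\mu, \sigma^2)$, then $(c.v.)^2 = e^{\sigma^2} - 1$. The log-scale variance is therefore exactly $\sigma^2 = \ln(1 + (c.v.)^2)$, which is close to $(c.v.)^2$ when the c.v. is small:

```r
# Log-scale variance implied by a given c.v. under a lognormal model,
# compared with the approximation cv^2 used in the rule of thumb
cv <- c(0.05, 0.10, 0.30, 0.50, 1.00)
log_var <- log(1 + cv^2)
round(cbind(cv, cv_squared = cv^2, log_var), 4)
```

At c.v. = 0.3 the two differ by about 4% (0.0862 vs 0.09). At c.v. = 1 the gap is large (0.69 vs 1), so the rule is rougher for highly variable data.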

$$n_{group}=\frac{16\,(c.v.)^2}{(\ln\mu_0-\ln\mu_1)^2}$$

Since $(\ln\mu_0-\ln\mu_1)^2 = (\ln(\mu_1/\mu_0))^2$, we can then express the percentage change through the ratio of means,

$$r.m.=\frac{\mu_1}{\mu_0}=1-\frac{\mu_0-\mu_1}{\mu_0},$$

to formulate a rule of thumb:

$$n_{group}=\frac{16\,(c.v.)^2}{(\ln(r.m.))^2}$$

In the example above, a 20% change translates to a ratio of means of 1−.20=.80. (A 5% change would result in a ratio of means of 1−.05=.95; a 35% change 1−.35=.65, and so on.) So, the sample size for a study seeking to demonstrate a 20% change in means with data that varies about 30% around the means would be

$$n_{group}=\frac{16(0.3)^2}{(\ln 0.8)^2}\approx 28.9 \approx 29$$

An R function based on this rule would be:
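```r
# Lehr-type rule on the log scale: n per group for a percentage change pc
# (e.g. 0.15 for 15%) in data with coefficient of variation cv.
nPC <- function(cv, pc) {
  16 * cv^2 / log(1 - pc)^2  # returned (not just printed) so plot() can use it
}
```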
Say you were interested in a 15% change from one group to another, but were uncertain about how the data varied. You could look at a range of values for the coefficient of variation:
```r
a <- c(.05, .10, .15, .20, .30, .40, .50, .75, 1)
nPC(a, .15)
```
You could use this to graphically display your results:
```r
plot(a, nPC(a, .15), ylab = "Number in Each Group",
     xlab = "By Varying Coefficient of Variation",
     main = "Sample Size Estimate for a 15% Difference")
```

See also: iSixSigma "How to Determine Sample Size" and RaoSoft "Online Sample Size Calculator".
