
I am wondering if there is a sample size formula like Lehr's formula that applies to an F-test? Lehr's formula for t-tests is $n = 16 / \Delta^2$, where $\Delta$ is the effect size (e.g. $\Delta = (\mu_1 - \mu_2) / \sigma$). This can be generalized to $n = c / \Delta^2$, where $c$ is a constant that depends on the type I rate, the desired power, and whether one is performing a one-sided or two-sided test.
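To make the role of $c$ concrete: under the usual normal approximation for comparing two equal-sized groups, $c = 2(z_{1-\alpha/2} + z_{1-\beta})^2$ for a two-sided test (with $z_{1-\alpha}$ in place of $z_{1-\alpha/2}$ for a one-sided test). A minimal sketch in R, where the function name `lehr_constant` is hypothetical:

```r
# Constant c in n = c / Delta^2 (per group), under the normal approximation
# for a two-sample comparison with equal group sizes.
lehr_constant <- function(alpha = 0.05, power = 0.80, two_sided = TRUE) {
  tail_prob <- if (two_sided) alpha / 2 else alpha
  z_alpha <- qnorm(1 - tail_prob)  # critical value for the type I rate
  z_beta <- qnorm(power)           # quantile for the desired power
  2 * (z_alpha + z_beta)^2
}

lehr_constant()                   # about 15.7, which Lehr's rule rounds to 16
lehr_constant(two_sided = FALSE)  # smaller constant for a one-sided test
```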

I am looking for a similar formula for an F-test. My test statistic is distributed, under the alternative, as a non-central F with $k,n$ degrees of freedom and non-centrality parameter $n \lambda$, where $\lambda$ depends only on population parameters, which are unknown but posited to take some value. The parameter $k$ is fixed by the experiment, and $n$ is the sample size. Ideally I am looking for a (preferably well-known) formula of the form $$n = \frac{c}{g(k,\lambda)}$$ where $c$ depends only on the type I rate and the power.

The sample size should satisfy $$ F(F^{-1}(1-\alpha;k,n,0);k,n,n\lambda) = \beta,$$ where $F(x;k,n,\delta)$ is the CDF of a non-central F with $k,n$ d.o.f. and non-centrality parameter $\delta$, and $\alpha, \beta$ are the type I and type II rates. We can assume $k \ll n$, i.e. that $n$ is 'sufficiently large.'

My attempts at fiddling with this in R have not been fruitful. I have seen $g(k,\lambda) = \lambda / \sqrt{k+1}$ suggested but the fits have not looked very good.

edit: originally I had vaguely stated that the non-centrality parameter 'depends' on the sample size. On second thought, I found that too confusing, so made the relationship clear.

Also, I can compute the value of $n$ exactly by solving the implicit equation via a root finder (e.g. Brent's method). I am looking for an equation to guide my intuition and for use as a rule of thumb.
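For reference, the exact calculation described above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names `type2_rate`, `exact_n`, and `implied_constant` are hypothetical, $n$ is treated as continuous, and the default search bracket (`upper = 200 / lambda`) assumes the required non-centrality $n\lambda$ stays well below 200. `uniroot` is R's built-in root finder, which is based on Brent's algorithm. The last helper backs out $c = n \cdot \lambda / \sqrt{k+1}$ from the exact $n$, so one can see how constant it stays across $k$ for the suggested $g(k,\lambda)$.

```r
# Type II error rate at sample size n: the non-central F CDF evaluated at
# the central-F critical value (the left-hand side of the equation above).
type2_rate <- function(n, k, lambda, alpha = 0.05) {
  crit <- qf(1 - alpha, df1 = k, df2 = n)
  pf(crit, df1 = k, df2 = n, ncp = n * lambda)
}

# Solve type2_rate(n) = beta for n, treating n as continuous.
# ASSUMPTION: the root lies in the bracket [2, 200 / lambda].
exact_n <- function(k, lambda, alpha = 0.05, beta = 0.20,
                    upper = 200 / lambda) {
  f <- function(n) type2_rate(n, k, lambda, alpha) - beta
  uniroot(f, lower = 2, upper = upper)$root
}

# Constant implied by the candidate rule n = c / g(k, lambda),
# with g(k, lambda) = lambda / sqrt(k + 1).
implied_constant <- function(k, lambda, ...) {
  exact_n(k, lambda, ...) * lambda / sqrt(k + 1)
}

# If the rule fit well, these values would be nearly equal across k.
sapply(c(1, 5, 10), implied_constant, lambda = 0.1)
```

Round the result of `exact_n` up to the next integer to get a usable sample size.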

Glen_b
shabbychef
    To clarify, is it correct that you are already able to get the required $n$, but you're looking for a general formula? I would be very surprised if there is a useful general formula. – mark999 May 22 '11 at 21:44

1 Answer


I am wondering if there is a sample size formula like Lehr's formula that applies to an F-test?

The webpage "Power Tools for Epidemiologists" explains:

  • Difference Between Two Means (Lehr):

    Say, for example, you want to demonstrate a 10 point difference in IQ between two groups, one of which is exposed to a potential toxin, the other of which is not. Using a mean population IQ of 100, and a standard deviation of 20:

    $$n_{group}=\frac{16}{\left(\frac{100-90}{20}\right)^2}$$

    $$n_{group}=\frac{16}{(.5)^2}=64$$

  • Percentage Change in Means

    Clinical researchers may be more comfortable thinking in terms of percentage changes rather than differences in means and variability. For example, someone might be interested in a 20% difference between two groups in data with about 30% variability. Professor van Belle presents a neat approach to these kinds of numbers that uses the coefficient of variation (c.v.) and translates percentage change into a ratio of means.

    Variance on the log scale (see chapter 5 in van Belle) is approximately equal to the squared coefficient of variation on the original scale, so Lehr’s formula can be translated into a version that uses c.v.

    $$n_{group}=\frac{16(c.v.)^2}{(\ln(\mu_0)-\ln(\mu_1))^2}$$

    We can then express the ratio of means, $r.m.=\mu_1/\mu_0$, in terms of the percentage change (p.c.), since

    $$p.c.=\frac{\mu_0-\mu_1}{\mu_0}=1-\frac{\mu_1}{\mu_0}=1-r.m.$$

    to formulate a rule of thumb:

    $$n_{group}=\frac{16 (c.v.)^2}{(\ln(r.m.))^2}$$

    In the example above, a 20% change translates to a ratio of means of 1−.20=.80. (A 5% change would result in a ratio of means of 1−.05=.95; a 35% change 1−.35=.65, and so on.) So, the sample size for a study seeking to demonstrate a 20% change in means with data that varies about 30% around the means would be

    $$n_{group}=\frac{16(.3)^2}{(\ln(.8))^2}\approx 29$$

An R function based on this rule would be:

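    # Rule-of-thumb group size for a proportional change `pc` in means,
    # given coefficient of variation `cv` (both as fractions, e.g. 0.3).
    nPC <- function(cv, pc) {
        16 * cv^2 / log(1 - pc)^2
    }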

Say you were interested in a 15% change from one group to another, but were uncertain about how the data varied. You could look at a range of values for the coefficient of variation:

    a <- c(.05, .10, .15, .20, .30, .40, .50, .75, 1)
    nPC(a, .15)

You could use this to graphically display your results:

    plot(a, nPC(a, .15), ylab = "Number in Each Group",
         xlab = "By Varying Coefficient of Variation",
         main = "Sample Size Estimate for a 15% Difference")
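The two worked examples quoted above can also be checked numerically. This is a small sketch; the helper name `lehr_n` is hypothetical and simply encodes $n_{group} = 16/\Delta^2$ with $\Delta$ the difference in means divided by the standard deviation:

```r
# Lehr's rule: group size for a difference in means `delta`
# and standard deviation `sd`.
lehr_n <- function(delta, sd) {
  16 / (delta / sd)^2
}

# IQ example: 10-point difference, standard deviation 20.
lehr_n(10, 20)  # 64

# Percentage-change example: 20% change, c.v. of 0.3.
n_pc <- 16 * 0.3^2 / log(0.8)^2
ceiling(n_pc)   # 29
```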

See also: iSixSigma "How to Determine Sample Size" and RaoSoft "Online Sample Size Calculator".

Rob