
If I have a large sample size, e.g. 100,000 data points, I know that most significance tests are going to come back with a very small p-value unless the null hypothesis is "true on the nose." In other words, even very small effects will be seen by the test. I can understand why this is true for a t-test, since when I compute the test statistic I have to divide by $\sqrt{n}$ in the formula for the standard error, so when $n$ is large my standard error is small, and so my t-statistic is huge. Is there a similar explanation for why an ANOVA F-test (let's say 1-way ANOVA) is likely to be significant when $n$ is large?
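(To spell out the $\sqrt{n}$ scaling I have in mind, in the balanced two-sample case with $n_1 = n_2 = m$:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \sqrt{\frac{m}{2}}\cdot\frac{\bar{x}_1 - \bar{x}_2}{s_p}, $$

so for any fixed non-zero standardized difference $(\bar{x}_1 - \bar{x}_2)/s_p$, the statistic $|t|$ grows like $\sqrt{m}$.)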

I'm asking so I can better explain things to my Stat 2 class. When asked in class today, the explanation I tried was that, when $n$ is huge, $MSE$ is going to be very small (because it's $SSE/(n-k)$), so the $F$-statistic will be huge. The students followed up by asking why the large df in the $F$-statistic doesn't account for this and so give reasonable $p$-values even for very large $F$-statistics (rather than the ultra-small $p$-values we've been seeing in our examples).

I know, of course, that for a two-sample t-test $F = t^2$, so I can deduce significance as a special case of the reasoning above, but I'm more interested in the general case of more than 2 groups, and an explanation that doesn't require the derivation that $F = t^2$. Any help would be much appreciated. Thanks!

David White
  • This question, generalized and posed slightly differently, appears at http://stats.stackexchange.com/questions/2516. The common spirit is to ask why having more data gives one more power to reject a false null hypothesis. So, rather than focusing on the F-test itself, you might consider discussing this general issue: your students might learn much more for the same effort. – whuber Mar 31 '16 at 18:16
  • @whuber, I read that thread just before posting, but could not distill from it an explanation that would satisfy my students. Still, it inspired me to prepare a class on effect size that I'm delivering tomorrow. By the way, I'm a big fan of your answers here. Thanks for posting! – David White Mar 31 '16 at 22:59

1 Answer


To use the usual arguments for why small effects mean low p-values when the sample size is large (as in the link provided by @whuber), you need some measure of effect size for ANOVA. A simple one (which also works for the general normal linear model) is the sample R-squared $R^2$: the proportion of variability in the response accounted for by the covariates (in one-way ANOVA, the grouping factor). It estimates some "true" proportion $\theta$.

So we can say: for large samples, the F-test will return a tiny p-value even when $R^2$ is very close to zero.

Illustration by simulation:

# Input
# Simulate a one-way ANOVA with three groups and a tiny true effect
set.seed(20)
n <- 1000000
x <- sample(LETTERS[1:3], n, replace = TRUE)               # grouping factor with levels A, B, C
y <- 2 + 0.01 * (x == "B") - 0.01 * (x == "C") + rnorm(n)  # group means 2, 2.01, 1.99
fit <- lm(y ~ x)                                           # one-way ANOVA fit as a linear model

summary(fit)

# Output (partial)
Residual standard error: 1.001 on 999997 degrees of freedom
Multiple R-squared:  9.653e-05, Adjusted R-squared:  9.453e-05 
F-statistic: 48.27 on 2 and 999997 DF,  p-value: < 2.2e-16
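To see directly how the F-statistic itself scales with $n$ (a follow-up sketch, not part of the original answer), one can rerun the same simulation at several sample sizes; with the true effect held fixed, the observed F-statistic grows roughly linearly in $n$:

```r
# Same data-generating process as above, at increasing sample sizes;
# the F-statistic should grow roughly tenfold per tenfold increase in n
set.seed(20)
fs <- sapply(c(1e4, 1e5, 1e6), function(n) {
  x <- sample(LETTERS[1:3], n, replace = TRUE)
  y <- 2 + 0.01 * (x == "B") - 0.01 * (x == "C") + rnorm(n)
  summary(lm(y ~ x))$fstatistic["value"]   # first component is the F value
})
print(round(fs, 2))
```

The exact numbers depend on the random draws, but the trend is stable across seeds.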

Illustration by math: The F-statistic is a simple function of $R^2$, and one can show that, under the assumptions of the normal linear model and under the null hypothesis of the F-test, $R^2$ follows a $\mathrm{Beta}\!\left(\frac{k-1}{2}, \frac{n-k}{2}\right)$ distribution (see What is the distribution of $R^2$ in linear regression under the null hypothesis? Why is its mode not at zero when $k>3$?), so that $$ E(R^2) = \frac{k-1}{n-1} $$ and $$ \operatorname{Var}(R^2) = \frac{2(k-1)(n-k)}{(n-1)^2(n+1)} $$ ($k$ is the number of parameters of the model, e.g. the number of groups in a one-way ANOVA). So working with $R^2$ and the beta distribution is equivalent to working with the F-statistic and the F-distribution. For large $n$ and fixed $k$, both the mean and the variance above tend to zero, so the null distribution of $R^2$ concentrates near 0, and any fixed non-zero observed $R^2$ lands ever further in its tail, producing a tiny p-value.
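The equivalence can be sketched explicitly (standard identities, not spelled out in the original answer): with the ANOVA decomposition $SST = SSM + SSE$ and $R^2 = SSM/SST$,

$$ F = \frac{SSM/(k-1)}{SSE/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{n-k}{k-1}\cdot\frac{R^2}{1-R^2}. $$

If the true proportion $\theta$ is positive, then $R^2 \to \theta$ as $n \to \infty$, so $F \approx \frac{n-k}{k-1}\cdot\frac{\theta}{1-\theta}$ grows linearly in $n$: for fixed $k$, multiplying $n$ by ten multiplies the F-statistic by roughly ten.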

Michael M
  • I also read about $\eta^2$ as a measurement of effect size for ANOVA. Is $R^2$ preferred over $\eta^2$? How about $\omega^2$? Thanks! – David White Mar 31 '16 at 23:00
  • Eta-squared is the same as R-squared, so it is up to you. No idea if a test on omega-squared is equivalent to a test on R-squared... – Michael M Apr 01 '16 at 06:20
  • Okay, so I agree that as $n \to \infty$, I expect $R^2 \to 0$. So SSM/SST is going to zero, hence SSE/SST must be going to 1. Wouldn't this cause an F-statistic that goes to zero? But in practice with these large datasets, we've been seeing large F statistics, not small ones. The class and I thought it had to do with the large degrees of freedom of the denominator. Are our findings at odds with your answer? – David White Apr 01 '16 at 11:00
  • Under the null hypothesis, the expectation of the corresponding F-statistic approaches 1 for large $n$, so something in your reasoning might be wrong. A large F-statistic means strong evidence against the null. But is it really a strong effect? This is the tricky part and the reason why I was going via R-squared. – Michael M Apr 01 '16 at 11:58
  • Thanks. One more question. In your simulation, the value of the test statistic is 48.27. Each time I make $n$ larger by an order of magnitude, $F$ gets larger by an order of magnitude. Is there a theorem to justify that? I don't even see why it's true for a t-distribution. The test statistic in that case is $t = (\bar{x}-\mu_0)/(s/\sqrt{n})$, but as $n$ increases both $s$ and $\sqrt{n}$ increase. Why should the denominator be going to zero (which seems necessary for making the test statistic go to $\infty$)? Thanks! – David White Apr 01 '16 at 16:58