What is the effect of increasing the sample size in ANOVA?

Question

Assume that a one-way ANOVA is performed on 3-4 groups of the same size.

What is the effect of increasing the sample size (equally for all groups)?

Antoni Parellada · Answer 1 · 2016-05-05T13:58:43.827

2

Since the essential computation is the F-test:

$$F=\frac{\text{variance between treatments}}{\text{variance within treatments}}=\frac{\text{Sum Sqs}_{\text{treatments}}/(\text{no. treatments}-1)}{\text{Sum Sqs}_{\text{errors}}/(\text{no. cases}-\text{no. treatments})},$$

increasing the number of cases raises the error degrees of freedom ($\text{no. cases}-\text{no. treatments}$), so for given sums of squares the denominator shrinks and the $F$ test statistic grows, making a small p-value more likely with everything else constant.
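The sums of squares also change as cases are added, so it can help to look at what the two mean squares estimate on average. As a sketch under the usual assumptions (a balanced design with $n$ observations in each of $k$ groups, group means $\mu_i$ with average $\bar\mu$, and a common variance $\sigma^2$; these symbols are introduced here for illustration and are not the original answer's notation):

$$E[\text{MS}_{\text{treatments}}]=\sigma^2+\frac{n}{k-1}\sum_{i=1}^{k}(\mu_i-\bar\mu)^2,\qquad E[\text{MS}_{\text{errors}}]=\sigma^2.$$

So the within-treatment mean square keeps estimating the same $\sigma^2$ (only more precisely), while the between-treatment mean square grows roughly in proportion to $n$ whenever the group means differ, which is what pushes $F$ upward.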

In other words, a larger sample gives more power, that is, a lower type II error rate.
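A rough way to see this numerically is to compute $F$ by hand on simulated data. The sketch below is illustrative only: the group means (10, 15, 20) and standard deviation (3) mirror the simulation settings mentioned in this answer, while the group sizes, the seed, and the helper names `f_statistic` and `simulate_f` are assumptions made for the example.

```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    # Between-treatment sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-treatment (error) sum of squares: squared deviations of each
    # observation from its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def simulate_f(n_per_group, means=(10.0, 15.0, 20.0), sd=3.0, seed=0):
    """Draw one balanced normal sample (assumed settings) and return its F."""
    rng = random.Random(seed)
    groups = [[rng.gauss(mu, sd) for _ in range(n_per_group)] for mu in means]
    return f_statistic(groups)


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, round(simulate_f(n), 1))
```

With group means this far apart relative to the noise, the printed $F$ values should rise steeply with the group size; if the means were equal, $F$ would tend to stay near 1 regardless of $n$.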

Following up on the comments from @whuber, increasing the number of observations also affects the mean squared residuals (MSR; its square root is the residual standard error, RSE), and by extension the standard errors (SE) of the estimates. This is clear when considering that the standard errors of the estimates are the square roots of the diagonal entries of $\widehat{\textrm{Var}}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}$, and that the model matrix $\mathbf{X}$ for a one-way ANOVA is just an intercept column plus dummy-coded group indicators. Here $\hat\sigma^2=\frac{u^Tu}{\text{df}}$, where $u$ is the vector of residuals and $\text{df}=\text{cases}-\text{groups}$, so $\hat\sigma^2$ is roughly the mean of the squared residuals.

In a Monte Carlo simulation that draws three groups of observations from normal distributions with the same variance $\sigma^2=9$ but with means $\mu_1=10$, $\mu_2=15$ and $\mu_3=20$, with increasing numbers of balanced observations from $5$ to $1000$, the mean squared residuals display a funnel shape: their spread drops rapidly as the sample size increases.
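The funnel can be reproduced approximately with a small simulation. This is a minimal sketch, not the original answer's code: it reuses the stated settings (three normal groups, means 10, 15, 20, variance 9), but the number of replicates, the seed, the reading of the sample sizes as per-group sizes, and the helper names `residual_variance` and `spread_of_estimates` are assumptions.

```python
import random
from statistics import fmean


def residual_variance(groups):
    """Estimate sigma^2 as the residual sum of squares over (cases - groups)."""
    ss_within = 0.0
    for g in groups:
        m = fmean(g)
        ss_within += sum((x - m) ** 2 for x in g)
    n_total = sum(len(g) for g in groups)
    return ss_within / (n_total - len(groups))


def spread_of_estimates(n_per_group, reps=200, means=(10.0, 15.0, 20.0),
                        sd=3.0, seed=1):
    """Return (mean, min, max) of the sigma^2 estimates over `reps` samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        groups = [
            [rng.gauss(mu, sd) for _ in range(n_per_group)] for mu in means
        ]
        estimates.append(residual_variance(groups))
    return fmean(estimates), min(estimates), max(estimates)


if __name__ == "__main__":
    for n in (5, 20, 100, 1000):
        mean_est, lowest, highest = spread_of_estimates(n)
        print(n, round(mean_est, 2), round(lowest, 2), round(highest, 2))
```

The average estimate should stay close to the true value of 9 at every sample size, while the gap between the smallest and largest estimates narrows as $n$ grows. Under normality the standard deviation of $\hat\sigma^2$ is $\sigma^2\sqrt{2/\text{df}}$, which is the theoretical counterpart of the funnel.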

edited May 05 '16 at 13:58

answered May 03 '16 at 20:37


Although this is correct, it appears to be incomplete: you need to say something about how the sums of squares in one sample relate to the sums of squares in another sample. – whuber May 03 '16 at 20:59
@whuber Are you referring to the effect of increased subjects when the variances are different across groups? – Antoni Parellada May 03 '16 at 22:02
No--as you add to the sample, the variances will almost surely change. You implicitly assume they do not. – whuber May 03 '16 at 22:34
Let me approach it from the gut: the variation within the groups is the annoying noise that we want to get rid of. If there were no noise, there wouldn't be a need for a test... In general, the more data points, the less noise. So presumably the variance within treatments drops with the sample size, but this effect goes beyond the degrees of freedom... – Antoni Parellada May 03 '16 at 22:47
You appear to be confusing the standard deviation with the standard error! The interesting point is that although the *expected* variance remains the same in each group, regardless of sample size (and you can terminate your answer with this observation if you wish), it is of practical interest to wonder what the chances are that the variances in the new sample will be at least twice the variances of the current sample. Such a question has both Bayesian and Classical answers (in the form of prediction intervals for the variances). – whuber May 03 '16 at 23:05
@whuber Did I even come close? – Antoni Parellada May 05 '16 at 20:16
+1 The final graph is particularly interesting because it clearly shows there really is some chance that $\hat\sigma^2$ as estimated in small samples could substantially change, either up or down, when larger samples are collected. This illustrates the risk attendant to performing sample size calculations based on limited preliminary data. A good sample size calculation therefore would estimate that risk and incorporate it in its results (rather than assuming a known effect size and known variance). – whuber May 05 '16 at 20:22
