What is the minimum viable cell size for 2x2 ANOVA?

Question

I have a 2x2, between-subjects experimental design (2 independent variables (IVs) with 2 levels each) and one dependent variable (DV). My data are unbalanced and an interaction between the IVs seems likely, so I plan to use ANOVA with Type III Sums of Squares to test whether my DV is different across one of my IVs while controlling for any influence of the other IV (or the interaction, if present).

My question is this: what is the minimum number of data points that I need in each of my 4 cells, to ensure that ANOVA likely won't give spurious results? I'm aware of Simmons et al.'s recommendation (2011) of having at least 20 data points in each cell, but that recommendation only serves to control the test's rate of false negatives. What I'm more worried about is that with a small enough cell size, the statistics on which ANOVA is based probably won't be very reliable, and so the results of the test would be unreliable as well (in terms of either false negatives or false positives). Are there any papers or texts out there that have studied and made recommendations regarding this concern?

Related: [Is there a minimum sample size required for the $t$-test to be valid?](http://stats.stackexchange.com/q/37993/7290) — gung - Reinstate Monica, Mar 14 '14 at 21:07
@DoctorAmbient, I'll let you judge the severity. In one data set, the counts across 4 cells are 16, 17, 13, 30. In another data set, the counts across 4 cells are 13, 10, 8, 14. — Tyne, Mar 19 '14 at 10:45

gung - Reinstate Monica · Accepted Answer · 2014-03-14T17:57:55.533


There isn't really any absolute minimum except in a trivial sense (if you won't try to test for the interaction, the minimum $n_{ij}$ will be $1$; if you do want to test for the interaction, the minimum cell size might be $2$). Instead, there are two issues here:

The first is the question of the robustness of the ANOVA to the violation of assumptions. Like all linear models (regression, $t$-tests, etc.), the ANOVA assumes that the data within each cell (i.e., the residuals) are independent, have equal (homogeneous) variance, and are normally distributed. In truth, some of these assumptions are less necessary than others. For instance, with enough data you don't really need the within-cell distributions to be normal. However, what constitutes 'enough data' depends on how far from normal those distributions are. Thus, the further you are from normality, the more data you need. But there is another twist here, namely that with fewer data it is harder (or even impossible) to assess whether the assumptions of the ANOVA are met. So with fewer data per cell, you really are going by blind faith. If the assumptions are not met, then you can have increased type I error rates.
The second issue is the question of power. The probability of getting significance is a function of the size of the effect and the amount of data you have. If the effects are large enough, you will have good power even if you have only 1 datum per cell. I suspect effects that large are uncommon, though. Thus, you need to determine how large of an effect you want to be able to detect with what power (etc.), and calculate your $N$ accordingly.
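
Both issues can be explored by simulation. The following is only a rough sketch under stated assumptions, not a definitive procedure: it uses the cell counts 13, 10, 8, 14 mentioned in the comments (which count belongs to which cell is a guess), tests the main effect of one factor as a contrast on the unweighted marginal means (which is what the Type III test of a main effect amounts to in a 2x2 design with all cells filled), and estimates how often that test rejects at $\alpha = 0.05$. The helper names (`rejection_rate`, `opposite_skew_error`, etc.), the 1 SD effect size, and the shifted-exponential error distribution are all illustrative choices. Only the standard library is used, so the $t$ critical value is obtained by numerical integration.

```python
import math
import random

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=400):
    """Two-sided p-value, using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def t_critical(df, alpha=0.05):
    """Two-sided critical value, found by bisection on the p-value."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def main_effect_t(cells):
    """t statistic for factor A, comparing unweighted marginal means.

    cells maps (a, b), with a and b in {0, 1}, to a list of observations.
    """
    df = sum(len(v) for v in cells.values()) - 4
    means = {k: sum(v) / len(v) for k, v in cells.items()}
    mse = sum((y - means[k]) ** 2 for k, v in cells.items() for y in v) / df
    # Contrast: mean of the two A=0 cell means minus mean of the two A=1 cell means.
    w = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): -0.5, (1, 1): -0.5}
    estimate = sum(w[k] * means[k] for k in cells)
    se = math.sqrt(mse * sum(w[k] ** 2 / len(cells[k]) for k in cells))
    return estimate / se

def normal_error(rng, a):
    return rng.gauss(0.0, 1.0)

def opposite_skew_error(rng, a):
    # Mean-zero exponential errors, mirrored at A=1 so the levels skew in opposite directions.
    e = rng.expovariate(1.0) - 1.0
    return e if a == 0 else -e

def rejection_rate(sizes, effect_a, draw_error, n_sims=2000, alpha=0.05, seed=1):
    """Share of simulated data sets in which the main effect of A is significant."""
    rng = random.Random(seed)
    crit = t_critical(sum(sizes.values()) - 4, alpha)
    hits = 0
    for _ in range(n_sims):
        cells = {
            (a, b): [effect_a * a + draw_error(rng, a) for _ in range(n)]
            for (a, b), n in sizes.items()
        }
        hits += abs(main_effect_t(cells)) > crit
    return hits / n_sims

# Assumed assignment of the reported counts to cells.
sizes = {(0, 0): 13, (0, 1): 10, (1, 0): 8, (1, 1): 14}
print("type I rate, normal errors: ", rejection_rate(sizes, 0.0, normal_error))
print("type I rate, opposite skews:", rejection_rate(sizes, 0.0, opposite_skew_error))
print("power, 1 SD effect of A:    ", rejection_rate(sizes, 1.0, normal_error))
```

Swapping in your own cell sizes, effect size, and a `draw_error` function that resembles the kind of non-normality you expect gives a direct, if approximate, answer to "is this many data enough for my situation".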

edited Mar 14 '14 at 17:57

answered Mar 14 '14 at 17:03

gung - Reinstate Monica


Thanks @gung. I avoided mentioning robustness in my post because it seems to be an extremely loose concept. Can you provide (or link to) a detailed description of the relationship between $n_{ij}$ and the deviation of cell $ij$'s distribution from normality? As a rough example, the information that I seek might be represented as a height-map over a 2D plane, where the two dimensions of the plane are $n_{ij}$ and the deviation of cell $ij$'s distribution from normality, and the height from the plane represents the likelihood of ANOVA's type 1 error rate being inflated. – Tyne Mar 19 '14 at 10:09
I don't think robustness is a loose concept, but it is a large topic. There is a great deal of work in statistics on robustness. To understand the robustness of the ANOVA to violations of the assumptions, you will need to specify the exact nature of the violation you are interested in. There are many ways in which a distribution can be 'non-normal', for instance. In general, skew is more damaging than kurtosis, & skews in opposite directions are especially damaging to type I error rates. – gung - Reinstate Monica Mar 19 '14 at 23:33
To clarify, when you say: "the data within each cell (i.e., the residuals) are independent [...]" in your original answer, are you asserting that the data within each cell of the ANOVA *are* the residuals themselves? That's how I interpret your use of "i.e.", but I've always thought of the residuals as being collected into a single distribution that spans the entire data set (versus four distributions, in my case), which one could then check for normality. Is that wrong? – Tyne Mar 26 '14 at 19:20
(I've read the answer [here](http://stats.stackexchange.com/questions/27610/how-to-test-for-normality-in-a-2x2-anova?rq=1), but it only asserts that "normality within cells means normality of residuals". I'm concerned with the implication in the other direction. Does normality of residuals mean normality within cells?) – Tyne Mar 26 '14 at 19:24
I suppose the "i.e." is somewhat ambiguous. An ANOVA is (a special case of) a regression model. In a regression model, it is only the residuals that need to be normal (see my answer [here](http://stats.stackexchange.com/a/33320/7290)). However, in an ANOVA, the predicted value in each cell is the cell mean. Thus, the normality of the residuals = the normality of the data w/i each cell. It is possible to have normality w/i each cell individually, but not normality of all residuals together b/c of heterogeneity. However, if the residuals are normal, all cells will be too. – gung - Reinstate Monica Mar 26 '14 at 19:44
Thanks for that link, and for your continued help. Your statement about the residuals (normal residuals $\implies$ data in all cells are normal) seems to contradict the answer [here](http://stats.stackexchange.com/questions/27610/how-to-test-for-normality-in-a-2x2-anova?rq=1), which says that "for a basic between subjects factorial ANOVA, normality within cells means normality of residuals". I suppose that the author's use of "basic" might exclude the potential for heterogeneity, but that seems unlikely given that he discusses it a few sentences later. – Tyne Mar 27 '14 at 10:43
I'll leave him a comment about it. I had upvoted his answer & still think it's good, but that part is slightly incorrect. If you want to know more about this, you should ask a new question. There's only so much to be explained in comments & comments aren't where we want the information to be on this site. If my answer helped you, you might consider upvoting it, by clicking on the upwards facing normal distribution, &/or accepting it by clicking on the check mark below the vote total. – gung - Reinstate Monica Mar 27 '14 at 14:03
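
The point made in the comments above (each cell can be normal while the pooled residuals are not, because of heterogeneity) can be illustrated with a small simulation. This is only a sketch with made-up numbers: the cell means, the cell SDs (1 vs. 5), and the 5000 observations per cell are hypothetical and chosen to make the pattern easy to see. Excess kurtosis is used as a crude gauge of non-normality; it is about 0 for normal data.

```python
import random

def excess_kurtosis(values):
    """Sample excess kurtosis (roughly 0 for normally distributed data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(0)

# Hypothetical 2x2 design: every cell is normal, but two cells are much noisier.
cell_mean = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 15.0, (1, 1): 11.0}
cell_sd = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 5.0, (1, 1): 5.0}
cells = {
    k: [rng.gauss(cell_mean[k], cell_sd[k]) for _ in range(5000)]
    for k in cell_mean
}

# In the full-factorial ANOVA the fitted value for an observation is its cell mean,
# so each residual is the observation minus the mean of its own cell.
fitted = {k: sum(v) / len(v) for k, v in cells.items()}
residuals = {k: [y - fitted[k] for y in v] for k, v in cells.items()}

within = {k: excess_kurtosis(v) for k, v in residuals.items()}
pooled_kurtosis = excess_kurtosis([r for v in residuals.values() for r in v])

for k, value in within.items():
    print("cell", k, "excess kurtosis:", round(value, 2))
print("pooled residuals excess kurtosis:", round(pooled_kurtosis, 2))
```

Within each cell the residuals are just the data shifted by a constant, so they keep the cell's shape; pooling cells with different variances produces a scale mixture, which has heavier tails than a normal distribution.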
