I have a 2x2 between-subjects experimental design: two independent variables (IVs) with two levels each, and one dependent variable (DV). My data are unbalanced and an interaction between the IVs seems likely, so I plan to use ANOVA with Type III sums of squares to test whether my DV differs across the levels of one IV while controlling for any influence of the other IV (and of the interaction, if present).
My question is this: what is the minimum number of data points I need in each of my 4 cells to make it unlikely that ANOVA gives spurious results? I'm aware of Simmons et al.'s (2011) recommendation of having at least 20 data points in each cell, but that recommendation only serves to control the test's rate of false negatives. What I'm more worried about is that, with small enough cells, the statistics on which ANOVA is based (the cell means and the pooled error variance) probably won't be very reliable, and so the results of the test would be unreliable as well, in terms of either false negatives or false positives. Are there any papers or texts that have studied this concern and made recommendations about it?
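To make the concern concrete, here is a minimal Monte Carlo sketch of the kind of check I have in mind. It estimates the false-positive rate of the main-effect test when no effect exists at all. It relies on the fact that, in a 2x2 design, the Type III test of a main effect is equivalent to a t-test on the difference between the unweighted marginal means. The function names, the cell sizes, and the two error distributions (normal and exponential) are illustrative assumptions, not values from my actual study.

```python
import math
import random
import statistics


def t_two_sided_p(t, df, steps=400):
    """Two-sided p-value for a t statistic, by Simpson integration of the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


def main_effect_p(cells):
    """p-value for the main effect of factor A in a 2x2 design.

    `cells` holds four lists of observations, ordered [a1b1, a1b2, a2b1, a2b2].
    The test compares unweighted marginal means of A, using the pooled
    within-cell variance as the error term.
    """
    means = [statistics.fmean(c) for c in cells]
    df_error = sum(len(c) for c in cells) - 4
    sse = sum((y - m) ** 2 for c, m in zip(cells, means) for y in c)
    mse = sse / df_error
    contrast = (means[0] + means[1] - means[2] - means[3]) / 2
    se = math.sqrt(mse * sum(0.25 / len(c) for c in cells))
    return t_two_sided_p(contrast / se, df_error)


def false_positive_rate(cell_sizes, draw, n_sims=2000, alpha=0.05, seed=1):
    """Share of null simulations in which the main effect is declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Every cell is drawn from the same distribution, so the null is true.
        cells = [[draw(rng) for _ in range(n)] for n in cell_sizes]
        if main_effect_p(cells) < alpha:
            hits += 1
    return hits / n_sims


# Hypothetical unbalanced cell sizes; replace with the sizes under consideration.
sizes = [5, 8, 12, 20]
print("normal errors:", false_positive_rate(sizes, lambda r: r.gauss(0.0, 1.0)))
print("skewed errors:", false_positive_rate(sizes, lambda r: r.expovariate(1.0)))
```

With normal, equal-variance errors the F-test is exact, so the first rate should sit near the nominal 0.05 whatever the cell sizes are. The open question is how far the rate drifts when those assumptions fail in small, unbalanced cells, which is what the second call probes for one particular skewed distribution.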