Updated: I came across an excellent post by @JeromyAnglim, and he makes a good point that standardized mean differences will be more intuitive for readers, so I have made a couple of changes, marked * below.
I don't think this is a valid approach. The most appropriate strategy is to fit the full model that you are interested in and stick with it. If some estimated effects are non-significant in the initial model, so be it; you don't drop them. If you do drop them, the subsequent tests are incorrect: the p-values displayed in the software output are no longer the actual p-values, because they do not account for the data-driven model selection that preceded them. I am aware that many people follow the strategy you are using here, and it may well be the conventional wisdom in your field, but it is still not valid.
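To make this concrete, here is a minimal sketch of fitting the full factorial model once and reporting it as-is. The data frame `df`, the response `y`, and the two-level factors `A` and `B` are all hypothetical placeholders, and I'm using Python's statsmodels simply as one convenient tool:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical 2x2 data: response y, two-level factors A and B
df = pd.DataFrame({
    "y": [4.1, 5.2, 3.9, 6.0, 5.5, 4.8, 7.1, 6.3],
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b2", "b1", "b2", "b1", "b2", "b1", "b2"],
})

# Fit the full model, interaction included, and report these results as-is;
# do not refit after dropping non-significant terms.
full_model = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(full_model, typ=2))
```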
Furthermore, if you believe that the interaction effects are real, you don't interpret main effects. To understand what is going on in your data, you interpret the simple effects (note, this doesn't mean you run a separate model fit / test on the simple effects). In general, the meaning of a parameter estimate is the effect on the response variable associated with a one-unit change in that factor when all other factors are held constant. However, when there is an interaction, it is not possible for a factor to change without another term changing as well: a change in x1 necessitates a change in x1*x2. Thus, main effects are meaningless when interactions exist.
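For example, here is a rough way to look at the simple effects in the hypothetical data above: compare the cell means of B separately at each level of A, rather than averaging over A:

```python
# Cell means of y for every A x B combination (B levels as columns)
cell_means = df.groupby(["A", "B"])["y"].mean().unstack()
print(cell_means)

# Simple effect of B at each level of A; when an interaction is present,
# this difference changes across the levels of A
print(cell_means["b1"] - cell_means["b2"])
```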
Like @onestop, I'm not sure what you mean by "mushed cells". However, I will try to answer your questions:
1. Cohen's d is inappropriate here. I don't believe it tells you anything meaningful (for a way to adapt Cohen's d to this situation, see #3). You should stick with your initial model fit (as discussed in my first paragraph) and report partial eta squared as your effect size estimate. Eta squared is also acceptable, but partial eta squared would be better; partial eta squared (say, for the AB interaction) would be $SS_{AB}/(SS_{AB}+SS_E)$; a sketch of this calculation follows the formulas below. *This post provides excellent information about effect size measures for ANOVA in general, and about eta squared vs. partial eta squared in particular.
2. You ask about Bonferroni corrections. There are other approaches, but your contrasts do not appear to be orthogonal (you want to test all pairwise comparisons), and you state that they are post-hoc; thus, the Bonferroni approach is probably best. To apply it, you divide $\alpha$ by the total number of comparisons possible, which with $k$ means is $k(k-1)/2$. Only those contrasts whose p-values fall below this adjusted threshold would be considered 'significant'; a quick calculation is sketched after the formulas below. (However, I should also say that I don't see why you need any of this; with only 2 levels in each factor, the results of your initial model fit tell you everything you need to know.)
3. In general, I'm not sure that reporting all of the possible Cohen's d's is worthwhile, but you could. You need to realize that those comparisons are not independent: because each is estimated from the same data, and estimates are never exactly correct, the manner in which one of them is off will be related to how the others are off. If the people to whom this information is presented would not be likely to realize this on their own, it should be mentioned explicitly. *However, in your particular case, I think you should just use your full model; since you have an interaction and you want to describe its magnitude intuitively, you could pick one of the factors to condition on (e.g., A) and show how the standardized mean difference on the other changes depending on the level of that factor. For instance, you could report Cohen's d for the difference between the means of the levels of B given a1, and given a2 (a code sketch follows these formulas):
$$
\frac{\bar{x}_{a_1b_1}-\bar{x}_{a_1b_2}}{SD_{pooled}}
\qquad
\frac{\bar{x}_{a_2b_1}-\bar{x}_{a_2b_2}}{SD_{pooled}}
$$
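As promised in #1, here is a minimal sketch of partial eta squared for the interaction, pulled from the hypothetical statsmodels ANOVA table fitted above (the row labels follow statsmodels' naming and would need adjusting for other software):

```python
# Partial eta squared for the A:B interaction: SS_AB / (SS_AB + SS_E)
aov = sm.stats.anova_lm(full_model, typ=2)
ss_ab = aov.loc["C(A):C(B)", "sum_sq"]
ss_e = aov.loc["Residual", "sum_sq"]
print(ss_ab / (ss_ab + ss_e))
```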
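The Bonferroni adjustment from #2 is just arithmetic; with the 4 cell means of a 2x2 design you get $4(4-1)/2 = 6$ pairwise comparisons:

```python
# Adjusted alpha: divide by the number of pairwise comparisons, k(k-1)/2
alpha, k = 0.05, 4
n_comparisons = k * (k - 1) // 2   # 6 for a 2x2 design
print(alpha / n_comparisons)       # 0.05 / 6 ≈ 0.0083
```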
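Finally, a hedged sketch of the conditional Cohen's d values from #3, again using the hypothetical `df`. I pool the SD over the two cells being compared, which is one common choice; you might instead prefer to pool over all four cells:

```python
import numpy as np

def cohens_d(x1, x2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    sd_pooled = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) +
                         (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / sd_pooled

# Cohen's d for b1 vs. b2, conditional on each level of A
for a_level in ["a1", "a2"]:
    sub = df[df["A"] == a_level]
    d = cohens_d(sub.loc[sub["B"] == "b1", "y"],
                 sub.loc[sub["B"] == "b2", "y"])
    print(f"d(b1 - b2 | {a_level}) = {d:.2f}")
```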