Updated: I came across an excellent post by @JeromyAnglim, and he makes a good point that standardized mean differences will be more intuitive for readers, so I have made a couple of changes, marked * below.
I don't think this is a valid approach. The most appropriate strategy is to fit the full model that you are interested in and stick with it. If some estimated effects are non-significant in the initial model, so be it; you don't drop them. If you do drop them, the subsequent tests are incorrect: the p-values displayed in the software output are no longer the actual p-values, because they do not account for the data-driven model selection that preceded them. I am aware that many people follow the strategy you are using here, and it may well be the conventional wisdom in your field, but it is still not valid.
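To make this concrete, here is a minimal sketch of fitting the full factorial model once and reporting it as-is. The data frame `df`, the response `y`, and the two-level factors `A` and `B` are all hypothetical placeholders, and I'm using Python's statsmodels simply as one convenient tool:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical 2x2 data: response y, two-level factors A and B
df = pd.DataFrame({
    "y": [4.1, 5.2, 3.9, 6.0, 5.5, 4.8, 7.1, 6.3],
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b2", "b1", "b2", "b1", "b2", "b1", "b2"],
})

# Fit the full model, interaction included, and report these results as-is;
# do not refit after dropping non-significant terms.
full_model = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(full_model, typ=2))
```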
Furthermore, if you believe that the interaction effects are real, you don't interpret main effects. To understand what is going on in your data, you interpret the simple effects (note, this doesn't mean you run a separate model fit / test on the simple effects). In general, the meaning of a parameter estimate is the effect on the response variable associated with a one-unit change in that factor when all other factors are held constant. However, when there is an interaction, it is not possible for a factor to change without another term changing as well: a change in x1 necessitates a change in x1*x2. Thus, main effects are meaningless when interactions exist.
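For example, here is a rough way to look at the simple effects in the hypothetical data above: compare the cell means of B separately at each level of A, rather than averaging over A:

```python
# Cell means of y for every A x B combination (B levels as columns)
cell_means = df.groupby(["A", "B"])["y"].mean().unstack()
print(cell_means)

# Simple effect of B at each level of A; when an interaction is present,
# this difference changes across the levels of A
print(cell_means["b1"] - cell_means["b2"])
```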
Like @onestop, I'm not sure what you mean by "mushed cells". However, I will try to answer your questions:
1. Cohen's d is inappropriate here. I don't believe it tells you anything meaningful (for a way to adapt Cohen's d to this situation, see #3). You should stick with your initial model fit (as discussed in my first paragraph) and report partial eta squared as your effect size estimate. Eta squared is also acceptable, but partial eta squared would be better; partial eta squared (say, for the AB interaction) would be $SS_{AB}/(SS_{AB}+SS_E)$; a sketch of this calculation follows the formulas below. *This post provides excellent information about effect size measures for ANOVA in general, and about eta squared vs. partial eta squared in particular.
2. You ask about Bonferroni corrections. There are other approaches, but your contrasts do not appear to be orthogonal (you want to test all pairwise comparisons), and you state that they are post-hoc; thus, the Bonferroni approach is probably best. To apply it, you divide $\alpha$ by the total number of comparisons possible, which with $k$ means is $k(k-1)/2$. Only those contrasts whose p-values fall below this adjusted threshold would be considered 'significant'; a quick calculation is sketched after the formulas below. (However, I should also say that I don't see why you need any of this; with only 2 levels in each factor, the results of your initial model fit tell you everything you need to know.)
3. In general, I'm not sure that reporting all of the possible Cohen's d's is worthwhile, but you could. You need to realize that those comparisons are not independent: because each is estimated from the same data, and estimates are never exactly correct, the manner in which one of them is off will be related to how the others are off. If the people to whom this information is presented would not be likely to realize this on their own, it should be mentioned explicitly. *However, in your particular case, I think you should just use your full model; since you have an interaction and you want to describe its magnitude intuitively, you could pick one of the factors to condition on (e.g., A) and show how the standardized mean difference on the other changes depending on the level of that factor. For instance, you could report Cohen's d for the difference between the means of the levels of B given a1, and given a2 (a code sketch follows these formulas):
$$
\frac{\bar{x}_{a_1b_1}-\bar{x}_{a_1b_2}}{SD_{pooled}}
\qquad
\frac{\bar{x}_{a_2b_1}-\bar{x}_{a_2b_2}}{SD_{pooled}}
$$
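As promised in #1, here is a minimal sketch of partial eta squared for the interaction, pulled from the hypothetical statsmodels ANOVA table fitted above (the row labels follow statsmodels' naming and would need adjusting for other software):

```python
# Partial eta squared for the A:B interaction: SS_AB / (SS_AB + SS_E)
aov = sm.stats.anova_lm(full_model, typ=2)
ss_ab = aov.loc["C(A):C(B)", "sum_sq"]
ss_e = aov.loc["Residual", "sum_sq"]
print(ss_ab / (ss_ab + ss_e))
```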
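The Bonferroni adjustment from #2 is just arithmetic; with the 4 cell means of a 2x2 design you get $4(4-1)/2 = 6$ pairwise comparisons:

```python
# Adjusted alpha: divide by the number of pairwise comparisons, k(k-1)/2
alpha, k = 0.05, 4
n_comparisons = k * (k - 1) // 2   # 6 for a 2x2 design
print(alpha / n_comparisons)       # 0.05 / 6 ≈ 0.0083
```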
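Finally, a hedged sketch of the conditional Cohen's d values from #3, again using the hypothetical `df`. I pool the SD over the two cells being compared, which is one common choice; you might instead prefer to pool over all four cells:

```python
import numpy as np

def cohens_d(x1, x2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    sd_pooled = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) +
                         (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / sd_pooled

# Cohen's d for b1 vs. b2, conditional on each level of A
for a_level in ["a1", "a2"]:
    sub = df[df["A"] == a_level]
    d = cohens_d(sub.loc[sub["B"] == "b1", "y"],
                 sub.loc[sub["B"] == "b2", "y"])
    print(f"d(b1 - b2 | {a_level}) = {d:.2f}")
```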