
I have a statistics interpretation question. I recently performed a two-way ANOVA to identify an interaction between my categorical independent variables (genotype and temperature) that influences my continuous dependent variable (speed). My hypothesis is that genotype and temperature strongly interact to reduce speed.
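
For concreteness, a minimal sketch of this kind of fit in Python using statsmodels' formula interface; the file name and the columns `genotype`, `temperature`, and `speed` are illustrative assumptions, not the original analysis:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical long-format data: one row per animal, columns genotype, temperature, speed
df = pd.read_csv("speed_data.csv")  # illustrative file name

# Two-way ANOVA with the genotype x temperature interaction
model = smf.ols("speed ~ C(genotype) * C(temperature)", data=df).fit()
print(anova_lm(model, typ=2))  # the C(genotype):C(temperature) row tests the interaction
```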

In my case, the interaction term is significant - i.e., genotype and temperature significantly interact, and I can easily observe that speed is reduced in this case.

I am now directed by common practice to perform a post-hoc test, but I don't really understand why:

What post-hoc test is appropriate to help confirm the interaction between temp and genotype? I am not interested in anything except the interaction, as I consider all of the other conditions to be controls.

That said, is such a post-hoc test necessary, given that the ANOVA itself is designed to reveal (and succeeds in revealing) a significant interaction? Is it necessary to determine the sign of the interaction?

Bonus points if you can point me to a nonparametric test for this, as my data are not Gaussian (but my large N = 100+ helps me be comfortable with using the data in the ANOVA itself).

Alex
  • How many levels do you have for each of your 2 IVs? In what sense are your data not Gaussian? – gung - Reinstate Monica May 07 '13 at 21:56
  • I have two levels for each IV; FWIW, this is a genetic manipulation driving a temperature sensitive manipulation. So I have a temperature control (temp+ in the control genotype) and a genotype control (temp- in the test genotype). In an ideal world, all three controls (--, +-, -+) should be equivalent; in practice, they are not. – Alex May 07 '13 at 22:08
  • If you only have 2 groups, there's no point in doing post-hoc tests to see which groups differ--there's only 1 possibility. Your 'low' group must differ from your 'high' group in order for the ANOVA to have been significant. In what way are your data not Gaussian? – gung - Reinstate Monica May 07 '13 at 22:45
  • When you have factors with more than two levels, a significant ANOVA result points to at least *some* non-zero differences on whatever thing you were testing, but doesn't say where those differences arose. *Post hoc* tests are to try to identify where the differences that contributed to the significance lie. – Glen_b May 07 '13 at 23:09
  • The data is right-tailed with a strong floor effect. It looks roughly like a (rather) heavy-tailed exponential distribution. – Alex May 08 '13 at 02:20
  • Just to be clear, it's the interaction term I'm most interested in. I think the responses by Glen_b and @gung are for analysis of main factors, not the interaction itself. Am I reading you guys correctly? – Alex May 08 '13 at 18:15
  • I recognize that you're mostly interested in the interaction. Given that, & the fact that this is 2x2, I see no point in post-hoc contrasts. You found a significant interaction, which is what you wanted to know; you're done. Re: your dist, is this for Y ignoring group, or do you find this dist w/i each group? (Nb, only the latter matters--see: [what-if-residuals-are-normally-distributed-but-y-is-not](http://stats.stackexchange.com/questions/12262//33320#33320).) If you find this w/i the groups, you could try a transformation. Alternatively, your N may be large enough that it doesn't matter. – gung - Reinstate Monica May 08 '13 at 19:34

1 Answer


A relatively unknown but very useful nonparametric substitute for two-way ANOVA with replication (the design must be balanced) is the Scheirer-Ray-Hare test. It is an extension of the Kruskal-Wallis test. Do it this way (a rough code sketch follows the steps):

  1. Replace each data observation with its overall rank (lowest number is ranked 1 and tied observations are all given the average rank)
  2. Run the two-way ANOVA as usual with the ranks instead of the actual data values.
  3. Discard the MS, F, and p value terms in the ANOVA output.
  4. Sum the SS terms for the factors, the interaction, and the error. Divide this sum by the total df; the result is MS total.
  5. The test statistic, H, for each factor and the interaction equals its SS / MS total.
  6. The Excel formula for the p value for each is: CHIDIST(H, df). The df is the usual df for each factor and interaction. The Excel output provides these df figures.
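
A minimal Python sketch of these steps, assuming a long-format pandas data frame; the function name and column handling are illustrative only:

```python
from scipy import stats
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def scheirer_ray_hare(df, dv, factor_a, factor_b):
    """Scheirer-Ray-Hare test for a balanced two-way layout (rough sketch)."""
    d = df.copy()
    # Step 1: replace the raw observations with their overall ranks (ties get the average rank)
    d["_rank"] = stats.rankdata(d[dv])

    # Step 2: ordinary two-way ANOVA on the ranks
    aov = anova_lm(smf.ols(f"_rank ~ C({factor_a}) * C({factor_b})", data=d).fit(), typ=2)

    # Steps 3-4: ignore MS/F/p from the ANOVA table; MS total = sum of all SS / total df
    ms_total = aov["sum_sq"].sum() / aov["df"].sum()

    # Steps 5-6: H = SS_effect / MS_total, referred to chi-square with the effect's df
    results = {}
    for effect in aov.index.drop("Residual"):
        h = aov.loc[effect, "sum_sq"] / ms_total
        results[effect] = (h, stats.chi2.sf(h, aov.loc[effect, "df"]))  # CHIDIST(H, df)
    return results

# Usage (hypothetical column names):
# print(scheirer_ray_hare(df, "speed", "genotype", "temperature"))
```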

The Scheirer-Ray-Hare test is a lot less powerful than regular two-way ANOVA. The p values are usually around twice as large on the SRH test as those generated by two-factor ANOVA.

gung - Reinstate Monica
  • I had never heard of SRH, @Mark Harmon, thanks for this & welcome to CV. – gung - Reinstate Monica Sep 19 '13 at 17:05
  • You got it bro! Actually the p Values from the SRH test might be 10X larger than p values from the ANOVA test. – Mark Harmon Sep 19 '13 at 18:46
  • You might also note that the Kruskal-Wallis test is a special case of ordinal logistic regression, which *can* handle multiple factors, continuous variables, interactions, etc. In my comments to the Q above, I note that regular ANOVA may be fine for this problem, however. – gung - Reinstate Monica Sep 19 '13 at 19:01
  • Well, the Kruskal-Wallis test can only tell if there is a significant difference between ranked data groups of only one factor. Logistic regression, whether binary, ordinal, or nominal, is a completely different animal than the nonparametric Kruskal-Wallis test. Regular (I guess you mean two-way with replication) ANOVA requires that all groups within each F test have similar variances and be normally distributed. The original question mentions that the data is not Gaussian so the parametric ANOVA test would not be the correct test. ANOVA is however somewhat robust against non-normality. – Mark Harmon Sep 19 '13 at 19:47
  • The Kruskal-Wallis test operates over ranks & ordinal logistic regression predicts the probability of a given ordinal category, which can be understood as a rank. The KW test can be shown to be a special case of OLR. As for ANOVA, what it requires is that the sampling distribution of the group means be normal. This is guaranteed if the w/i group distributions of the data are normal, however will occur w/ non-normal data w/ sufficient N via the central limit theorem. It is not clear if ANOVA is inappropriate in this case. – gung - Reinstate Monica Sep 19 '13 at 19:55