Why do the results of a MANOVA change when the order of the predictor variables is changed?

Asked Dec 12 '15 at 16:41

Active May 11 '16 at 08:53


So, for example, using the Iris data and treating iris species as the predictor variable and sepal length, sepal width, petal length, and petal width as the dependent variables we get MANOVA output that looks like this:

```r
set.seed(2)
# Creating a matrix of the 4 dependent variables (DVs)
Y <- as.matrix(iris[, 1:4])

# MANOVA looking at the effect of species on DVs
summary(manova(Y ~ iris$Species))
    #                 Df  Pillai    approx F  num Df  den Df    Pr(>F)    
    # iris$Species    2   1.1919    53.466    8       290       < 2.2e-16 ***
```

That seems to make sense. Species has a significant effect on our DVs. Now what if we add another predictor variable (a random one which we shouldn’t expect to have an effect on the DVs)?

```r
# Creating a random dummy variable to be used as a predictor variable
iris$random.dummy <- sample(x = c(0, 1), size = 150, replace = TRUE)

# MANOVA looking at the effect of species + our random dummy on DVs
summary(manova(Y ~ iris$Species + iris$random.dummy))
    #                     Df  Pillai    approx F  num Df  den Df  Pr(>F)    
    # iris$Species        2   1.19339   53.263    8       288     <2e-16 ***
    # iris$random.dummy   1   0.03784   1.406     4       143     0.2349
```

That also seems to make sense. Species is significant still, but our random dummy variable is not. Now what if we simply switch the order of those variables?

```r
# Switching the order of our two predictor variables in the formula
summary(manova(Y ~ iris$random.dummy + iris$Species))
    #                     Df  Pillai    approx F  num Df  den Df  Pr(>F)  
    # iris$random.dummy   1   0.13031   5.357     4       143     0.0004764 ***
    # iris$Species        2   1.19526   53.470    8       288     < 2.2e-16 ***
```

Now, the Pillai’s trace and approximate F-values change and our random dummy variable has become significant.

So my questions are these.

Why do the results of a MANOVA change when the order of the predictor variables is changed?

and

What does this mean for those of us trying to use and interpret a MANOVA?

edited Dec 13 '15 at 01:11

amoeba


asked Dec 12 '15 at 16:41

Angela

The answer here http://stats.stackexchange.com/questions/11127 is very relevant and explains large parts of this conundrum. It does not explain it completely, though: I don't understand why the outcome of `summary(manova(Y ~ iris$random.dummy + iris$Species))` differs from `summary(manova(Y ~ iris$random.dummy))`. – amoeba Dec 12 '15 at 22:21
Update: it's because the error term is not the same. I added [anova] tag to your question, because this issue is not specific to MANOVA. – amoeba Dec 13 '15 at 01:12
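
The error-term point can be checked directly. The following is a minimal sketch, not code from the thread: the names `dummy`, `alone`, and `first` are illustrative, and the random draw (and therefore the printed numbers) depends on the R version. It compares the hypothesis and residual sums-of-squares-and-products (SSP) matrices that `summary.manova` stores in its `SS` component.

```r
set.seed(2)
Y <- as.matrix(iris[, 1:4])
species <- iris$Species
# Illustrative random 0/1 predictor, unrelated to the responses by construction
dummy <- sample(c(0, 1), size = nrow(iris), replace = TRUE)

alone <- summary(manova(Y ~ dummy))            # dummy is the only predictor
first <- summary(manova(Y ~ dummy + species))  # dummy listed first, species second

# The hypothesis SSP matrix for dummy is the same in both fits, because a
# first-listed term is not adjusted for anything that follows it
same_hypothesis <- isTRUE(all.equal(alone$SS[["dummy"]], first$SS[["dummy"]]))
same_hypothesis

# The residual SSP matrix is not the same: species absorbs much of the
# variation that is left in the error term when it is omitted
trace_alone <- sum(diag(alone$SS[["Residuals"]]))
trace_first <- sum(diag(first$SS[["Residuals"]]))
c(without_species = trace_alone, with_species = trace_first)
```

With the smaller error term, the same hypothesis matrix yields a larger test statistic, which is why the two summaries disagree for the dummy variable.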
Note that there isn’t balance in the species and random factors (`table(iris$random.dummy, iris$Species)`). ANOVA and MANOVA for *unbalanced* data will always be problematic. If you instead generate random data that are balanced across species (`c(sample(rep(0:1, each=25)), sample(rep(0:1, each=25)), sample(rep(0:1, each=25)))`), you will not have the problem of inconsistent results for the random term from the three models (including one without `Species`). (Because of the error term, the results will not be identical, but they will be very similar, and usually not statistically significant.) – Karl Ove Hufthammer Dec 13 '15 at 10:28
Just to clarify my earlier comment: With balanced data, you *will* get identical results for the effect of `random.dummy` regardless of the order of variables, but you will *not* get identical results (though very similar) if you exclude the `Species` variable. For unbalanced data, expect (possibly) very different results for `random.dummy` for the three models. This demonstrates how difficult it is to correctly interpret the results of (M)ANOVA models for unbalanced data. – Karl Ove Hufthammer Dec 13 '15 at 10:35
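
A minimal sketch of the balanced case described in these two comments, assuming the rows of `iris` are grouped by species in blocks of 50 (as they are in the built-in data set). The names `balanced_dummy` and `pillai_for` are illustrative and do not come from the thread.

```r
set.seed(2)
Y <- as.matrix(iris[, 1:4])
species <- iris$Species

# 25 zeros and 25 ones, shuffled separately within each block of 50 rows
balanced_dummy <- unlist(lapply(1:3, function(i) sample(rep(0:1, each = 25))))
table(balanced_dummy, species)  # every cell holds 25 observations

# Illustrative helper: pull Pillai's trace for one term out of a manova fit
pillai_for <- function(fit, term) summary(fit)$stats[term, "Pillai"]

p_first <- pillai_for(manova(Y ~ balanced_dummy + species), "balanced_dummy")
p_last  <- pillai_for(manova(Y ~ species + balanced_dummy), "balanced_dummy")
p_alone <- pillai_for(manova(Y ~ balanced_dummy), "balanced_dummy")

# Order no longer matters; dropping species still changes the error term
c(first = p_first, last = p_last, alone = p_alone)
```

The first two values should agree up to floating-point error, while the third is expected to differ because the residual matrix changes when `species` is left out.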
@Karl, isn't it neither more nor less difficult than interpreting the results of multiple regression with non-orthogonal predictors? Predictors are almost never orthogonal, and people still use multiple regression just fine. Unbalanced data in (M)ANOVA leads to predictors not being orthogonal. – amoeba Dec 13 '15 at 13:06
@amoeba, thank you for the link. It was helpful and generated questions that I didn’t realize I needed to ask myself--like the difference between (and really, even the existence of) the different MANOVA approaches. – Angela Dec 13 '15 at 18:12
@Karl, aha! Thank you. I had read about the issue with unbalanced data in the past and not really understood it. Now though, I wonder how this issue relates to the order of a continuous predictor variable in a MANCOVA situation. When I create `iris$random.continuous – Angela Dec 13 '15 at 18:14
Now that I have had the chance to take a closer look at the link amoeba shared, I see the second issue has to do with how the restricted vs. unrestricted models are set up in base R. The car package MANOVA function gave me the same output regardless of order. Thanks all for the assistance! – Angela Dec 14 '15 at 18:22
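
For readers without the car package, an order-independent set of tests can be sketched in base R by reading each term's test from the fit in which that term is listed last, so that it is adjusted for the other predictor. This is an illustration with made-up names (`dummy`, `adjusted`), not the code used in the thread. My understanding is that, for a model with main effects only, this corresponds to the Type II tests that `car::Manova()` reports by default, but that equivalence is an assumption here and was not verified in this thread.

```r
set.seed(2)
Y <- as.matrix(iris[, 1:4])
species <- iris$Species
dummy <- sample(c(0, 1), size = nrow(iris), replace = TRUE)

# Fit both orderings; both are the same model, so the error term is shared
fit_dummy_last   <- manova(Y ~ species + dummy)
fit_species_last <- manova(Y ~ dummy + species)

# Keep only the row for the last-listed term of each fit
adjusted <- rbind(
  dummy   = summary(fit_dummy_last)$stats["dummy", ],
  species = summary(fit_species_last)$stats["species", ]
)
adjusted
```

Each row of `adjusted` answers the question "does this predictor matter once the other one is accounted for?", which does not depend on how the formula was written.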

1 Answer

I read this some weeks ago:

MANOVA uses a regression approach to analysis of variance. More particularly, the program employs a hierarchical model. There is an important consequence for the user: if a MANOVA execution involves more than 1 factor variable, and if there are disproportionate number of cases in the cells formed by the cross-classification of the factors, then consideration must be given to the order in which factor variables are specified. Disproportionality of subclass numbers confounds the main effects and the researcher must choose the order in which the confounded effects should be eliminated. When using MANOVA, this choice is accomplished by the order in which factor variables are specified. When using standard ordering, variables early in the specification have the effects of later variables removed, e.g. the first listed effect will be tested with all other main effects eliminated. The general rule is that each test eliminates effects listed before it on the test name specifications and ignores effects listed afterward. For a standard two-way analysis, the interaction term is not affected by the order of factor variables; more generally, for a standard n-way analysis, the n-th order interaction term and that term only, is unaffected. The problem exists for both univariate and multivariate analysis.
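
The quoted passage notes that the problem exists for univariate analysis too, and that is the easiest place to see it. Below is a minimal sketch using R's `anova()` on a single response; the variable names are illustrative and the numbers depend on the random draw. Note that R's `anova()` and `summary(manova())` adjust each term for the terms listed before it, which may not match the ordering convention of the program the quotation describes.

```r
set.seed(2)
species <- iris$Species
dummy <- sample(c(0, 1), size = nrow(iris), replace = TRUE)
y <- iris$Sepal.Length  # a single response, so this is ordinary ANOVA

tab_dummy_first   <- anova(lm(y ~ dummy + species))
tab_species_first <- anova(lm(y ~ species + dummy))

# Sequential sums of squares for the two terms under each ordering
ss_dummy_first   <- tab_dummy_first[c("dummy", "species"), "Sum Sq"]
ss_species_first <- tab_species_first[c("dummy", "species"), "Sum Sq"]
ss_table <- rbind(dummy_first = ss_dummy_first, species_first = ss_species_first)
colnames(ss_table) <- c("dummy", "species")
ss_table

# The split between the terms depends on the order, but the total explained
# by the model does not
rowSums(ss_table)
```

With unbalanced cells the two rows generally differ, because whichever term is listed first is credited with the variation the two predictors share.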

answered Dec 12 '15 at 17:04

stochazesthai


Please give a reference for your quotation. – John Dec 13 '15 at 03:31
Thank you for the excerpt. It’s quite a bit over my head, but the impression I get is that earlier-listed variables get the earliest crack at having the variability in the data assigned to them. Would that be a tolerable layman’s interpretation? – Angela Dec 13 '15 at 18:15