
Background

To analyze differences in a continuous variable between groups (given by a categorical variable), one can perform a one-way ANOVA. If there are several explanatory (categorical) variables, one can perform a factorial ANOVA. If one wants to analyze differences between groups in several continuous variables simultaneously (i.e., several response variables), one has to perform a multivariate ANOVA (MANOVA).

Question

I struggle to understand how one can perform an ANOVA-like test on several response variables and, more importantly, I don't understand what the null hypothesis could be. Is the null hypothesis:

  • "For each response variable, the means of all groups are equal",

or is it

  • "For at least one response variable, the means of all groups are equal",

or is $H_0$ something else?

amoeba
Remi.b
  • I can't tell, are you also asking how an ANOVA works? In the context of discussing what a standard error is, I essentially explain the basic idea behind an ANOVA here: [How does the standard error work?](http://stats.stackexchange.com/a/33627/7290) – gung - Reinstate Monica Jan 13 '15 at 21:33
  • Neither of your two statements. `H0` of MANOVA is that there is no difference in _multivariate space_. The multivariate case is considerably more complex than univariate because we have to deal with covariances, not just variances. There exist several ways to formulate the `H0-H1` hypotheses in MANOVA. Read Wikipedia. – ttnphns Jan 13 '15 at 21:38
  • @ttnphns: Why neither? The $H_0$ of ANOVA is that the means of all groups are equal. The $H_0$ of MANOVA is that the multivariate means of all groups are equal. This is exactly alternative 1 in the OP. Covariances etc. enter the *assumptions* and the *computations* of MANOVA, not the null hypothesis. – amoeba Jan 13 '15 at 21:41
  • @amoeba, I didn't like `For each response variable`. To me it sounds like (or I read it as) "testing is done univariately on each" (and then somehow combined). – ttnphns Jan 13 '15 at 21:47

2 Answers

12

The null hypothesis $H_0$ of a one-way ANOVA is that the means of all groups are equal: $$H_0: \mu_1 = \mu_2 = ... = \mu_k.$$ The null hypothesis $H_0$ of a one-way MANOVA is that the [multivariate] means of all groups are equal: $$H_0: \boldsymbol \mu_1 = \boldsymbol \mu_2 = ... = \boldsymbol \mu_k.$$ This is equivalent to saying that the means are equal for each response variable, i.e. your first option is correct.

In both cases the alternative hypothesis $H_1$ is the negation of the null. In both cases the assumptions are (a) Gaussian within-group distributions, and (b) equal variances (for ANOVA) / covariance matrices (for MANOVA) across groups.

Difference between MANOVA and ANOVAs

This might appear a bit confusing: the null hypothesis of MANOVA is exactly the same as the combination of null hypotheses for a collection of univariate ANOVAs, but at the same time we know that doing MANOVA is not equivalent to doing univariate ANOVAs and then somehow "combining" the results (one could come up with various ways of combining). Why not?

The answer is that running all univariate ANOVAs, even though they test the same null hypothesis, has less power. See my answer here for an illustration: How can MANOVA report a significant difference when none of the univariate ANOVAs reaches significance? A naive method of "combining" (reject the global null if at least one ANOVA rejects its null) would also lead to a huge inflation of the type I error rate; but even if one chooses some smart way of "combining" that maintains the correct error rate, one would still lose power.
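
To see the error-rate inflation concretely, here is a back-of-the-envelope sketch (assuming, for simplicity, that the $p$ univariate tests are independent, which real correlated responses are not; the function name is just for illustration):

```python
# Family-wise type I error of the naive "combining" rule: reject the global
# null as soon as at least one of p univariate ANOVAs rejects at level alpha.
# Idealized assumption: the p tests are independent.

def familywise_error(p, alpha=0.05):
    """P(at least one of p independent level-alpha tests rejects under H0)."""
    return 1 - (1 - alpha) ** p

for p in [1, 2, 5, 10]:
    print(p, round(familywise_error(p), 4))
# 1 0.05
# 2 0.0975
# 5 0.2262
# 10 0.4013
```

With five response variables the naive rule already rejects a true null more than 22% of the time, which is why any sensible combination must lower the per-test level, and in doing so sacrifices power relative to MANOVA.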

How the testing works

ANOVA decomposes the total sum-of-squares $T$ into between-group sum-of-squares $B$ and within-group sum-of-squares $W$, so that $T=B+W$. It then computes the ratio of the corresponding mean squares, $F = \frac{B/(k-1)}{W/(n-k)}$. Under the null hypothesis, this ratio should be close to $1$; one can work out the exact distribution of this ratio under the null hypothesis (it depends on $n$ and on the number of groups $k$). Comparing the observed value with this distribution yields a p-value.
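
The decomposition is easy to verify by hand; the following sketch (toy numbers, pure Python) computes $B$, $W$, $T$ and the resulting $F$ statistic for $k=3$ groups:

```python
# One-way ANOVA decomposition on toy data: k = 3 groups of 3 observations.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
k = len(groups)                                  # number of groups
n = sum(len(g) for g in groups)                  # total sample size
grand_mean = sum(x for g in groups for x in g) / n

def mean(g):
    return sum(g) / len(g)

# Between-group, within-group and total sums of squares.
B = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
W = sum((x - mean(g)) ** 2 for g in groups for x in g)
T = sum((x - grand_mean) ** 2 for g in groups for x in g)

assert abs(T - (B + W)) < 1e-9                   # T = B + W holds exactly

# The F statistic compares the mean squares; under H0 it is close to 1.
F = (B / (k - 1)) / (W / (n - k))
print(B, W, T, F)                                # 42.0 6.0 48.0 21.0
```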

MANOVA decomposes the total scatter matrix $\mathbf T$ into between-group scatter matrix $\mathbf B$ and within-group scatter matrix $\mathbf W$, so that $\mathbf T = \mathbf B + \mathbf W$. It then computes the matrix $\mathbf W^{-1} \mathbf B$. Under the null hypothesis, this matrix should be "small" (close to the zero matrix); but how to quantify how "small" it is? MANOVA looks at the eigenvalues $\lambda_i$ of this matrix (they are all non-negative). Again, under the null hypothesis, these eigenvalues should all be small (close to $0$). But to compute a p-value, we need a single number (called a "statistic") that can be compared with its expected distribution under the null. There are several ways to obtain one: take the sum of all eigenvalues $\sum \lambda_i$; take the maximal eigenvalue $\max\{\lambda_i\}$; etc. In each case, this number is compared with the distribution of that quantity expected under the null, resulting in a p-value.
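
The same bookkeeping in the multivariate case can be sketched with NumPy on hypothetical toy data: the scatter matrices decompose as $\mathbf T = \mathbf B + \mathbf W$, and all four classical statistics are functions of the eigenvalues of $\mathbf W^{-1}\mathbf B$:

```python
import numpy as np

# Toy data: k = 3 groups, p = 2 response variables, 4 observations each.
groups = [
    np.array([[1., 2.], [2., 1.], [3., 3.], [2., 2.]]),
    np.array([[4., 5.], [5., 4.], [4., 4.], [5., 5.]]),
    np.array([[7., 2.], [8., 3.], [7., 3.], [8., 2.]]),
]
X = np.vstack(groups)
gm = X.mean(axis=0)                                   # grand mean vector

# Between-group and within-group scatter matrices.
B = sum(len(g) * np.outer(g.mean(0) - gm, g.mean(0) - gm) for g in groups)
W = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)
T = (X - gm).T @ (X - gm)
assert np.allclose(T, B + W)                          # T = B + W

# Eigenvalues of W^{-1} B; under H0 they should all be close to 0.
lam = np.sort(np.linalg.eigvals(np.linalg.inv(W) @ B).real)[::-1]

wilks     = np.prod(1 / (1 + lam))    # Wilks' lambda    = det(W) / det(T)
pillai    = np.sum(lam / (1 + lam))   # Pillai's trace   = tr(B T^{-1})
hotelling = np.sum(lam)               # Hotelling-Lawley = tr(W^{-1} B)
roy       = lam[0]                    # Roy's largest root

# Sanity checks: eigenvalue formulas agree with the matrix definitions.
assert np.isclose(wilks, np.linalg.det(W) / np.linalg.det(T))
assert np.isclose(pillai, np.trace(B @ np.linalg.inv(T)))
assert np.isclose(hotelling, np.trace(np.linalg.inv(W) @ B))
```

Each of these scalars is then referred to its own null distribution, generally yielding slightly different p-values.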

Different choices of the test statistic lead to slightly different p-values, but it is important to realize that in each case the same null hypothesis is being tested.

amoeba
  • Also, if you don't correct for multiple testing, the all-univariate-ANOVAs approach will yield type I error inflation as well. – gung - Reinstate Monica Jan 13 '15 at 22:20
  • 1
    @gung: Yes, that is true as well. However, one can be smarter in "combining" than just rejecting the null as soon as at least one of the ANOVAs rejects the null. My point was that however smart one tries to be in "combining", one will still lose in power as compared to MANOVA (even if one manages to maintain the size of the test without inflating the error rate). – amoeba Jan 13 '15 at 22:24
  • But isn't that "power" directly related to the notion of covariance? The moral is that with a (series of) univariate tests we test only for marginal effects, which is the `SSdifference/SSerror` scalar. In MANOVA the multivariate effect is the `SSCPerror^(-1)SSCPdifference` matrix (total and within-group covariances accounted for). But since there are several eigenvalues in it, which could be "combined" into a test statistic in more than one manner, several possible alternative hypotheses exist. More power – more theoretical complexity. – ttnphns Jan 14 '15 at 07:03
  • @ttnphns, yes, this is all correct, but I think does not change the fact that the null hypothesis is what I wrote it is (and that's what the question was about). Whatever test statistic is used (Wilks/Roy/Pillai-Bartlett/Lawley-Hotelling), they are trying to test the same null hypothesis. I might expand my answer later to discuss this in more detail. – amoeba Jan 14 '15 at 09:18
  • @amoeba, formally you are correct and I said nothing against that. However, `H0` by itself is futile. It is only the `H1` paired with it which gives any test its sense. In MANOVA, `H1` can be formulated in those various ways you are listing. In ANOVA, `H1` is just one. That is an essential difference, and the presence of the covariance term is the reason for it. So choosing just the word "power" to describe the difference between the two seems to mask the problem. – ttnphns Jan 14 '15 at 10:03
  • @ttnphns: No, I disagree (or perhaps I don't understand). The alternative hypothesis, at least in the Fisher's framework to statistical testing, is simply the negation of the null. In a t-test H0 is that $\mu_1=\mu_2$ and H1 is that $\mu_1 \ne \mu_2$. In ANOVA, H0 is that $\mu_1=...=\mu_k$ and H1 is that this is not the case. In MANOVA, H0 is that $\boldsymbol \mu_1 = ... =\boldsymbol \mu_k$ and H1 is that this is not the case. – amoeba Jan 14 '15 at 12:07
  • @amoeba, I think you are mistaken both statistically and philosophically. A statement of equality with respect to something (which shouldn't be confused with a statement of identity) cannot exist (or have sense) without the formulation of its counterpart, a statement of inequality. `H0` can have sense _only_ vis-a-vis an `H1`. Means not equal? _How_? Or in what way? A test is _always_ that pair `H0-H1`: you reject one for the other. In MANOVA, `H1` is initially vague; when it comes to conceptualization, it appears that it could be formulated technically in various ways. – ttnphns Jan 14 '15 at 12:50
  • @ttnphns: Well, I might be mistaken; but can you please provide an example and tell me what is H1 in a [two-sample unpaired] t-test, and what is H1 in a [one-way] ANOVA. Null hypotheses H0 for both these tests are equal means across groups. My claim is that H1 is the negation of H0. If you disagree, please formulate H1 in these two simple cases. – amoeba Jan 14 '15 at 13:35
  • `My claim is that H1 is the negation of H0`. Negation for the sake of what alternative? In univariate case the alternative is probably only one: there is some shift among means along that single axis. – ttnphns Jan 14 '15 at 13:53
  • @ttnphns: Negation is a negation, so I don't understand what your question means. Of course the negation of $\mu_1=\mu_2$ is that $\mu_1\ne \mu_2$, which means that "there is some shift among means along that single axis". So I take it that you agree that for t-test or ANOVA the H1 is precisely the negation of H0. Apparently you think that for MANOVA this is not the case; well, I guess then you should tell me what you think H1 is (or are), otherwise we are stuck! By the way, I have just updated my answer incorporating some of our discussion. – amoeba Jan 14 '15 at 14:09
  • @amoeba, will we stop the discussion that's getting idle? I posed no question at all (it's you who is asking me something now, as I find it). My points were: (1) A test is negation H0 for H1 (not just negation of H0); (2) in a multivariate test H1 may have several versions. – ttnphns Jan 14 '15 at 14:21
  • @ttnphns: Each of us can stop at any moment :) Incidentally, there *is* a question mark in your before-last comment; but it does not matter. Yes, I am asking: if you could either tell me these "several versions of H1", or give me a link to or a citation of where I can look them up, I would be grateful. Perhaps there is something I don't understand here and I would be happy to learn. For the moment, I remain with my belief that it is you who are mistaken. – amoeba Jan 14 '15 at 14:26
  • Roy's root is the difference (of the groups) as observed along the 1st discriminant. Hotelling's trace makes use of all the discriminants. Pillai's trace is the sum of squared canonical correlations and is homologous to R-square or Eta-square, etc. These "ways" of testing do not share exactly the same `H1`. (Eigenvalue is the B/W idea, sq. canonical corr. is the B/T idea.) – ttnphns Jan 14 '15 at 14:47
  • @ttnphns: I don't know if this is going to convince you, but for the record, I googled [manova "alternative hypothesis"](https://www.google.com/search?q=manova+"alternative+hypothesis") and looked up the first 10 links. All of them say that H0 is that all means are equal and H1 is that at least one mean is different. I cannot find any resource that would formulate different H1 for different choices of MANOVA statistic. – amoeba Jan 14 '15 at 15:37
  • No, you haven't convinced me that I'm wrong (though of course I may be). Consider one-factor MANOVA which is, as we know, essentially LDA. Say there were 3 (uncorrelated) nonzero discriminant dimensions (each having its `B`, `W` and `T=B+W` SSs). Then Hotelling is `H=B1/W1+B2/W2+B3/W3`, Pillai is `P=B1/T1+B2/T2+B3/T3`, Wilks is `L=W1/T1*W2/T2*W3/T3`. And Roy is `R=B1/W1`. Clearly, the four statistics combine group differences along the 3 dimensions into a single value in different manners. – ttnphns Jan 15 '15 at 07:09
  • (cont.) [If there were only one dimension and hence one term in each of these expressions, the statistics would be convertible one into another and thus equivalent, but in our case they are not equivalent statistics: you can't compute, for example, the above Pillai value out of the above Hotelling value without knowing each of its separate terms.] It is a bit like with averaging: you could compute the arithmetic mean, or the geometric one, or even the harmonic one. – ttnphns Jan 15 '15 at 07:10
  • (cont.) Pillai is the sum of the 3 Eta-squares, and consequently its effect size is `P/3`. Hotelling's effect size will be `(H/3)/(H/3+1)` and Wilks' effect size `1-L^(1/3)`; these are not identical magnitudes in general. Though I may be mistaken in my conclusion, I would say the statistics represent different alternative hypotheses – different ways to measure the departure from the full equality of multivariate means. – ttnphns Jan 15 '15 at 07:10
  • @ttnphns, yes, I fully agree with **everything** you wrote in the last three comments (and your knowledge of different MANOVA statistics is far superior to mine). These four statistics measure different things, are not redundant, and will generally result in different p-values. The only point I am repeatedly making here, is that **nevertheless** these four different tests aim to test the same $H_0$ vs. the same $H_1=\lnot H_0$. They just do it in different ways. I think we should ask somebody with good understanding of classical statistics to judge us here. I will try to ping gung in chat. – amoeba Jan 15 '15 at 10:34
  • @amoeba, I propose you to sign the treaty that the generic H1 is one as you state, but its operational realizations may go further several ways. :-) – ttnphns Jan 15 '15 at 16:47
  • @ttnphns, I am happy to sign a treaty :) I did not reply earlier, because I was thinking if maybe the difference between our points of view is connected to the difference between Fisher and Neyman-Pearson frameworks of hypothesis testing... But I feel it's too much of an aside here. Happy to agree about "operational realizations", even if it is not a well-defined statistical term :) I certainly do agree that rejection with different tests (e.g. rejection with Hotelling, but no rejection with Roy) can tell us something meaningful about the dataset! Which I guess was your point all along. – amoeba Jan 16 '15 at 21:07
  • 1
    @gung asked me to chime in (not sure why... I taught MANOVA some 7 years ago, and never applied it) -- I would say that amoeba is right in saying that $H_1$ is a full negation of the null $H_0: \mu_{\mbox{group }1} = \ldots = \mu_{\mbox{group }k}$, which is a $p$-dimensional hyperspace in $kp$ dimensional space of parameters (if $p$ is the dimension that nobody bothered defining so far). And it is option 1 given by the OP. Option 2 is significantly more difficult to test. – StasK Jan 20 '15 at 04:28
8

It is the former.

However, the way it does it isn't literally to compare the means of each of the original variables in turn. Instead the response variables are linearly transformed in a way that is very similar to principal components analysis. (There is an excellent thread on PCA here: Making sense of principal component analysis, eigenvectors & eigenvalues.) The difference is that PCA orients your axes so as to align with the directions of maximal variation, whereas MANOVA rotates your axes in the directions that maximize the separation of your groups.
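
This "MANOVA rotation" can be made concrete: the axis maximizing group separation is the leading eigenvector of $\mathbf W^{-1}\mathbf B$ (the first discriminant), whereas PCA's axis is the leading eigenvector of the total scatter. A small NumPy sketch on hypothetical two-group data:

```python
import numpy as np

# Hypothetical data: two groups measured on two response variables.
groups = [
    np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]]),
    np.array([[3., 1.], [4., 2.], [3., 2.], [4., 1.]]),
]
X = np.vstack(groups)
gm = X.mean(axis=0)

# Between- and within-group scatter matrices.
B = sum(len(g) * np.outer(g.mean(0) - gm, g.mean(0) - gm) for g in groups)
W = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)

# PCA axis: leading eigenvector of total scatter (direction of max variance).
T = (X - gm).T @ (X - gm)
pca_axis = np.linalg.eigh(T)[1][:, -1]

# Discriminant axis: leading eigenvector of W^{-1} B (max group separation).
evals, evecs = np.linalg.eig(np.linalg.inv(W) @ B)
i = np.argmax(evals.real)
lda_axis = evecs[:, i].real

# Along the discriminant axis the separation ratio a'Ba / a'Wa equals the
# largest eigenvalue (Roy's root) -- no other direction does better.
ratio = (lda_axis @ B @ lda_axis) / (lda_axis @ W @ lda_axis)
assert np.isclose(ratio, evals.real[i])
```

The two axes generally differ: PCA ignores the group labels entirely, while the discriminant direction uses them to maximize between-group relative to within-group scatter.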

To be clear though, none of the tests associated with a MANOVA is testing all the means one after another in a direct sense, either with the means in the original space or in the transformed space. There are several different test statistics that each work in a slightly different way, nonetheless they tend to operate over the eigenvalues of the decomposition that transforms the space. But as far as the nature of the null hypothesis goes, it is that all means of all groups are the same on each response variable, not that they can differ on some variables but are the same on at least one.

gung - Reinstate Monica
  • Ooh... So MANOVA performs a linear discriminant analysis (to maximize the distance between the means of the groups) and then runs a standard ANOVA using the first axis as the response variable? So $H_0$ is "the means – in terms of PC1 – of all groups are the same". Is that right? – Remi.b Jan 13 '15 at 21:19
  • There are several different possible tests. Testing only the 1st axis is essentially using Roy's largest root as your test. This will often be the most powerful test, but it is also more limited. I gather there is ongoing discussion over which test is 'best'. – gung - Reinstate Monica Jan 13 '15 at 21:23
  • I guess we use MANOVA rather than several ANOVAs in order to avoid multiple-testing issues. But if, by doing a MANOVA, we just do an ANOVA on PC1 of an [LDA](http://en.wikipedia.org/wiki/Linear_discriminant_analysis), then we still have a multiple-testing issue to consider when looking at the p-value. Is this right? (Hope that makes more sense. I deleted my previous unclear comment.) – Remi.b Jan 13 '15 at 21:28
  • That's an insightful point, but there are two issues: 1) the axes are now orthogonal, & that can change the issues w/ multiple testing; 2) the sampling distributions of the MANOVA test statistics take the multiple axes into account. – gung - Reinstate Monica Jan 13 '15 at 21:32
  • 1
    @Remi.b: These are good questions, but just to be clear: MANOVA is *not* equivalent to a ANOVA on the first discriminant axis of LDA! See here for a the relation between MANOVA and LDA: [How is MANOVA related to LDA?](http://stats.stackexchange.com/questions/82959) – amoeba Jan 13 '15 at 21:34
  • @amoeba, thanks for the link (& sorry I hadn't seen it before). I just skimmed it, but I'll have to sit down w/ it more thoroughly when I have more time (like your other answer). I wonder if I am being too 'hand-wavy' / misleading here. (I am aware that you aren't literally doing an ANOVA on the discriminants.) – gung - Reinstate Monica Jan 13 '15 at 21:47
  • @gung: Thanks! I am actually not entirely happy with that answer anymore; if I had to write it again I would do it a bit differently. Nevertheless, it contains some points that seem to be very relevant for OP. Regarding your answer: I think you can take the "more" out of "It is more the former", and it will become more correct :) Indeed, if the means of linearly transformed variable are the same, then the means of the original variables are also the same, aren't they? So this reservation: "However, it isn't the means of the original variables exactly" is somewhat misleading. – amoeba Jan 13 '15 at 21:51
  • Rather than a simultaneous test of several null hypotheses like "group means are the same for variable 1", "... variable 2," ... "... variable $p$", testing *infinitely many* null hypotheses $H_{0,{\bf a},{\bf c}}: \sum_{j=1}^k c_j {\bf a}'\boldsymbol{\mu}_j = 0$ over all contrasts between groups $\bf c$ and all combinations of dimensions $\bf a$ turns out to be feasible via the union-intersection principle: you find an appropriate maximum for the worst-case scenario, and if it is bad enough, you reject. That's the Roy largest-eigenvalue test that @ttnphns mentioned in another thread. See Mardia, Kent & Bibby. – StasK Jan 20 '15 at 04:36
  • @amoeba, I just tweaked my answer in accordance with your suggestion. (I can't figure out why I hadn't done that already.) See if you think it's better now, or if it still needs something else. – gung - Reinstate Monica Feb 09 '17 at 19:36
  • +1. It's been some time since I looked at this thread. I think your answer is fine now, and I just noticed that I have to fix some errors in mine :-/ – amoeba Feb 09 '17 at 21:56