
Would anyone be willing to venture an intuitive description of the situations under which a multivariate response model is more appropriate than many linear regressions?

As an example, take a randomly allocated agricultural extension program and the yields of several different crops grown by farmers. You could run a separate model for each crop. Or you could aggregate the crops somehow. Or maybe you could run a multivariate response model, whereby your dependent variable is actually a matrix rather than a vector.

I've been reading up on the math of it all, but I haven't found a good intuitive description of the situations where these sorts of models are the most useful, nor their practical pitfalls. I get that the errors will be correlated between responses. Does this mean that you'd get more power in a situation where individual regressions would be underpowered? Is there any reason why coefficient matrices estimated in these models wouldn't have a causal interpretation if a variable is randomly allocated?

generic_user
  • I've posted my comment and response from a moment ago as an answer, but I am not sure whether the update answers your question or not. Can you take a look and maybe clarify your follow-up question if it doesn't? – Glen_b Jun 07 '13 at 07:03

2 Answers


I think my comments have grown long enough for an answer...

One reason why you might want to look at the multivariate case rather than several univariate cases is when there's a lot of dependence between the response variables. It's quite possible for each univariate response to show "no effect" while the multivariate analysis shows a strong one. See this plot of a difference between two groups on just two dimensions:

Note that here, $Y$ and $X$ are both DVs, and the grouping variable (red/black indicator) is the (lone) IV in the 'regression'.

[Plot: two groups, two dependent DVs]

The issue is that the thing whose mean really differs between the two groups is not the variable $X$ or the variable $Y$ (that is, $\mu_{X2}-\mu_{X1}$ is almost zero, same for $Y$), but a particular linear combination - in the example, $Y-X$ - on which the means of the two groups strongly differ.
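
To see why a combination like $Y-X$ can carry a strong signal when the margins don't, here's a quick back-of-envelope calculation (the particular numbers are illustrative assumptions, not values taken from the plot):

$$\operatorname{Var}(Y-X) = \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X\sigma_Y$$

With $\sigma_X=\sigma_Y=1$ and $\rho=0.95$, $\operatorname{Var}(Y-X)=0.1$. If the group means differ by $0.15$ on each axis (in opposite directions), each marginal standardized difference is only about $0.15$, while on $Y-X$ the difference is $0.3/\sqrt{0.1}\approx 0.95$ standard deviations - a large, easily detectable effect.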

In that case univariate $t$-tests find nothing, but a multivariate test sees it easily (both can be carried out by regression - univariate and multivariate respectively - where there is a single IV, the group indicator).

The same issue applies to other, less simple regressions.
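
Here is a minimal simulation sketch of the phenomenon (my own illustration in Python, not part of the original answer; the covariance, mean shift, and sample size are assumed values chosen to match the back-of-envelope numbers above):

```python
# Sketch only: two strongly correlated DVs whose group-mean difference
# lies along Y - X; all numbers here are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30                                      # observations per group
cov = [[1.0, 0.95], [0.95, 1.0]]            # strong X-Y correlation
g0 = rng.multivariate_normal([0.00, 0.00], cov, size=n)
g1 = rng.multivariate_normal([0.15, -0.15], cov, size=n)

# Univariate t-tests on each DV separately: typically find nothing
for j, name in enumerate(["X", "Y"]):
    t, p = stats.ttest_ind(g0[:, j], g1[:, j])
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")

# Two-sample Hotelling T^2 (the multivariate test), via its exact F form
d = g0.mean(axis=0) - g1.mean(axis=0)       # difference of mean vectors
S = ((n - 1) * np.cov(g0.T) + (n - 1) * np.cov(g1.T)) / (2 * n - 2)
T2 = (n * n / (2 * n)) * d @ np.linalg.solve(S, d)
F = T2 * (2 * n - 3) / ((2 * n - 2) * 2)    # p = 2 response dimensions
p_multi = stats.f.sf(F, 2, 2 * n - 3)
print(f"Hotelling T^2 = {T2:.2f}, F = {F:.2f}, p = {p_multi:.4f}")
```

With draws like these, the marginal tests usually come out non-significant while the multivariate test does not - exactly the pattern in the plot.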

Glen_b
  • I think that makes sense. You use the correlations between DVs to be more certain about groupwise differences in independent variables (which can be generalized to more complicated models). But the coefficients themselves should be identical. – generic_user Jun 07 '13 at 07:20
  • But that leaves me with the question: are there any practical pitfalls? Difference in interpretation of coefficients that are important? I don't immediately see any... – generic_user Jun 07 '13 at 07:26
  • You still seem to have at least some confusion. You already *know* the IV (the predictor) is different (if black is '0' and red is '1', you already know 0 and 1 differ - you chose them that way!). It's the responses (the two axes in my plot are both response variables, DVs), and it's *those* that you want to find effects for (in this simple example, find differences in their means by regression on the 0-1 variable). It occurs to me that you may be alluding to *[seemingly unrelated regressions](http://en.wikipedia.org/wiki/Seemingly_unrelated_regressions)*; is that the case? – Glen_b Jun 07 '13 at 07:31
  • I said independent, I meant dependent. – generic_user Jun 07 '13 at 07:33
  • Oh, okay, sorry. I'm going to come back and look at this another day. – Glen_b Jun 07 '13 at 07:37
  • Had forgotten about SUR, but it looks different from multivariate response models -- here I'm talking about the same regressors, multiple outcomes, and estimation by OLS. – generic_user Jun 07 '13 at 07:38
  • Anyway, appreciate your response! – generic_user Jun 07 '13 at 07:38
  • @ACD, re coefficients: multivariate regression is exactly equivalent to a set of univariate regressions; you can simply run univariate regressions for each DV and then stack the coefficients in one matrix (as you said: "coefficients should be identical"). So how can there be any "difference in interpretation of coefficients" if they are exactly the same? I am not sure what you were asking here. (A short sketch of this equivalence follows these comments.) – amoeba Dec 23 '14 at 16:36
  • A quick question on: (that is, $\mu_{X2}-\mu_{X1}$ is almost zero, same for $Y$). How is that so? On the y-dimension it looks like the black dots have a bigger mean than the red ones. – papgeo Mar 02 '19 at 19:19
  • They do have a bigger mean, but the difference in population means is very small compared to the size of the standard deviations of each variable. The average of the pair-differences, by comparison, is much larger relative to its standard deviation. That is, there's a linear combination of the original variables on which there's a strong group-difference effect even though the marginal effects are small. – Glen_b Mar 02 '19 at 23:42
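
As a concrete check of the coefficient equivalence discussed in these comments, here is a hypothetical sketch (not from the thread; all names and values are made up for illustration):

```python
# Sketch: multivariate OLS coefficients equal the column-by-column
# univariate OLS coefficients stacked into one matrix.
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 100, 3, 2                          # observations, predictors, DVs
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
B_true = rng.normal(size=(k + 1, m))
Y = X @ B_true + rng.normal(size=(n, m))     # matrix-valued response

# Multivariate regression: solve for the whole coefficient matrix at once
B_multi = np.linalg.lstsq(X, Y, rcond=None)[0]

# Separate univariate regressions, one per DV, stacked as columns
B_stack = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)

print(np.allclose(B_multi, B_stack))         # True: identical point estimates
```

This also speaks to the SUR comment above: when every equation shares the same regressor matrix, SUR's GLS estimator reduces to exactly this equation-by-equation OLS, so the point estimates coincide; what the multivariate model adds is the estimated error covariance across responses and, with it, the joint tests.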

Your question seems to relate to "multiple" regression (one with a single outcome and multiple predictors), NOT "multivariate" regression (one with multiple outcomes).

Mihai