
Is recasting a multivariate linear regression model as a multiple linear regression entirely equivalent? I'm not referring to simply running $t$ separate regressions.

I have read in a few places (Bayesian Data Analysis by Gelman et al., and Multivariate Statistics: Old School by Marden) that a multivariate linear model can easily be reparameterized as a multiple regression. However, neither source elaborates on this at all; they essentially just mention it and then continue using the multivariate model. Mathematically, I'll write the multivariate version first,

$$ \underset{n \times t}{\mathbf{Y}} = \underset{n \times k}{\mathbf{X}} \hspace{2mm}\underset{k \times t}{\mathbf{B}} + \underset{n \times t}{\mathbf{R}}, $$ where the bold variables are matrices with their sizes below them. As usual, $\mathbf{Y}$ is the data, $\mathbf{X}$ is the design matrix, $\mathbf{R}$ is a matrix of normally distributed residuals, and $\mathbf{B}$ holds the coefficients we want to make inferences about.

To reparameterize this as the familiar multiple linear regression, one simply rewrites the variables as:

$$ \underset{nt \times 1}{\mathbf{y}} = \underset{nt \times kt}{\mathbf{D}} \hspace{2mm} \underset{kt \times 1}{\boldsymbol{\beta}} + \underset{nt \times 1}{\mathbf{r}}, $$

where the reparameterizations used are $\mathbf{y} = row(\mathbf{Y})$, $\boldsymbol\beta = row(\mathbf{B})$, and $\mathbf{D} = \mathbf{X} \otimes \mathbf{I}_{t}$. Here $row()$ means that the rows of the matrix are arranged end to end into one long vector, and $\otimes$ is the Kronecker product. Note that the identity block must be $t \times t$ (so $\mathbf{D}$ is $nt \times kt$ and $\boldsymbol\beta$ is $kt \times 1$) for the dimensions to match.
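The equivalence is easy to check numerically. Here is a minimal numpy sketch (my own toy example, not from either book) that fits both forms by least squares and compares the estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, t = 50, 3, 4

X = rng.standard_normal((n, k))                 # design matrix
B = rng.standard_normal((k, t))                 # true coefficients
Y = X @ B + 0.1 * rng.standard_normal((n, t))   # multivariate responses

# Multivariate OLS: solve X B = Y for all t columns at once.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Reindexed univariate form: y = D beta + r, with y = row(Y),
# beta = row(B), and D = X kron I_t (identity block is t x t).
y = Y.reshape(-1)                               # row-major stacking = row(Y)
D = np.kron(X, np.eye(t))                       # (n*t) x (k*t)
beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)

# The two fits are identical up to re-indexing.
print(np.allclose(B_hat.reshape(-1), beta_hat))  # True
```

With iid errors the stacked system is just a re-indexing of the columnwise regressions, so the point estimates coincide exactly.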

So, if this is so easy, why bother writing books on multivariate models, test statistics for them, etc.? It would seem simplest to just transform the variables first and use common univariate techniques. I'm sure there is a good reason; I'm just having a hard time thinking of one, at least in the case of a linear model. Are there situations with the multivariate linear model and normally distributed errors where this reparameterization does not apply, or where it limits the analyses you can undertake?

Sources where I have seen this: Marden, Multivariate Statistics: Old School, sections 5.3-5.5. The book is freely available from: http://istics.net/stat/

Gelman et al., Bayesian Data Analysis. I have the second edition, in which there is a short paragraph in Ch. 19, 'Multivariate Regression Models', titled "The equivalent univariate regression model".

Basically, can you do everything with the equivalent linear univariate regression model that you could with the multivariate model? If so, why develop methods for multivariate linear models at all?

What about with Bayesian approaches?

bill_e
  • It is a good question. Maybe you could ask for more in terms of foundations rather than structure. –  Dec 22 '13 at 10:37
  • What do you mean by foundations rather than structure? Could you elaborate? – bill_e Dec 22 '13 at 10:51
  • Note that I studied only two papers as part of my first and postgraduate degrees long ago, so I do not have grounding in technical descriptions. I understand that multivariate analysis has different assumptions compared with multiple linear regression or a simple linear regression model, i.e., a different mathematical expectation prevails; multiple linear regression makes certain other assumptions that can result in heteroscedasticity. By "structure" here I mean your equations. –  Dec 22 '13 at 13:35
  • You should say it clearly in the title or the beginning whether you are speaking of _multivariate (general) linear model_ or about _bayesian multivariate regression_. – ttnphns Dec 22 '13 at 17:11
  • Oh.. I am really just talking about the model, and not so much the approach. From my understanding, for a GLM you use point estimates which themselves have a distribution. For a Bayesian multivariate regression, your solution is a distribution on $\mathbf{B}$. I don't think this interpretation matters. – bill_e Dec 22 '13 at 17:33
  • I didn't read your question attentively, so my comment may be a miss. But does your "reparameterization" adequately account for the fact that the DVs covary? A multivariate GLM and multiple linear regression are equivalent only in the case of several variables vs. one variable; then R-square = Pillai's trace. – ttnphns Dec 22 '13 at 17:49
  • Yes, it does account for correlations in the DVs, and this is what is confusing to me. It is easy to verify in R or Octave that, if you make a fake data/design matrix and reparameterize as shown in the question, the two fits agree. – bill_e Dec 22 '13 at 18:04
  • Regardless of whether your approach is correct or not (I didn't check it, sorry), you should be aware that the multivariate GLM (such as MANOVA) is computationally more efficient, faster than a series of reparameterized univariate multiple regressions. Also, your claim that F is "exact" while multivariate tests are "approximations" seems not to be true. – ttnphns Dec 22 '13 at 19:34
  • Ok, so... it's not *my* approach; I pointed out two places I have seen this. The approach is the crux of the issue: what is the difference between the multivariate version and the reparameterized univariate version? – bill_e Dec 22 '13 at 20:03
  • All you have done is change the notation; you haven't changed the model one whit. (You can recover the original formulation from the rewritten version. It's not even a reparameterization: the parameters are in one-to-one correspondence and all you have done is to re-index them.) Since changing the names of terms in a model is considered to be inconsequential, *of course* these models are equivalent. But try writing some natural multivariate models in univariate form and you'll quickly see how much more expressive the multivariate notation can be. – whuber Dec 24 '13 at 20:07
  • Yes, maybe reindex is a better word than reparameterize. I guess I'm just having a hard time understanding why the univariate form is less expressive. I've never seen things like the Wilks' and Pillai's trace statistics used in a univariate setting, even though the models can be equivalent. Why is this? – bill_e Dec 24 '13 at 21:31

2 Answers

6

Basically, can you do everything with the equivalent linear univariate regression model that you could with the multivariate model?

I believe the answer is no.

If your goal is simply to estimate the effects (the parameters in $\mathbf{B}$) or to make predictions based on the model, then it does not matter which of the two formulations you adopt.

However, for statistical inference, especially classical significance testing, the multivariate formulation seems practically irreplaceable. More specifically, let me use a typical data analysis in psychology as an example. The data from $n$ subjects are expressed as

$$ \underset{n \times t}{\mathbf{Y}} = \underset{n \times k}{\mathbf{X}} \hspace{2mm}\underset{k \times t}{\mathbf{B}} + \underset{n \times t}{\mathbf{R}}, $$

where the $k-1$ between-subjects explanatory variables (factors and/or quantitative covariates) are coded as the columns of $\mathbf{X}$, while the $t$ repeated-measures (or within-subject) factor levels are represented as simultaneous variables, the columns of $\mathbf{Y}$.

With the above formulation, any general linear hypothesis can be easily expressed as

$$\mathbf{L} \mathbf{B} \mathbf{M} = \mathbf{C},$$

where $\mathbf{L}$ is composed of the weights among the between-subjects explanatory variables, $\mathbf{M}$ contains the weights among the levels of the repeated-measures factors, and $\mathbf{C}$ is a constant matrix, usually $\mathbf{0}$.
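As a concrete illustration (a hypothetical toy design of my own, not from any textbook), consider two groups measured at $t = 3$ time points. The group-by-time interaction hypothesis is formed by choosing $\mathbf{L}$ to pick out the group effect and $\mathbf{M}$ to hold within-subject time contrasts:

```python
import numpy as np

# Hypothetical design: 2 groups (between-subjects), t = 3 time points
# (within-subject). Rows of B: intercept and group effect (k = 2).
B = np.array([[10.0, 12.0, 14.0],   # baseline mean profile over time
              [ 2.0,  2.0,  2.0]])  # group effect, identical at each time

# L picks the between-subjects effect of interest (the group row);
# M contains within-subject difference contrasts between adjacent times.
L = np.array([[0.0, 1.0]])          # 1 x k: group effect only
M = np.array([[ 1.0,  0.0],
              [-1.0,  1.0],
              [ 0.0, -1.0]])        # t x (t-1): time contrasts

# L B M is the group-by-time interaction; it is 0 here because the
# group effect does not change over time.
print(L @ B @ M)  # [[0. 0.]]
```

The same $\mathbf{L}$ with $\mathbf{M} = \mathbf{I}_t$ would instead test the overall group effect, which shows how cleanly the two kinds of weights separate.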

The beauty of the multivariate system lies in its separation of the two types of variables, between- and within-subject. It is this separation that allows an easy formulation of three types of significance testing under the multivariate framework: classical multivariate testing, repeated-measures multivariate testing, and repeated-measures univariate testing. Furthermore, Mauchly's test for sphericity violations and the corresponding correction methods (Greenhouse-Geisser and Huynh-Feldt) also become natural for univariate testing in the multivariate system. This is exactly how statistical packages implement those tests, e.g., car in R, GLM in IBM SPSS Statistics, and the REPEATED statement in PROC GLM of SAS.
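To make this concrete, here is a rough numpy sketch (my own, not taken from any of those packages) of how a multivariate statistic such as Pillai's trace falls out of the $\mathbf{L}\mathbf{B}\mathbf{M}$ formulation via the hypothesis and error SSCP matrices. With a single response and $\mathbf{M} = \mathbf{I}$ it reduces to $R^2$, matching the equivalence ttnphns noted in the comments:

```python
import numpy as np

def pillai_trace(Y, X, L, M):
    """Pillai's trace for H0: L B M = 0 in the model Y = X B + R."""
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ B_hat
    LBM = L @ B_hat @ M
    # Hypothesis and error SSCP matrices, both transformed by M.
    H = LBM.T @ np.linalg.inv(L @ XtX_inv @ L.T) @ LBM
    E = M.T @ (resid.T @ resid) @ M
    return np.trace(H @ np.linalg.inv(H + E))

rng = np.random.default_rng(1)
n = 40
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])      # intercept + one predictor
y = (1.0 + 2.0 * x + rng.standard_normal(n)).reshape(-1, 1)

# Single DV, testing the slope: Pillai's trace equals R-squared.
V = pillai_trace(y, X, L=np.array([[0.0, 1.0]]), M=np.eye(1))
ss_res = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(np.isclose(V, 1 - ss_res / ss_tot))  # True
```

None of this has a natural expression in the stacked univariate form, where the column structure of $\mathbf{Y}$ that $\mathbf{M}$ acts on has been flattened away.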

I'm not so sure whether the formulation matters in Bayesian data analysis, but I doubt that the above testing capability could be formulated and implemented as naturally under the univariate platform.

bluepole
  • I see, this makes sense. Thank you for the great answer. I'd love to hear a Bayesian perspective too. – bill_e Dec 25 '13 at 05:37
  • @PeterRabbit If you like the answer, please express your gratitude to bluepole by accepting his answer. He'll get points. – pteetor Dec 26 '13 at 22:58
  • I will, I was just holding out a bit to see if anyone would offer a Bayesian perspective. – bill_e Dec 27 '13 at 19:22
4

Both models are equivalent if you fit an appropriate variance-covariance structure. In the transformed (univariate) linear model, the variance-covariance matrix of the error component must be fitted as a Kronecker product, which has limited support in available statistical software. Linear Model Theory: Univariate, Multivariate, and Mixed Models is an excellent reference on this topic.
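To see what "fitting a Kronecker variance-covariance structure" means here (a sketch of my own, under the assumption of row-stacked errors with within-subject covariance $\boldsymbol\Sigma$, so the stacked error covariance is $\mathbf{I}_n \otimes \boldsymbol\Sigma$): for this balanced design, where every response column shares the same $\mathbf{X}$, GLS under the Kronecker structure coincides with plain OLS, the classical seemingly-unrelated-regressions result. The structure therefore matters for the standard errors and tests, not the point estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, t = 30, 2, 3

X = rng.standard_normal((n, k))
A = rng.standard_normal((t, t))
Sigma = A @ A.T + np.eye(t)              # some within-subject covariance

# Stacked (univariate) form: y = D beta + r, cov(r) = I_n kron Sigma.
D = np.kron(X, np.eye(t))
Y = X @ rng.standard_normal((k, t)) \
    + rng.multivariate_normal(np.zeros(t), Sigma, size=n)
y = Y.reshape(-1)

Omega_inv = np.kron(np.eye(n), np.linalg.inv(Sigma))

# GLS under the Kronecker covariance structure...
beta_gls = np.linalg.solve(D.T @ Omega_inv @ D, D.T @ Omega_inv @ y)
# ...equals plain OLS, because (X'X)^{-1}X' kron I factors out of the
# GLS normal equations when every response shares the same X.
beta_ols = np.linalg.solve(D.T @ D, D.T @ y)

print(np.allclose(beta_gls, beta_ols))  # True
```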

Edited

Here is another nice reference freely available.

MYaseen208
  • Oh ok, so in a normal univariate model there is no covariance structure "within" the DVs, and therefore hypothesis tests concerned with it don't exist. Thank you! I'll see if I can pick up that book. – bill_e Dec 25 '13 at 08:14