Difference between Multivariate Regression vs Iterative Regression on Residuals

Question

Suppose one has an n × 2 matrix X (the independent variables) and an n × 1 vector y (the dependent variable). In a standard multiple linear regression setting, we solve for the 2 × 1 beta vector that minimizes the least squares objective function.

An alternative approach is to consider only one variable at first (say x_1) and fit that simple linear regression model. We obtain a scalar coefficient and a set of residuals from the dependent variable. Then we run a second linear regression, this time regressing the residuals of the first model on the second variable, x_2.
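In symbols, the two procedures can be sketched as follows. Intercepts are omitted for brevity, the second stage is assumed to regress the residuals on x_2, and the names b_1, r, and b_2 are introduced here purely for illustration.

```latex
% Joint fit: both coefficients are chosen together
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^2} \lVert y - X\beta \rVert_2^2

% Sequential fit: x_1 first, then the residuals on x_2
b_1 = \arg\min_{b \in \mathbb{R}} \lVert y - b\,x_1 \rVert_2^2, \qquad
r = y - b_1 x_1, \qquad
b_2 = \arg\min_{b \in \mathbb{R}} \lVert r - b\,x_2 \rVert_2^2
```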

My questions are:

What are the conceptual and technical differences in each approach?
Are the outputs going to be equal, similar, or vastly different?
Are there differences in the assumptions of each method? My inclination is that the second method is ignorant of correlations or something to this effect, but I am not certain.
Is there one formulation that is objectively superior to the other?

If anyone has any sources as well for further research, please feel free to share.

Welcome to the site, Anon. I think there is a confusion of terms. Multivariate analysis is when you have multiple responses (a.k.a. dependent variables), whereas univariate analyses have one response but one or more covariates (a.k.a. independent variables). I have amended the question accordingly and will answer it in a minute. — André.B, Oct 15 '19 at 03:13

score 1 · Answer 1 · answered Oct 15 '19 at 03:52

1) The two approaches are actually fitted in the same way, by least squares; in the first (multiple covariates) case the system of linear equations being solved is simply larger.

2) The outputs will generally differ, and how much depends on the correlation between the two covariates. If the covariates are uncorrelated in the sample (and each fit includes an intercept), the two procedures give the same coefficients; the stronger the correlation, the more the results tend to diverge. The model with more covariates will never explain less variation, although this is not to say that it is explaining anything meaningful. If the two variables are highly correlated, the resulting multiple linear regression model will be fine for prediction but will not be good for inference, as the two covariates will be trying to explain the same variation. If one of your covariates doesn't explain much variation and the other does, then the results will be more or less equivalent to two separate tests.

3) Assumptions are the same between the two (i.e. normality of errors, equality of variance and independence of observations).

4) Which is superior depends entirely on the question you wish to ask and it is hard to say more than that without writing a rather lengthy essay...
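To illustrate point 2, here is a minimal NumPy sketch that runs both procedures on simulated data. The true coefficients (1, 2, 3), the sample size, the seed, and the helper names `fit_ols` and `compare` are all assumptions made for illustration; none of them come from the question.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def compare(rho, n=5000, seed=0):
    """Return (joint, sequential) slope estimates for covariates with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # Hypothetical true model: y = 1 + 2*x1 + 3*x2 + noise
    y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=n)
    ones = np.ones(n)

    # Approach 1: regress y on both covariates at once.
    joint = fit_ols(np.column_stack([ones, X]), y)[1:]

    # Approach 2: regress y on x1, then regress the residuals on x2.
    A1 = np.column_stack([ones, X[:, 0]])
    c1 = fit_ols(A1, y)
    resid = y - A1 @ c1
    A2 = np.column_stack([ones, X[:, 1]])
    c2 = fit_ols(A2, resid)
    sequential = np.array([c1[1], c2[1]])
    return joint, sequential

for rho in (0.0, 0.9):
    joint, sequential = compare(rho)
    print(f"rho={rho}: joint={joint.round(2)}, sequential={sequential.round(2)}")
```

Under these assumed coefficients, the population values of the sequential slopes at rho = 0.9 are 2 + 3(0.9) = 4.7 and 3(1 - 0.81) = 0.57, whereas the joint fit should stay near 2 and 3. At rho = 0 the two procedures should roughly agree, up to sampling noise.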
