Suppose one has an n × 2 matrix X (the independent variables) and an n × 1 vector y (the dependent variable). In a standard multiple linear regression setting, we solve for the 2 × 1 coefficient vector beta that minimizes the least squares objective function.
An alternative approach is to consider only one variable at first (say the first column, x_1) and fit a simple linear regression of y on it. This yields a scalar coefficient and a set of residuals, i.e. the part of y that x_1 does not explain. We then run a second linear regression in which those residuals are regressed on the remaining variable, x_2.
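To make the two procedures concrete, here is a minimal sketch in Python with NumPy on synthetic data. The data-generating process, the seed, the coefficient values, and all variable names are my own assumptions for illustration only. I omit the intercept to match the n × 2 setup above, and I deliberately make the two columns correlated.

```python
import numpy as np

# Synthetic data (assumed for illustration): x2 is correlated with x1.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Approach 1: joint least squares on both columns at once.
X = np.column_stack([x1, x2])
beta_joint, *_ = np.linalg.lstsq(X, y, rcond=None)

# Approach 2: regress y on x1 alone, then regress the residuals on x2.
b1 = (x1 @ y) / (x1 @ x1)          # simple regression slope, no intercept
resid = y - b1 * x1                # what x1 leaves unexplained
b2 = (x2 @ resid) / (x2 @ x2)      # slope of residuals on x2
beta_seq = np.array([b1, b2])

# Residual sum of squares of each fitted model on the original y.
rss_joint = np.sum((y - X @ beta_joint) ** 2)
rss_seq = np.sum((y - X @ beta_seq) ** 2)

print("joint coefficients:     ", beta_joint)
print("sequential coefficients:", beta_seq)
print("RSS joint / sequential: ", rss_joint, rss_seq)
```

In this sketch the two coefficient vectors come out noticeably different, and the joint fit cannot have a larger residual sum of squares than the sequential one, since it minimizes that quantity over both coefficients. Setting the 0.8 in the definition of x2 to 0 makes the columns nearly uncorrelated, which is a convenient way to see how much of the gap is driven by the correlation.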
My questions are:
- What are the conceptual and technical differences between the two approaches?
- Are the outputs going to be equal, similar, or vastly different?
- Are there differences in the assumptions of each method? My inclination is that the second method ignores correlations among the variables, or something to this effect, but I am not certain.
- Is there one formulation that is objectively superior to the other?
If anyone has sources for further reading, please feel free to share.