I'm not sure if this technique has a name.

I've recently learned that some people perform a two-step regression where, in the second step, they regress the residuals from the first step on some new variables. So, starting with a standard OLS regression:

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m + e$

We could now take the residuals $\hat{e}$ from this fit and regress them on one or more new independent variables:

$\hat{e} = \alpha_0 + \alpha_1 z_1 + \dots + \alpha_n z_n + u$

I believe the goal here is to control for the $x$ variables when estimating the effect of the $z$ variables, but I don't understand the benefit of this over a single regression of $y$ on $x$ and $z$ simultaneously.

When is this two-step procedure preferable to a single regression equation?
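
For concreteness, here is a minimal sketch in Python/numpy (the simulated data, coefficients, and `ols` helper are my own invention, purely for illustration). When $x$ and $z$ are correlated, the two-step slope on $z$ does not match the slope from the joint regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated data where x and z are correlated and y depends on both.
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)            # z is correlated with x
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

def ols(y, X):
    """Return the design matrix (with intercept) and least-squares coefficients."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, beta

# One step: regress y on x and z jointly.
_, beta_joint = ols(y, np.column_stack([x, z]))

# Two steps: regress y on x alone, then regress the residuals on z.
X1, beta_x = ols(y, x)
resid = y - X1 @ beta_x
_, alpha = ols(resid, z)

print("coefficient on z, joint fit:   ", beta_joint[2])  # close to 3
print("coefficient on z, two-step fit:", alpha[1])        # smaller (roughly 2.4 with this setup)
```

In this simulation the first-step coefficient on $x$ absorbs part of $z$'s effect (since $z$ is correlated with $x$), so the second-step slope on $z$ is attenuated rather than recovering the joint-regression value.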

user268859

  • It is variously called "controlling," "matching," "leaving out," and various other things. See https://stats.stackexchange.com/a/46508/919 for one account. AFAIK, the benefits are primarily conceptual because good numerical procedures rely on various matrix decompositions (*e.g.* Cholesky) rather than this sequential approach. – whuber Dec 17 '19 at 15:34
  • AFAIK "residual regression" leads to biased estimates: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2656.2002.00618.x – Carsten Dec 17 '19 at 16:33
