
In linear regression we can do something like this:

$$ y' = \operatorname{resid}(y \sim \beta_0 + \beta_1 c_1 + \beta_2 c_2) $$

where $c_1, c_2$ are covariates, and then fit $y'$ in another model:

$$ y' \sim b_0 + b_1 x_1 $$

The advantage of this approach shows when you have many independent variables $x_1, x_2, \cdots, x_n$ (for example, a large number of DNA mutation sites) that all share the same covariates (e.g. age, sex, weight): you can residualize $y$ on the covariates once and then fit each $x_i$ cheaply, reducing computation.

But it looks like this will only work with continuous outcomes, like height or blood pressure. What if I have a binary outcome, such as tumor/no-tumor or infected/uninfected? Is it still possible to do something similar?
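The two-step approach described above can be sketched in numpy. This is a minimal illustration on simulated data; the variable names (`age`, `sex`, `x1`) and all coefficients are made up for the example, and in practice `x1` would be one of many variables fitted in step 2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated shared covariates and one variable of interest.
age = rng.normal(50, 10, n)
sex = rng.integers(0, 2, n).astype(float)
x1 = rng.normal(0, 1, n)

# Continuous outcome depending on the covariates and x1 (true slope 2.0).
y = 1.0 + 0.05 * age + 0.5 * sex + 2.0 * x1 + rng.normal(0, 1, n)

def resid(y, X):
    """Residuals from an OLS fit of y on X (intercept included)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Step 1: residualize y on the shared covariates, once.
C = np.column_stack([age, sex])
y_prime = resid(y, C)

# Step 2: fit y' ~ b0 + b1 * x1 (repeatable cheaply for x2, x3, ...).
X1 = np.column_stack([np.ones(n), x1])
b, *_ = np.linalg.lstsq(X1, y_prime, rcond=None)
print(b[1])  # close to the true slope here, since x1 was simulated independent of the covariates
```

Note the caveat in the comments below: this only recovers the multiple-regression coefficient when $x_1$ is (near-)orthogonal to the covariates.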

Alecos Papadopoulos
qed
  • In the case of ordinary regression this sequential approach (when correctly carried out, which regresses $y'$ on the *residuals* of $x_1$ rather than $x_1$ itself) is *identical* to multiple regression (see the explanation at http://stats.stackexchange.com/a/46508). Thus it is questionable whether there really is any reduction in computation. You might as well do logistic regression (or any GLM) with all the variables at the outset. That ought to yield, perhaps with appropriate coaxing, all the information, and the same information, you seek to obtain through the sequential method. – whuber Nov 21 '13 at 17:18
  • @whuber That looks to me to be a fair approximation of a correct and highly useful answer to the question. It may be concise, but there's a lot of value packed into it. I think you could consider posting it as an answer, even as-is. – Glen_b Nov 21 '13 at 22:31
  • The title explicitly refers to a logistic regression framework, while the question itself is about categorical variables, which of course are extensively used in linear regression also. So modify your title, or your question, please. – Alecos Papadopoulos Nov 22 '13 at 01:13
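The equivalence whuber points to is the Frisch-Waugh-Lovell theorem: residualizing *both* $y$ and $x_1$ on the covariates and regressing one set of residuals on the other reproduces the multiple-regression coefficient exactly, whereas regressing $y'$ on $x_1$ itself does not when $x_1$ correlates with the covariates. A numpy sketch on simulated data (all names and coefficients illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Covariates, and an x1 deliberately correlated with age.
age = rng.normal(50, 10, n)
sex = rng.integers(0, 2, n).astype(float)
x1 = 0.1 * age + rng.normal(0, 1, n)
y = 1.0 + 0.05 * age + 0.5 * sex + 2.0 * x1 + rng.normal(0, 1, n)

def ols(y, X):
    """OLS fit of y on X (intercept included): returns (coefficients, residuals)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

C = np.column_stack([age, sex])
_, y_res = ols(y, C)
_, x1_res = ols(x1, C)

# FWL: slope of resid(y) on resid(x1), no intercept needed.
b_fwl = float(x1_res @ y_res / (x1_res @ x1_res))

# Coefficient on x1 from the full multiple regression.
beta_full, _ = ols(y, np.column_stack([age, sex, x1]))
b_full = float(beta_full[3])

# Regressing resid(y) on x1 itself: biased here, because x1 correlates with age.
beta_naive, _ = ols(y_res, x1.reshape(-1, 1))
b_naive = float(beta_naive[1])

print(b_fwl, b_full)  # agree to floating-point precision
print(b_naive)        # noticeably attenuated
```

This is why the shortcut in the question only "works" when the $x_i$ are orthogonal to the covariates, and why (per whuber's comment) the sequential trick buys little over fitting the full model, in OLS or in a GLM.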

0 Answers