I'd like help comparing two different strategies for performing a regression.
Strategy 1: model the output y as a linear combination of the input variables x1 through x3:

fit <- lm(y ~ x1 + x2 + x3)
Strategy 2: model the output y as a linear function of a single input variable x1, then model the residual of that fit using x2, and so on:

firstFit <- lm(y ~ x1)
firstResidual <- y - predict(firstFit)
secondFit <- lm(firstResidual ~ x2 + 0)  # "+ 0" drops the intercept; only firstFit has one
secondResidual <- firstResidual - predict(secondFit)
thirdFit <- lm(secondResidual ~ x3 + 0)  # "+ 0" drops the intercept
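To make the comparison concrete, here is a small self-contained sketch on simulated data (the data-generating process and all numbers are illustrative, not from my actual problem), running both strategies side by side:

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)   # deliberately correlated with x1
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 - 1 * x3 + rnorm(n)

# Strategy 1: one joint fit
fit <- lm(y ~ x1 + x2 + x3)

# Strategy 2: sequential fits on the residuals
firstFit       <- lm(y ~ x1)
firstResidual  <- y - predict(firstFit)
secondFit      <- lm(firstResidual ~ x2 + 0)
secondResidual <- firstResidual - predict(secondFit)
thirdFit       <- lm(secondResidual ~ x3 + 0)

coef(fit)[c("x1", "x2", "x3")]
c(coef(firstFit)["x1"], coef(secondFit)["x2"], coef(thirdFit)["x3"])
# With correlated predictors the two coefficient sets differ:
# the x1 coefficient from firstFit absorbs part of x2's effect,
# since x2 was omitted from that first regression.
```

When I rerun this with mutually uncorrelated predictors, the two strategies agree much more closely, which is part of what prompts my questions below.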
Questions:
- What are the differences between these two strategies?
- Can I expect the "quality" of the coefficients in Strategy 1 to be uniform across the x's, and in Strategy 2 not to be?
- Strategy 1 seems generally the way to go; under what circumstances would Strategy 2 be better?
- If I believe the "truth" is that y is the output of a nested set of functions, y = f1(x1, f2(x2, f3(x3))), would Strategy 2 be an appropriate way to model the system?