Confusion regarding "regression by successive orthogonalization"

Question

In trying to answer a question here on Cross Validated, I was re-reading Section 3.2.3, specifically Algorithm 3.1 from Elements of Statistical Learning.

What I took from this is that, given a model with one dependent variable and two independent variables,

$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon$

then an estimated regression coefficient, say $\hat{\beta}_1$ in the equation above, would equal the $\hat{\beta}_1$ estimated from this equation:

$z_2 = \beta_0 + \beta_1X_1 + \epsilon$,

where

$z_2 = Y - \beta_0 - \beta_2X_2$

That is, my understanding is that the coefficient $\hat{\beta}_j$ from a multiple linear regression equals the coefficient you would get by regressing the dependent variable on all other predictors (besides $X_j$), taking the residuals from that model, and regressing those residuals on $X_j$. However, I simulated some data and did not get this:

set.seed(1839) # setting seed
x1 <- rnorm(200, 0, 1) # generating x1
x2 <- x1 + rnorm(200, 1, 3) # generating a correlated x2
eps <- rnorm(200, 0, 6) # generating error
y <- x1 + x2 + eps # making y
fit <- lm(y ~ x1 + x2) # fitting overall model

Looking at the summary, we can see the coefficients for x1 and x2:

summary(fit) # looking at summary

Call:
lm(formula = y ~ x1 + x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.4923  -4.4054   0.2954   4.0371  15.0697 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.5802     0.4462  -1.300    0.195    
x1            0.1487     0.4733   0.314    0.754    
x2            1.1272     0.1397   8.071 6.77e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.021 on 197 degrees of freedom
Multiple R-squared:  0.2706,    Adjusted R-squared:  0.2631 
F-statistic: 36.53 on 2 and 197 DF,  p-value: 3.199e-14

Now, why don't these coefficients match the ones from the regressions below?

coef(lm(residuals(lm(y ~ x2)) ~ x1))[2] # not exactly equal to the x1 in the original fit

       x1 
0.1358402 

coef(lm(residuals(lm(y ~ x1)) ~ x2))[2] # not exactly equal to the x2 in the original fit

      x2 
1.029427

Why are these x1 and x2 coefficients not the same as those above? They are close—is this due to rounding? Or am I missing something from Algorithm 3.1?

Your code does not seem to follow the procedure described in the text. It requires that you take out the effects of `x1` from *all* variables, not just the dependent variable, and then proceed recursively. I believe this point is made clear in the duplicates, which include some working `R` code to produce comparable simulations and test the procedure against them. — whuber, Jun 29 '17 at 21:28
Bingo, thanks! I knew I was making some simple oversight. Thanks for showing me the duplicates; I wasn't able to find any using the key words I was thinking of. — Mark White, Jun 30 '17 at 13:17
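
The procedure whuber describes can be sketched as follows, reusing the simulated data from the question. In Algorithm 3.1 the coefficient is $\hat{\beta}_p = \langle z_p, y \rangle / \langle z_p, z_p \rangle$, where $z_p$ is the residual from regressing $x_p$ on the other predictors (including the intercept), so it is the predictor that gets residualized rather than only $y$. The names `z1`, `z2`, `b1`, `b2`, and `naive_b1` are illustrative and do not appear in the original post; this is a minimal sketch for the two-predictor case, not the full recursive algorithm.

```r
# Same simulated data as in the question
set.seed(1839)
x1 <- rnorm(200, 0, 1)
x2 <- x1 + rnorm(200, 1, 3)
eps <- rnorm(200, 0, 6)
y <- x1 + x2 + eps
fit <- lm(y ~ x1 + x2)

# Step 1: remove the effect of the other predictor (and the intercept)
# from the predictor of interest, not just from y
z1 <- residuals(lm(x1 ~ x2))  # x1 adjusted for x2
z2 <- residuals(lm(x2 ~ x1))  # x2 adjusted for x1

# Step 2: regress y on the adjusted predictor
b1 <- unname(coef(lm(y ~ z1))["z1"])
b2 <- unname(coef(lm(y ~ z2))["z2"])

# Both should match the multiple-regression estimates up to floating-point error
stopifnot(isTRUE(all.equal(b1, unname(coef(fit)["x1"]))))
stopifnot(isTRUE(all.equal(b2, unname(coef(fit)["x2"]))))

# The approach in the question residualizes only y and then regresses on the
# raw x1, which shrinks the estimate by the factor 1 - cor(x1, x2)^2
naive_b1 <- unname(coef(lm(residuals(lm(y ~ x2)) ~ x1))["x1"])
stopifnot(isTRUE(all.equal(naive_b1 / b1, 1 - cor(x1, x2)^2)))
```

The last check suggests why the numbers in the question were close but not equal: with only modest correlation between `x1` and `x2`, the shrinkage factor is near 1, so the discrepancy is not a rounding effect.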
