I performed a hierarchical multiple regression to study the effect of several additional variables, above and beyond a set of a priori selected, more important variables. The R script used to perform this regression is listed below.
Data <- read.csv("~/Desktop/data.csv", header = TRUE, quote = "")
Data$catVar.f <- factor(Data$catVar)   # code the categorical predictor as a factor

# Hierarchical models: each step adds the next predictor to the previous model
HRModel1 <- lm(responseV ~ catVar.f, data = Data)
HRModel2 <- lm(responseV ~ catVar.f + v1, data = Data)
HRModel3 <- lm(responseV ~ catVar.f + v1 + v2, data = Data)
HRModel4 <- lm(responseV ~ catVar.f + v1 + v2 + v3, data = Data)

summary(HRModel1)
summary(HRModel2)
summary(HRModel3)
summary(HRModel4)

# Compare the nested models
anova(HRModel1, HRModel2, HRModel3, HRModel4)
My supervisor thought that collinearity among the variables might pose a problem, and so asked me to perform a similar method called sequential regression: instead of adding the new variables directly in the later regression analyses, I added the residuals of these variables after regressing each of them on the previous, more important variables. The R script for this regression method is listed below:
# Residualise each new predictor against the variables already in the model
RModel1 <- lm(v1 ~ catVar.f, data = Data, na.action = na.exclude)
v1residuals <- resid(RModel1)
RModel2 <- lm(v2 ~ catVar.f + v1residuals, data = Data, na.action = na.exclude)
v2residuals <- resid(RModel2)
RModel3 <- lm(v3 ~ catVar.f + v1residuals + v2residuals, data = Data, na.action = na.exclude)
v3residuals <- resid(RModel3)

# Sequential models: each step adds the residualised version of the next predictor
SRModel1 <- lm(responseV ~ catVar.f, data = Data)
SRModel2 <- lm(responseV ~ catVar.f + v1residuals, data = Data)
SRModel3 <- lm(responseV ~ catVar.f + v1residuals + v2residuals, data = Data)
SRModel4 <- lm(responseV ~ catVar.f + v1residuals + v2residuals + v3residuals, data = Data)

summary(SRModel1)
summary(SRModel2)
summary(SRModel3)
summary(SRModel4)

# Compare the nested models
anova(SRModel1, SRModel2, SRModel3, SRModel4)
To my surprise, the outputs (from the summary and anova functions) of both regression methods are exactly the same. Since I do not know the statistical principles behind these methods, I cannot figure out the reason for the identical outputs. Can somebody please explain why both methods gave the same outputs? Also, is there any mistake in my approach?
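For reference, here is a minimal self-contained sketch on simulated data (the variable names and coefficient values below are made up for illustration and are not from my data set) that sets up the same two comparisons for the first added predictor; it shows the same behaviour:

# Simulated example (hypothetical names and values)
set.seed(1)
n <- 60
simData <- data.frame(catVar = sample(c("A", "B", "C"), n, replace = TRUE),
                      v1 = rnorm(n))
simData$catVar.f  <- factor(simData$catVar)
simData$responseV <- 2 * (simData$catVar == "B") + 1.5 * simData$v1 + rnorm(n)

# Hierarchical step: add v1 itself
hBase <- lm(responseV ~ catVar.f, data = simData)
hStep <- lm(responseV ~ catVar.f + v1, data = simData)

# Sequential step: add v1 residualised against catVar.f
simData$v1residuals <- resid(lm(v1 ~ catVar.f, data = simData))
sStep <- lm(responseV ~ catVar.f + v1residuals, data = simData)

# The two model comparisons give identical anova tables, and the fitted values agree
anova(hBase, hStep)
anova(hBase, sStep)
all.equal(fitted(hStep), fitted(sStep))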
EDIT: outputs from the anova calls (these come from the full analysis, which includes predictors v1 through v8, so the models below have more steps than the abridged scripts above)
Hierarchical regression output
Model 1: responseV ~ catVar.f
Model 2: responseV ~ catVar.f + v1
Model 3: responseV ~ catVar.f + v1 + v2
Model 4: responseV ~ catVar.f + v1 + v2 + v3 + v4
Model 5: responseV ~ catVar.f + v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 7324.2
2 52 2663.8 1 4660.3 111.3874 9.347e-14 ***
3 51 2057.7 1 606.1 14.4874 0.0004238 ***
4 49 1968.2 2 89.5 1.0699 0.3516141
5 45 1882.8 4 85.4 0.5103 0.7283774
Sequential regression output
Model 1: responseV ~ catVar.f
Model 2: responseV ~ catVar.f + v1residuals
Model 3: responseV ~ catVar.f + v1residuals + v2residuals
Model 4: responseV ~ catVar.f + v1residuals + v2residuals + v3residuals + v4residuals
Model 5: responseV ~ catVar.f + v1residuals + v2residuals + v3residuals + v4residuals + v5residuals + v6residuals + v7residuals + v8residuals
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 7324.2
2 52 2663.8 1 4660.3 111.3874 9.347e-14 ***
3 51 2057.7 1 606.1 14.4874 0.0004238 ***
4 49 1968.2 2 89.5 1.0699 0.3516141
5 45 1882.8 4 85.4 0.5103 0.7283774