I performed a hierarchical multiple regression to study the effect of several additional variables, above and beyond a set of a priori selected, more important variables. The R script used to perform this regression is listed below.
Data <- read.csv("~/Desktop/data.csv", header = TRUE, quote = "")
Data$catVar.f <- factor(Data$catVar)   # code the categorical predictor as a factor

# Hierarchical models: each step adds the next predictor to the previous model
HRModel1 <- lm(responseV ~ catVar.f, data = Data)
HRModel2 <- lm(responseV ~ catVar.f + v1, data = Data)
HRModel3 <- lm(responseV ~ catVar.f + v1 + v2, data = Data)
HRModel4 <- lm(responseV ~ catVar.f + v1 + v2 + v3, data = Data)

summary(HRModel1)
summary(HRModel2)
summary(HRModel3)
summary(HRModel4)

# Compare the nested models
anova(HRModel1, HRModel2, HRModel3, HRModel4)
My supervisor thought that collinearity among the variables might pose a problem, and so asked me to perform a similar method called sequential regression: instead of adding the new variables directly in the later regression analyses, I added the residuals of these variables after regressing each of them on the previous, more important variables. The R script for this regression method is listed below:
# Residualise each new predictor against the variables already in the model
RModel1 <- lm(v1 ~ catVar.f, data = Data, na.action = na.exclude)
v1residuals <- resid(RModel1)
RModel2 <- lm(v2 ~ catVar.f + v1residuals, data = Data, na.action = na.exclude)
v2residuals <- resid(RModel2)
RModel3 <- lm(v3 ~ catVar.f + v1residuals + v2residuals, data = Data, na.action = na.exclude)
v3residuals <- resid(RModel3)

# Sequential models: each step adds the residualised version of the next predictor
SRModel1 <- lm(responseV ~ catVar.f, data = Data)
SRModel2 <- lm(responseV ~ catVar.f + v1residuals, data = Data)
SRModel3 <- lm(responseV ~ catVar.f + v1residuals + v2residuals, data = Data)
SRModel4 <- lm(responseV ~ catVar.f + v1residuals + v2residuals + v3residuals, data = Data)

summary(SRModel1)
summary(SRModel2)
summary(SRModel3)
summary(SRModel4)

# Compare the nested models
anova(SRModel1, SRModel2, SRModel3, SRModel4)
To my surprise, the outputs (from the summary and anova functions) of both regression methods are exactly the same. Since I do not know the statistical principles behind these methods, I cannot figure out the reason for the identical outputs. Can somebody please explain why both methods gave the same outputs? Also, is there any mistake in my approach?
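For reference, here is a minimal self-contained sketch on simulated data (the variable names and coefficient values below are made up for illustration and are not from my data set) that sets up the same two comparisons for the first added predictor; it shows the same behaviour:

# Simulated example (hypothetical names and values)
set.seed(1)
n <- 60
simData <- data.frame(catVar = sample(c("A", "B", "C"), n, replace = TRUE),
                      v1 = rnorm(n))
simData$catVar.f  <- factor(simData$catVar)
simData$responseV <- 2 * (simData$catVar == "B") + 1.5 * simData$v1 + rnorm(n)

# Hierarchical step: add v1 itself
hBase <- lm(responseV ~ catVar.f, data = simData)
hStep <- lm(responseV ~ catVar.f + v1, data = simData)

# Sequential step: add v1 residualised against catVar.f
simData$v1residuals <- resid(lm(v1 ~ catVar.f, data = simData))
sStep <- lm(responseV ~ catVar.f + v1residuals, data = simData)

# The two model comparisons give identical anova tables, and the fitted values agree
anova(hBase, hStep)
anova(hBase, sStep)
all.equal(fitted(hStep), fitted(sStep))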
EDIT: outputs from the anova calls (these come from the full analysis, which includes predictors v1 through v8, so the models below have more steps than the abridged scripts above)
Hierarchical regression output
Model 1: responseV ~ catVar.f
Model 2: responseV ~ catVar.f + v1
Model 3: responseV ~ catVar.f + v1 + v2
Model 4: responseV ~ catVar.f + v1 + v2 + v3 + v4
Model 5: responseV ~ catVar.f + v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 7324.2
2 52 2663.8 1 4660.3 111.3874 9.347e-14 ***
3 51 2057.7 1 606.1 14.4874 0.0004238 ***
4 49 1968.2 2 89.5 1.0699 0.3516141
5 45 1882.8 4 85.4 0.5103 0.7283774
Sequential regression output
Model 1: responseV ~ catVar.f
Model 2: responseV ~ catVar.f + v1residuals
Model 3: responseV ~ catVar.f + v1residuals + v2residuals
Model 4: responseV ~ catVar.f + v1residuals + v2residuals + v3residuals + v4residuals
Model 5: responseV ~ catVar.f + v1residuals + v2residuals + v3residuals + v4residuals + v5residuals + v6residuals + v7residuals + v8residuals
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 7324.2
2 52 2663.8 1 4660.3 111.3874 9.347e-14 ***
3 51 2057.7 1 606.1 14.4874 0.0004238 ***
4 49 1968.2 2 89.5 1.0699 0.3516141
5 45 1882.8 4 85.4 0.5103 0.7283774