I'm trying to figure out why the anova function in R gives me the same p-value regardless of the order in which I pass the models.
> anova(lm.fit ,lm.fit2)
Analysis of Variance Table
Model 1: medv ~ lstat
Model 2: medv ~ lstat + I(lstat^2)
  Res.Df   RSS Df Sum of Sq     F    Pr(>F)
1    504 19472
2    503 15347  1    4125.1 135.2 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(lm.fit2,lm.fit)
Analysis of Variance Table
Model 1: medv ~ lstat + I(lstat^2)
Model 2: medv ~ lstat
  Res.Df   RSS Df Sum of Sq     F    Pr(>F)
1    503 15347
2    504 19472 -1   -4125.1 135.2 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I can't understand why the p-value is so low in both cases. As I understand it, a very low p-value from anova means that model 2 is better than model 1, but here I get exactly the same p-value no matter which order I use.
I've been reading ?anova to check what this all means, but the help page is very terse. Is there other documentation that explains the output, for instance what the Df column means?
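For what it's worth, I tried reproducing the F value by hand from the numbers in the tables above. This is only a sketch of my own reading, not something the help page states: I'm assuming F is (change in RSS / change in Df) divided by (RSS / Res.Df) of the larger model, and the variable names below are my own.

```r
# Numbers copied from the anova tables above
rss_small <- 19472; df_small <- 504   # medv ~ lstat
rss_big   <- 15347; df_big   <- 503   # medv ~ lstat + I(lstat^2)

# Assumption: the denominator is the residual mean square of the larger model
scale <- rss_big / df_big

# Order (small, big): Df = 1, Sum of Sq is positive
f_forward <- ((rss_small - rss_big) / (df_small - df_big)) / scale

# Order (big, small): Df = -1, Sum of Sq is negative, so the signs cancel
f_reversed <- ((rss_big - rss_small) / (df_big - df_small)) / scale

f_forward
f_reversed

# Upper-tail p-value from the F distribution, using |Df| and the larger model's Res.Df
pf(f_forward, abs(df_small - df_big), df_big, lower.tail = FALSE)
```

If that assumed formula is right, both orders give the same F of about 135.2, which would explain the identical p-values, but I'd like to see this confirmed somewhere.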