This is my first question, so please correct me if I write something wrong.
I have a question about comparing two GLMs after applying stepwise selection. What I have always heard is that stepwise selection on its own is not a robust enough way to choose variables. Therefore, I use ANOVA to compare the most parsimonious model (lowest AIC) with the same model minus its least significant variable. If the ANOVA result is nonsignificant, I remove that variable from the model chosen by stepwise selection. The usual argument is that simpler models (few predictors) are preferred over more complex models (many predictors).
My problem arises when I try to do this in R: the p-value from the ANOVA is the same as the p-value reported for the model from the stepwise selection. Do you think that I'm doing something wrong?
Here is my code:
I first run stepAIC (from the MASS package) and obtain the most parsimonious model:
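library(MASS)  # provides stepAIC()
# R is case-sensitive, so the object name must match in both calls
fullmodel <- glm(Geommean ~ log_CPUEig + log_CPUE + max_depth + amp_temperature + TotalP + Elevation + max_temperature + lake_area)
stepwise <- stepAIC(fullmodel)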
# Step: AIC=1280.2 Geommean ~ log_CPUE + max_depth + max_temperature
#
# Df Deviance AIC
# <none> 1474.2 1280.2
# - max_depth 1 1485.6 1280.4
# - max_temperature 1 1495.8 1282.3
# - log_CPUE 1 1937.1 1355.5
Then I look at the p-value of each variable selected by the stepwise procedure:
summary(stepwise)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 25.3323 3.5104 7.216 5.06e-12 ***
# log_CPUE -3.5670 0.3811 -9.359 < 2e-16 ***
# max_depth -0.5818 0.3959 -1.469 0.1428
# max_temperature -6.1182 3.0290 -2.020 0.0443 *
I choose max_depth as the least significant variable.
I run ANOVA to compare the most parsimonious model and the model without the least significant variable.
# Most parsimonious model:
parsimonious <- glm(Geommean ~ log_CPUE+max_temperature+max_depth)
# Model without the least significant variable:
variable <- glm(Geommean ~ log_CPUE+max_temperature)
I compare these two models with ANOVA:
compare <- anova(parsimonious,variable, test="F")
compare
#   Resid. Df Resid. Dev Df Deviance      F Pr(>F)
# 1       279     1474.2
# 2       280     1485.6 -1   -11.41 2.1593 **0.1428**
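As a rough arithmetic check on the two tables above, assuming glm's default gaussian family (no family is specified in the calls): the F statistic can be recomputed from the deviances, and it comes out very close to the square of the t value reported for max_depth. The small difference is consistent with rounding in the printed output.

```latex
\[
F = \frac{11.41 / 1}{1474.2 / 279} \approx \frac{11.41}{5.284} \approx 2.159,
\qquad
t_{\text{max\_depth}}^{2} = (-1.469)^{2} \approx 2.158
\]
```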
After running the ANOVA, I obtain the same p-value as for the least significant variable in the summary (0.1428). In that case, I removed max_depth from the most parsimonious model because the p-value is greater than 0.05. Do you think that this outcome is normal?
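To see whether the matching p-values are specific to this dataset, here is a minimal sketch on simulated data. The variables x1, x2 and y are hypothetical and not part of the lake data. It assumes glm's default gaussian family, as in the calls above, and compares the t-test p-value from the coefficient table with the F-test p-value from anova() when a single term is dropped.

```r
# Simulated data; x1, x2 and y are hypothetical, not the lake variables.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.1 * x2 + rnorm(n)

# glm() defaults to family = gaussian(), i.e. an ordinary linear model.
full    <- glm(y ~ x1 + x2)
reduced <- glm(y ~ x1)

# t statistic and p-value for x2 from the coefficient table.
coefs <- coef(summary(full))
p_t   <- coefs["x2", "Pr(>|t|)"]
t_sq  <- coefs["x2", "t value"]^2

# F statistic and p-value for dropping x2 (second row of the anova table).
cmp <- anova(reduced, full, test = "F")
p_f <- cmp[2, "Pr(>F)"]
f   <- cmp[2, "F"]

print(c(p_t = p_t, p_f = p_f))
print(all.equal(p_t, p_f))   # do the two p-values agree?
print(all.equal(t_sq, f))    # does t squared agree with F?
```

If both comparisons print TRUE, the two tests are the same test for a single dropped term under these assumptions. That would not carry over unchanged to other families, or to dropping several terms at once.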
Thank you very much for your help.