Same p-value when comparing two GLMs

Question

This is my first question, so please correct me if I have written something wrong.

I have a question about comparing two GLMs after applying stepwise selection. I have always heard that stepwise selection is not robust enough to be trusted for choosing variables. Therefore, I use ANOVA to compare the most parsimonious model (lowest AIC) with the same model minus its least significant variable. If the ANOVA result is non-significant, I remove that variable from the model chosen by stepwise selection. It is often said that simple models (few predictors) should be preferred over complex models (many predictors).

My problem arises when I try to do this in R: the p-value from the ANOVA is the same as the p-value reported for that variable after stepwise selection. Am I doing something wrong?

Here is my code:

I first run the command stepAIC and obtain the most parsimonious model:

library(MASS)  # provides stepAIC

fullmodel <- glm(Geommean ~ log_CPUEig + log_CPUE + max_depth + amp_temperature + TotalP + Elevation + max_temperature + lake_area)

stepwise <- stepAIC(fullmodel)
# Step:  AIC=1280.2 Geommean ~ log_CPUE + max_depth + max_temperature
#
#                   Df Deviance    AIC
# <none>                 1474.2 1280.2
# - max_depth        1   1485.6 1280.4
# - max_temperature  1   1495.8 1282.3
# - log_CPUE         1   1937.1 1355.5

Then I look at the p-value of each variable selected by the stepwise procedure:

summary(stepwise)
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)      25.3323     3.5104   7.216 5.06e-12 ***
# log_CPUE         -3.5670     0.3811  -9.359  < 2e-16 ***
# max_depth        -0.5818     0.3959  -1.469   0.1428
# max_temperature  -6.1182     3.0290  -2.020   0.0443 *

I choose max_depth as the least significant variable.

I then run an ANOVA to compare the most parsimonious model with the model without the least significant variable.

# Most parsimonious model:
parsimonious <- glm(Geommean ~ log_CPUE+max_temperature+max_depth) 

# Model without the least significant variable:
variable <- glm(Geommean ~ log_CPUE+max_temperature)

I compare these two models with ANOVA:

compare <- anova(parsimonious,variable, test="F")
compare
#   Resid. Df Resid. Dev Df Deviance      F Pr(>F)
# 1       279     1474.2
# 2       280     1485.6 -1   -11.41 2.1593 0.1428

After running the ANOVA, I obtain the same p-value as for the least significant variable (0.1428). I therefore removed max_depth from the most parsimonious model, because the p-value is greater than 0.05. Is this outcome normal?

Thank you very much for your help.

Quite normal: AIC comparisons are equivalent to significance tests with size of about 15% for nested models. But the whole procedure is not very sensible: see [here](http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection) or one of the many posts about stepwise variable selection. — Scortchi - Reinstate Monica, May 13 '14 at 13:28
@Scortchi Yes, it's obvious, but how is it possible that after running stepwise regression one of the variables is non-significant (i.e. max_depth)? — Ignasi, May 13 '14 at 14:02
Because your stepwise procedure uses AIC to assess whether to remove a predictor, not F-tests at the 5% level. The clue is in the name: `stepAIC`. — Scortchi - Reinstate Monica, May 13 '14 at 14:05
I also ran it with the command "step" and obtained the same outcome. — Ignasi, May 13 '14 at 14:18
In this case the clue is deeply buried in the manual: `?step`. — Scortchi - Reinstate Monica, May 13 '14 at 14:20
And note you shouldn't be surprised at the t-test & F-test giving the same p-value: with 1 degree of freedom $F=t^2$, as you can confirm. — Scortchi - Reinstate Monica, May 13 '14 at 14:36
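
The two points made in the comments can be checked numerically. The following is only a sketch: the first part uses the numbers reported in the question, and the second part uses simulated data with hypothetical variables (`x1`, `x2`, `y`), since the original data set is not available. It assumes the default gaussian family, as in the question's `glm` calls, and relies on the usual large-sample chi-squared approximation for the likelihood-ratio statistic.

```r
# 1) Why AIC behaves like a test of size about 15% for one extra parameter:
#    with the default penalty of 2 per parameter, a single term is kept only
#    if the likelihood-ratio statistic exceeds 2. Under the chi-squared
#    approximation with 1 df, that happens by chance with probability:
aic_size <- pchisq(2, df = 1, lower.tail = FALSE)
aic_size  # about 0.157

# 2) F = t^2 with 1 degree of freedom, using the values from the question:
t_max_depth <- -1.469
t_max_depth^2  # 2.157961, matching the reported F = 2.1593 up to rounding

# 3) The same identity on simulated data (all names here are hypothetical).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.1 * x2 + rnorm(n)

full    <- glm(y ~ x1 + x2)  # gaussian family by default
reduced <- glm(y ~ x1)       # same model without x2

# t statistic and p-value for x2 from the coefficient table
coefs <- coef(summary(full))
t_val <- coefs["x2", "t value"]
p_t   <- coefs["x2", "Pr(>|t|)"]

# F statistic and p-value from the nested-model comparison
cmp   <- anova(reduced, full, test = "F")
f_val <- cmp$F[2]
p_f   <- cmp$`Pr(>F)`[2]

all.equal(t_val^2, f_val)  # squared t statistic equals F
all.equal(p_t, p_f)        # and so the two p-values coincide
```

So for a single dropped term, the F-test from `anova` carries no information beyond the t-test already shown by `summary`, and a variable can survive AIC-based selection while having a p-value anywhere up to roughly 0.16.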
