This is my first question, so please correct me if I write something wrong.

I have a question about comparing two GLMs after applying stepwise selection. I have always heard that stepwise selection is not robust enough for choosing variables. Therefore, I use ANOVA to compare the most parsimonious model (lowest AIC) with the same model without its least significant variable. If the ANOVA result is nonsignificant, I remove that variable from the model chosen by stepwise selection. The rationale is that simpler models (few predictors) are preferred over complex models (many predictors).

My problem arises when I try to do this in R: the p-value from the ANOVA is the same as the p-value of that variable in the model from the stepwise selection. Am I doing something wrong?

Here is my code:

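I first run stepAIC to obtain the most parsimonious model:

library(MASS)  # provides stepAIC

# R is case-sensitive, so the object name must match the one passed to stepAIC
fullmodel <- glm(Geommean ~ log_CPUEig + log_CPUE + max_depth + amp_temperature + TotalP + Elevation + max_temperature + lake_area)

stepwise <- stepAIC(fullmodel)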
# Step:  AIC=1280.2 Geommean ~ log_CPUE + max_depth + max_temperature
#
#                   Df Deviance    AIC
# <none>                 1474.2 1280.2
# - max_depth        1   1485.6 1280.4
# - max_temperature  1   1495.8 1282.3
# - log_CPUE         1   1937.1 1355.5

Then, I look at the p-value of each variable selected by the stepwise procedure:

summary(stepwise)
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)      25.3323     3.5104   7.216 5.06e-12 ***
# log_CPUE         -3.5670     0.3811  -9.359  < 2e-16 ***
# max_depth        -0.5818     0.3959  -1.469   0.1428
# max_temperature  -6.1182     3.0290  -2.020   0.0443 *

I choose max_depth as the least significant variable.

I run ANOVA to compare the most parsimonious model and the model without the least significant variable.

# Most parsimonious model:
parsimonious <- glm(Geommean ~ log_CPUE+max_temperature+max_depth) 

# Model without the least significant variable:
variable <- glm(Geommean ~ log_CPUE+max_temperature) 

I compare these two models with ANOVA:

compare <- anova(parsimonious,variable, test="F")
compare
#   Resid. Df Resid. Dev Df Deviance      F Pr(>F)
# 1       279     1474.2
# 2       280     1485.6 -1   -11.41 2.1593 0.1428

After running the ANOVA, I obtain the same p-value as for the least significant variable (0.1428). I therefore removed max_depth from the most parsimonious model because the p-value is greater than 0.05. Is this outcome normal?
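
As a quick arithmetic check on the numbers in the two outputs above, here is a rough sketch of how the F statistic relates to the t statistic of max_depth. It assumes the default Gaussian family (no family is passed to glm) and that the dispersion is estimated from the larger model; the symbols D (residual deviance) and df (residual degrees of freedom) are my own notation, not something R prints.

```latex
\[
F = \frac{(D_{\text{reduced}} - D_{\text{full}})/1}{D_{\text{full}}/\mathrm{df}_{\text{full}}}
  = \frac{11.41}{1474.2/279}
  \approx \frac{11.41}{5.284}
  \approx 2.16,
\qquad
t^{2} = (-1.469)^{2} \approx 2.16 .
\]
```

If this reasoning holds, dropping a single term from a Gaussian model gives an F statistic equal to the squared t statistic of that term, which would explain why both tests report 0.1428.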

Thank you very much for your help.

QuantIbex
Ignasi