Model selection using a standard p-value approach

Question

I need to perform model selection using a standard p-value approach. Using logistic regression, I would like to compare the following models:

Y =       A + B + C + D
Y = A*B + A + B + C + D
Y = A*C + A + B + C + D
Y = A*D + A + B + C + D

I performed the following analysis in R:

first  <- glm(Y~G+N+E+C,     family="binomial")
second <- glm(Y~G*N+G+N+E+C, family="binomial")
third  <- glm(Y~G*E+G+N+E+C, family="binomial")
fourth <- glm(Y~G*C+G+N+E+C, family="binomial")

summary(first)
summary(second)
summary(third)
summary(fourth)

anova(first, second, third, fourth, test="Chisq")

However, I do not think this is the right output for model selection based on p-values:

anova(first, second, third, fourth, test="Chisq")
# Analysis of Deviance Table
# Model 1: Y ~ G + N + E + C
# Model 2: Y ~ G * N + G + N + E + C
# Model 3: Y ~ G * E + G + N + E + C
# Model 4: Y ~ G * C + G + N + E + C
#   Resid. Df Resid. Dev Df  Deviance Pr(>Chi)
# 1       595     609.90
# 2       594     609.90  1  0.000169   0.9896
# 3       594     609.81  0  0.087775
# 4       594     609.90  0 -0.085001

So how should I perform model selection here using a standard p-value approach?

Why do you "need to perform model selection"? This is almost never a good idea (see: [why-is-variable-selection-necessary](http://stats.stackexchange.com/questions/18214//18245#18245)). It is certainly a bad idea using p-values (see: [algorithms-for-automatic-model-selection](http://stats.stackexchange.com/questions/20836//20856#20856)). — gung - Reinstate Monica, Aug 28 '13 at 13:06
Hi, It's just an assignment for school here. I also need to argue why p-values are a bad idea, so you helped me out here! Thanks — Sophie, Aug 28 '13 at 21:07
That sounds good, @Sophie. If this is a school-assignment, you need to add the `[self-study]` tag. Note that our policy is special in this situation; please review our [help page](http://stats.stackexchange.com/help/on-topic) & the [self-study wiki](http://stats.stackexchange.com/tags/self-study/info), & let us know what you've done already & where you're stuck. — gung - Reinstate Monica, Aug 28 '13 at 21:30
Hi, I have performed the analysis, but I don't know whether I performed it correctly... Can I post my R script, output and interpretation as a question, and add [self-study] and/or [homework]? Or is that not acceptable here? — Sophie, Aug 28 '13 at 21:38
There is no `[homework]` tag anymore, just add the `[self-study]` tag, & understand that we will treat this specially as noted at the links I provided above. — gung - Reinstate Monica, Aug 28 '13 at 21:40
Like this: http://stats.stackexchange.com/questions/68622/perform-model-selection-using-bayes-factors-based-on-the-bic-statistic — Sophie, Aug 28 '13 at 22:00
Please don't open a new question, @Sophie. Just add the `[self-study]` tag to *this* one. — gung - Reinstate Monica, Aug 28 '13 at 22:01

score 1 · Answer 1 · edited Aug 29 '13 at 13:45

Would it be possible to formulate these models as a nested structure?

First you would have a model like

Y=A*B+A*C+A*D+A+B+C+D

which is H1, and then

Y=A*B+A*C+A+B+C+D

which is H0.

Now you could do a likelihood-ratio test. The statistic is `-2*logLik(H0) + 2*logLik(H1)`, which under H0 is asymptotically distributed as $\chi^2$ with $(k+p)-k = p$ degrees of freedom, where $k$ is the number of parameters in H0 and $p$ is the number of extra parameters in H1. If the statistic is large (that is, the p-value is small), you can say that the H0 model is not adequate and that H1 is needed to explain the variation in the data. This shows up as the extra residual deviance you get when the second-largest model is used instead of the largest one.
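
As a rough illustration, the test can be computed by hand in R and checked against `anova()`. This is only a sketch: the data are simulated, the data frame `dat` and its coefficients are made up, and the variable names `G`, `N`, `E`, `C` are borrowed from the question's code (standing in for A, B, C, D).

```r
# Simulated data for illustration only; these are not the question's data.
set.seed(1)
n   <- 600
dat <- data.frame(G = rbinom(n, 1, 0.5), N = rnorm(n), E = rnorm(n), C = rnorm(n))
dat$Y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$G + 0.5 * dat$N))

# H1 contains all three interactions with G; H0 drops the G:C interaction.
h1 <- glm(Y ~ G + N + E + C + G:N + G:E + G:C, family = binomial, data = dat)
h0 <- glm(Y ~ G + N + E + C + G:N + G:E,       family = binomial, data = dat)

# Likelihood-ratio statistic; its df is the number of extra parameters in H1.
lr_stat <- as.numeric(2 * (logLik(h1) - logLik(h0)))
lr_df   <- attr(logLik(h1), "df") - attr(logLik(h0), "df")
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The same test via anova(); statistic and p-value should match the manual ones.
tab <- anova(h0, h1, test = "Chisq")
print(tab)
```

For a binary response the residual deviance equals minus twice the log-likelihood, so the `Deviance` column of the `anova()` table is the same number as `lr_stat`.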

This could be done in steps: first take the largest model as H1 and the second-largest as H0; in the next step (if H0 is not rejected, so the smaller model is adequate) take the second-largest model as H1 and the third-largest as H0. And so on, until an H0 is rejected.
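
A stepwise sequence like this could be run in one call, since `anova()` compares each model with the one listed before it. The sketch below again uses simulated data in a made-up data frame `dat`, and the order in which the interactions enter the sequence is an arbitrary assumption, not something fixed by the problem.

```r
# Simulated data for illustration only; these are not the question's data.
set.seed(2)
n   <- 600
dat <- data.frame(G = rbinom(n, 1, 0.5), N = rnorm(n), E = rnorm(n), C = rnorm(n))
dat$Y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$G + 0.5 * dat$N))

# Nested sequence, smallest to largest; each model adds one interaction term.
fits <- list(
  glm(Y ~ G + N + E + C,                   family = binomial, data = dat),
  glm(Y ~ G + N + E + C + G:N,             family = binomial, data = dat),
  glm(Y ~ G + N + E + C + G:N + G:E,       family = binomial, data = dat),
  glm(Y ~ G + N + E + C + G:N + G:E + G:C, family = binomial, data = dat)
)

# Because the models are nested, each row is a 1-df likelihood-ratio test of
# that model against the previous, smaller one.
steps <- anova(fits[[1]], fits[[2]], fits[[3]], fits[[4]], test = "Chisq")
print(steps)
```

Reading the table from the bottom row upwards corresponds to starting with the largest model as H1. Unlike the table in the question, every comparison here has `Df` equal to 1, because no two models in the sequence have the same number of parameters.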

Hi, I only need to compare the models as specified above, which means that I have non-nested models. I need to use a standard p-value approach, and the only one I could think of here was the chi-square test. That means I can only compare model 1 with each of the others, but not model 2 with model 3, for example. However, the statistics favor model 1, with p-values above .72. But then the question remains: is this the best way to perform model selection using a standard p-value approach? — Sophie, Aug 28 '13 at 20:50
