
I fit a logistic regression model with 14 predictors; here are the code and output:

veg.fit <- glm(veg~., family = binomial, data=df.c)
summary(veg.fit)
Call:
glm(formula = veg ~ ., family = binomial, data = df.c)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.26825  -0.50300  -0.22594  -0.08373   2.85784  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)  
(Intercept) 11.8685548  9.0614215   1.310   0.1903  
gender      -0.4982546  1.1332216  -0.440   0.6602  
age          0.0122210  0.0653607   0.187   0.8517  
hsgpa       -1.6201620  1.4236274  -1.138   0.2551  
cogpa       -1.6635211  1.7608781  -0.945   0.3448  
dhome       -0.0003964  0.0004423  -0.896   0.3700  
dres         0.1457214  0.1326084   1.099   0.2718  
tv           0.0158743  0.0845977   0.188   0.8512  
sport       -0.2994841  0.2173609  -1.378   0.1683  
news         0.1128158  0.1988480   0.567   0.5705  
aids         0.0811258  0.1685509   0.481   0.6303  
affil       -0.4934158  0.6130881  -0.805   0.4209  
ideol       -1.0391178  0.5932011  -1.752   0.0798 .
relig        0.9825565  0.7048663   1.394   0.1633  
abor         0.1605618  1.8966853   0.085   0.9325  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 50.725  on 59  degrees of freedom
Residual deviance: 36.613  on 45  degrees of freedom
AIC: 66.613

Number of Fisher Scoring iterations: 7

Then I did a likelihood-ratio test of the null hypothesis $\beta_{1}=\dots=\beta_{14}=0$:

1-pchisq(50.725-36.613,59-54)
0.01491341

which appears significant. But when I check the coefficients of the individual predictors, none of them is significant (all their p-values are large). I wonder how this could happen?

  • Residual deviance has df=45, but you typed 54 in the pchisq function. – danbrown May 26 '21 at 22:14
  • While the answer below by @EdM is correct in general, if you use the right df you get a global test with 14 df and `pchisq(50.725-36.613,59-45,lower.tail=FALSE)` is 0.44, so there's not even anything to explain in this specific case. – Thomas Lumley May 27 '21 at 00:37
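The comments' point can be checked directly (a sketch in R, using only the deviances printed in the summary above):

```r
# Likelihood-ratio test for H0: all 14 slopes are zero.
# Deviances are taken from the summary() output above.
null_dev  <- 50.725   # null deviance, 59 df
resid_dev <- 36.613   # residual deviance, 45 df

# With the typo'd df (59 - 54 = 5):
pchisq(null_dev - resid_dev, df = 59 - 54, lower.tail = FALSE)
# ~0.015 -- spuriously "significant"

# With the correct df (59 - 45 = 14, one per predictor):
pchisq(null_dev - resid_dev, df = 59 - 45, lower.tail = FALSE)
# ~0.44 -- no evidence against H0
```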

1 Answer


Your model includes too many predictors for the number of observations.

A rule of thumb for logistic regression is to have no more than 1 predictor for every 15 or so members of the minority class. With about 60 total cases (the null deviance has 59 degrees of freedom, so n = 60), you have at most 30 members of the minority class, so anything beyond about 2 predictors is likely to lead to an overfit model.
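The arithmetic behind that rule of thumb, sketched in R (the 15-cases-per-predictor threshold is just the heuristic stated above):

```r
# Rule-of-thumb check: at most ~1 predictor per 15 minority-class cases.
n_total       <- 59 + 1           # null deviance has 59 df, so n = 60
n_minority    <- n_total / 2      # upper bound: minority class is at most half
per_predictor <- 15               # heuristic threshold
floor(n_minority / per_predictor) # supports about 2 predictors, not 14
```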

So you can seemingly fit your data with a large number of predictors and get a "significant" overall p-value while no single predictor shows a significant association with the outcome. You have essentially contorted your set of predictor values to fit this particular data set. Your model would probably not work well on another sample from the population.
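A small simulation illustrates this (a sketch with made-up data, not your data: 14 pure-noise predictors, n = 60, outcome unrelated to any of them). The in-sample fit is flattered by overfitting, while performance on a fresh sample from the same process is not:

```r
set.seed(42)
n <- 60; p <- 14

# Training data: binary outcome completely unrelated to the predictors.
train <- data.frame(y = rbinom(n, 1, 0.3),
                    matrix(rnorm(n * p), n, p))
fit <- glm(y ~ ., family = binomial, data = train)

# Deviance always drops in-sample, even for pure noise...
c(null = fit$null.deviance, residual = fit$deviance)

# ...and in-sample classification accuracy looks better than it should,
in_acc <- mean((fitted(fit) > 0.5) == train$y)

# while accuracy on fresh data from the same (null) process does not.
test <- data.frame(y = rbinom(n, 1, 0.3),
                   matrix(rnorm(n * p), n, p))
out_acc <- mean((predict(fit, newdata = test, type = "response") > 0.5) == test$y)
c(in_sample = in_acc, out_of_sample = out_acc)
```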

You should use your knowledge of the subject matter to select a subset of predictors (without looking at the outcomes), or to combine multiple related predictors into single combined predictors. Alternatively, look into penalized regression methods like ridge regression or LASSO, which are designed for situations with too many predictors for too few observations.
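A minimal sketch of a LASSO-penalized logistic fit with the glmnet package (assumptions: glmnet is installed, and simulated x/y stand in for your predictors and veg outcome so the sketch runs on its own; with your data you would build x with `model.matrix(veg ~ ., df.c)[, -1]`):

```r
library(glmnet)

# Stand-in data with the same shape as the question's problem.
set.seed(1)
x <- matrix(rnorm(60 * 14), 60, 14)
y <- rbinom(60, 1, 0.3)

# Cross-validated LASSO logistic regression (alpha = 1 -> LASSO penalty).
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# At the more conservative lambda, most coefficients are shrunk exactly
# to zero, giving a sparser, less overfit model.
coef(cv_fit, s = "lambda.1se")
```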

EdM