I have a binary response variable and a categorical predictor variable. A chi-square test of association between the two variables is significant. However, when I fit a logistic regression with the same variables, the predictor is not significant. Why does this happen?

  table(Data1$pred,Data1$target)

                            0    1
  Level1                    1    0
  Level2                    4    0
  Level3                   98    1
  Level4                 2056   22
  Level5                    1    0
  Level6                    2    0
  Level7                  311    0
  Level8                    6    1
  Level9                  131    7
  Level10                  49    2

  chisq.test(table(Data1$pred,Data1$target))

  Pearson's Chi-squared test

  data:  table(Data1$pred,Data1$target)
  X-squared = 34.2614, df = 9, p-value = 8.037e-05

Logistic regression on the same data:

  logit.glm <- glm(as.factor(target) ~ pred,
               data=Data1, family=binomial(link="logit"))
  summary(logit.glm)
  Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.5553  -0.1459  -0.1459  -0.1459   3.0315  

  Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
  (Intercept)   -2.057e+01  1.773e+04  -0.001    0.999
  Data1Level2   -6.313e-06  1.982e+04   0.000    1.000
  Data1Level3    1.598e+01  1.773e+04   0.001    0.999
  Data1Level4    1.603e+01  1.773e+04   0.001    0.999
  Data1Level5   -6.312e-06  2.507e+04   0.000    1.000
  Data1Level6   -6.312e-06  2.172e+04   0.000    1.000
  Data1Level7   -6.312e-06  1.776e+04   0.000    1.000
  Data1Level8    1.877e+01  1.773e+04   0.001    0.999
  Data1Level9    1.764e+01  1.773e+04   0.001    0.999
  Data1Level10   1.737e+01  1.773e+04   0.001    0.999

  (Dispersion parameter for binomial family taken to be 1)

   Null deviance: 356.09  on 2691  degrees of freedom
   Residual deviance: 333.06  on 2682  degrees of freedom
   AIC: 353.06

   Number of Fisher Scoring iterations: 19
user3897
  • A difference of 23 between the null & residual deviance on 9 degrees of freedom would usually be regarded as significant. Note that the overall null can be significant at a given level even though its constituents aren't (see [How can a regression be significant yet all predictors be non-significant?](http://stats.stackexchange.com/q/14500/17230)); & that the counts are too low for the Wald estimate of standard error to be much good - there's complete separation on some dummy variables. – Scortchi - Reinstate Monica Oct 19 '15 at 14:08
  • See also [Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?](http://stats.stackexchange.com/a/144608/17230) for some background on the different tests. – Scortchi - Reinstate Monica Oct 20 '15 at 11:31
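
The point made in the first comment can be checked with a short sketch. It assumes the data can be rebuilt from the contingency table in the question; the names `lev`, `n0`, `n1`, `fit`, `lr_stat`, `lr_df` and `p_lr` are introduced here for illustration and are not from the original post. It compares the null and residual deviances (356.09 - 333.06 = 23.03 on 9 degrees of freedom in the output above) as a likelihood ratio test of the predictor as a whole, rather than reading the per-level Wald z-tests.

```r
# Counts taken from the table in the question:
# n0 = rows with target == 0, n1 = rows with target == 1, per level of pred.
lev <- paste0("Level", 1:10)
n0  <- c(1, 4, 98, 2056, 1, 2, 311, 6, 131, 49)
n1  <- c(0, 0, 1, 22, 0, 0, 0, 1, 7, 2)

# Expand to one row per observation (2692 rows in total).
Data1 <- data.frame(
  pred   = factor(rep(rep(lev, 2), times = c(n0, n1)), levels = lev),
  target = rep(c(0, 1), times = c(sum(n0), sum(n1)))
)

# Same model as in the question. R is expected to warn that some fitted
# probabilities are numerically 0 or 1, because several levels have no events.
fit <- glm(target ~ pred, data = Data1, family = binomial(link = "logit"))

# Likelihood ratio test of the whole factor: the drop in deviance from the
# intercept-only model, referred to a chi-squared distribution.
lr_stat <- fit$null.deviance - fit$deviance
lr_df   <- fit$df.null - fit$df.residual
p_lr    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p.value = p_lr))

# The same test via the analysis-of-deviance table.
print(anova(fit, test = "Chisq"))
```

Here the overall test gives a p-value below 0.01, which agrees in direction with the chi-squared test, while the individual Wald tests stay near 1 because the reference level (Level1) has a single observation and no events, so every coefficient has a huge standard error. With counts this small, both chi-squared approximations are rough, so the exact p-values should be read with caution.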
