
I'm using R to run some logistic regressions. My variables were continuous, but I used `cut` to bucket the data. Some particular buckets of these variables always result in the dependent variable being equal to 1. As expected, the coefficient estimates for these buckets are very high, but the p-values are also high. There are roughly 90 observations in each of these buckets, and around 800 observations in total, so I don't think it's a problem of sample size. Also, these variables should not be strongly related to the other predictors, so collinearity shouldn't be what is inflating the p-values.

Are there any other plausible explanations for the high p-values?

Example:

myData <- read.csv("application.csv", header = TRUE)
myData$FICO <- cut(myData$FICO, c(0, 660, 680, 700, 720, 740, 780, Inf), right = FALSE)
myData$CLTV <- cut(myData$CLTV, c(0, 70, 80, 90, 95, 100, 125, Inf), right = FALSE)
fit <- glm(Denied ~ CLTV + FICO, data = myData, family = binomial())
summary(fit)

Results are something like this:

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.53831  -0.77944  -0.62487   0.00027   2.09771  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -1.33630    0.23250  -5.747 9.06e-09 ***
CLTV(70,80]     -0.54961    0.34864  -1.576 0.114930    
CLTV(80,90]     -0.51413    0.31230  -1.646 0.099715 .  
CLTV(90,95]     -0.74648    0.37221  -2.006 0.044904 *  
CLTV(95,100]     0.38370    0.37709   1.018 0.308906    
CLTV(100,125]   -0.01554    0.25187  -0.062 0.950792    
CLTV(125,Inf]   18.49557  443.55550   0.042 0.966739    
FICO[0,660)     19.64884 3956.18034   0.005 0.996037    
FICO[660,680)    1.77008    0.47653   3.715 0.000204 ***
FICO[680,700)    0.98575    0.30859   3.194 0.001402 ** 
FICO[700,720)    1.31767    0.27166   4.850 1.23e-06 ***
FICO[720,740)    0.62720    0.29819   2.103 0.035434 *  
FICO[740,780)    0.31605    0.23369   1.352 0.176236    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1037.43  on 810  degrees of freedom
Residual deviance:  803.88  on 798  degrees of freedom
AIC: 829.88

Number of Fisher Scoring iterations: 16

FICO in the range [0, 660) and CLTV in the range (125, Inf] do indeed always result in Denied = 1, so their coefficients are very large, but why are they also "insignificant"?
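
That these two buckets contain only denials can be verified with a quick cross-tabulation (a sketch only, assuming the bucketed `myData` from the code above):

with(myData, table(FICO, Denied))
with(myData, table(CLTV, Denied))
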

ch-pub

1 Answer


This is the well-known Hauck-Donner effect whereby standard errors of maximum likelihood estimates blow up. The basic idea is that as the separation becomes complete, the estimate of the standard error blows up faster than the estimate of the log odds ratio, rendering Wald $\chi^2$ statistics useless (and $P$-values large). Use likelihood ratio tests instead. These are unaffected by complete separation.
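
For example, likelihood-ratio tests can be obtained by refitting the model without each term and comparing deviances; a minimal sketch, assuming the `fit` object from the question:

# LR (deviance) tests for each term; these do not rely on Wald standard errors
drop1(fit, test = "Chisq")

# Equivalent explicit nested-model comparison for the FICO term
fit_noFICO <- glm(Denied ~ CLTV, data = myData, family = binomial())
anova(fit_noFICO, fit, test = "Chisq")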

Why are you cutting continuous predictors? It seems strange to assume effects that are not only piecewise flat but that are discontinuous at the cuts. This is causing part of your problem.
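
One way to keep the predictors continuous while still allowing nonlinear effects is a regression spline; a sketch only, assuming the original (un-cut) FICO and CLTV columns are still available:

library(splines)

# Re-read the data so FICO and CLTV remain continuous
myData <- read.csv("application.csv", header = TRUE)

# Natural cubic splines give smooth, continuous effects instead of
# piecewise-flat, discontinuous buckets
fit_spline <- glm(Denied ~ ns(CLTV, df = 4) + ns(FICO, df = 4),
                  data = myData, family = binomial())
drop1(fit_spline, test = "Chisq")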

Frank Harrell
  • if the OP is getting complete separation they also might want to consider regularization/bias-corrected approaches (e.g. `brglm` package) -- although I agree that they should think carefully about the cutting first ... – Ben Bolker Jun 09 '14 at 17:50
  • Thanks. My post is a simplified example, so it may not be clear based on my example, but basically there are several "rules" that are supposed to be followed for determining Denial. There are also some ranges (of FICO or CLTV for example) where Denial becomes discretionary. Bottom line is, my client wants to see cut variables because that's most intuitive for them based on the guidelines, even though there is also some discretion. – ch-pub Jun 09 '14 at 17:53
  • @BenBolker Any merit in using bayesglm() without modifying the prior? This also seems to remedy the effect I'm experiencing with glm(). – ch-pub Jun 09 '14 at 19:57
  • The purpose of regularization (shrinkage/penalization) is to make estimates biased towards zero (not so much to bias-correct). I think the likelihood ratio test is the easier solution. Infinite parameter estimates do not present problems in prediction or problems to likelihood ratio tests. The cut variables are not only non-intuitive but create significant modeling and interpretation problems, not to mention loss of information. It is not good statistical practice IMHO. – Frank Harrell Jun 09 '14 at 20:47
  • `brglm` implements Firth's penalization, which offsets an $O(n^{-1})$ term in the bias of MLEs: Firth (1993), "Bias reduction of maximum likelihood estimates", *Biometrika*, **80**, pp 27–38. I'm curious as to why you say "infinite parameter estimates do not present problems in prediction" because that's where, intuitively, they can be most problematic: your model says the probability's 100% for a given value of the predictor on which separation occurred whatever the values of any other predictors. – Scortchi - Reinstate Monica Jun 10 '14 at 11:18
  • We are interested in bias reduction when overfitting is not a problem. Otherwise we want to _create bias_ to shrink. To answer your question, suppose that there is only an intercept in the model and the observed data for 20 observations are 20 $Y=1$ values. The maximum likelihood estimate of $Prob(Y=1)$ is 1.0 and the MLE of the log odds is $\infty$, which provides the correct predicted value of 1.0. When there are other predictors, the predictions will still be correct when there are empty cells. – Frank Harrell Jun 10 '14 at 12:33
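
Following up on the `brglm` and `bayesglm()` suggestions in the comments, a minimal sketch of the penalized alternatives, assuming those packages are installed; both keep the coefficients of the separated buckets finite:

# Firth's bias-reduced logistic regression (brglm package)
library(brglm)
fit_firth <- brglm(Denied ~ CLTV + FICO, data = myData, family = binomial())
summary(fit_firth)

# Weakly informative default (Cauchy) prior via arm::bayesglm()
library(arm)
fit_bayes <- bayesglm(Denied ~ CLTV + FICO, data = myData, family = binomial())
summary(fit_bayes)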