I'm using R to fit a logistic regression. My variables were continuous, but I used cut to bucket the data. For certain buckets of these variables, the dependent variable is always equal to 1. As expected, the coefficient estimate for such a bucket is very high, but its p-value is also high. Each of these buckets has about 90 observations, out of around 800 in total, so I don't think it's a problem of sample size. These variables also should not be related to the other variables, so I don't think correlation between predictors is inflating the p-values either.
Are there any other plausible explanations for the high p-value?
Example:
# Load the application data
myData <- read.csv("application.csv", header = TRUE)
# Bucket the continuous predictors into intervals
myData$FICO <- cut(myData$FICO, c(0, 660, 680, 700, 720, 740, 780, Inf), right = FALSE)
myData$CLTV <- cut(myData$CLTV, c(0, 70, 80, 90, 95, 100, 125, Inf), right = FALSE)
# Logistic regression of denial on the bucketed predictors
fit <- glm(Denied ~ CLTV + FICO, data = myData, family = binomial())
Results are something like this:
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.53831  -0.77944  -0.62487   0.00027   2.09771
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.33630    0.23250  -5.747 9.06e-09 ***
CLTV(70,80]   -0.54961    0.34864  -1.576 0.114930
CLTV(80,90]   -0.51413    0.31230  -1.646 0.099715 .
CLTV(90,95]   -0.74648    0.37221  -2.006 0.044904 *
CLTV(95,100]   0.38370    0.37709   1.018 0.308906
CLTV(100,125] -0.01554    0.25187  -0.062 0.950792
CLTV(125,Inf] 18.49557  443.55550   0.042 0.966739
FICO[0,660)   19.64884 3956.18034   0.005 0.996037
FICO[660,680)  1.77008    0.47653   3.715 0.000204 ***
FICO[680,700)  0.98575    0.30859   3.194 0.001402 **
FICO[700,720)  1.31767    0.27166   4.850 1.23e-06 ***
FICO[720,740)  0.62720    0.29819   2.103 0.035434 *
FICO[740,780)  0.31605    0.23369   1.352 0.176236
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1037.43 on 810 degrees of freedom
Residual deviance: 803.88 on 798 degrees of freedom
AIC: 829.88
Number of Fisher Scoring iterations: 16
FICO in the range [0, 660) and CLTV in the range (125, Inf] do indeed always result in Denied = 1, so their coefficients are very large, but why are they also "insignificant"?
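For anyone who wants to reproduce the symptom without application.csv, here is a minimal sketch on simulated data. Everything in it is an assumption made up for illustration, not my real data: the data frame sim, the simulated denial probabilities, and the rule that every score below 660 is denied. It cross-tabulates the buckets against the outcome to confirm that one bucket contains no zeros, then fits the same kind of model with FICO only.

```r
# Simulated stand-in for application.csv (all values here are made up).
set.seed(1)
n <- 800
sim <- data.frame(FICO = round(runif(n, 600, 820)))

# Denial probability falls as FICO rises; scores below 660 are then forced
# to Denied = 1 so that one bucket never contains a 0.
p <- plogis(-1 - 0.01 * (sim$FICO - 700))
sim$Denied <- rbinom(n, 1, p)
sim$Denied[sim$FICO < 660] <- 1

# Same FICO breaks as in the question; the top bucket is the reference level.
sim$FICO <- cut(sim$FICO, c(0, 660, 680, 700, 720, 740, 780, Inf), right = FALSE)
sim$FICO <- relevel(sim$FICO, ref = "[780,Inf)")

# Cross-tabulate bucket against outcome: the [0,660) row has no zeros.
tab <- table(sim$FICO, sim$Denied)
print(tab)

# glm may warn that fitted probabilities numerically 0 or 1 occurred.
fit_sim <- glm(Denied ~ FICO, data = sim, family = binomial())
print(round(coef(summary(fit_sim)), 4))
```

On my understanding, the row for the [0,660) bucket should show the same pattern as in my real output: a very large estimate paired with a much larger standard error, and therefore a z value near zero.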