This is from the book The Statistical Sleuth: A Course in Methods of Data Analysis, Chapter 20, Exercise 12(c)-(e). I am using logistic regression to predict carrier status with possible predictors CK and H. Here is my solution:
Carrier <- c(0,0,0,0,0,1,1,1,1,1)
CK <- c(52,20,28,30,40,167,104,30,65,440)
H <- c(83.5,77,86.5,104,83,89,81,108,87,107)
logCK <- log(CK)
fit4 <- glm(Carrier~logCK+H, family="binomial", control=list(maxit=100))
## Warning message:
## glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(fit4)
##
## Call:
## glm(formula = Carrier ~ logCK + H, family = "binomial", control = list(maxit = 100))
##
## Deviance Residuals:
##        Min          1Q      Median          3Q         Max
## -1.480e-05  -2.110e-08   0.000e+00   2.110e-08   1.376e-05
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -2292.8  4130902.8  -0.001        1
## logCK           315.6   589675.2   0.001        1
## H                11.5    21279.6   0.001        1
These results look odd: none of the coefficients appears to be significant, and the estimates and standard errors are enormous. The next part of the exercise asks for a drop-in-deviance test comparing this full model with the reduced model in which neither logCK nor H is a useful predictor. I get:
fit5 <- glm(Carrier~1, family="binomial")
1-pchisq(deviance(fit5)-deviance(fit4), df.residual(fit5)-df.residual(fit4))
## [1] 0.0009765625
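As a cross-check on the hand calculation, I believe the same drop-in-deviance test can be obtained from anova() with test = "Chisq". This is only a sketch: it refits both models so that it runs on its own, and the names lrt, drop, df_drop and p_value are mine, not from the book.

```r
# Data and models as in my solution (repeated so the snippet is self-contained)
Carrier <- c(0,0,0,0,0,1,1,1,1,1)
CK <- c(52,20,28,30,40,167,104,30,65,440)
H <- c(83.5,77,86.5,104,83,89,81,108,87,107)
logCK <- log(CK)

# suppressWarnings() only hides the "fitted probabilities numerically 0 or 1" warning
fit4 <- suppressWarnings(
  glm(Carrier ~ logCK + H, family = "binomial", control = list(maxit = 100))
)
fit5 <- glm(Carrier ~ 1, family = "binomial")

# Drop-in-deviance (likelihood ratio) test: reduced model first, full model second
lrt <- anova(fit5, fit4, test = "Chisq")
print(lrt)

# The same p-value by hand, taking the upper tail directly instead of 1 - pchisq()
drop <- deviance(fit5) - deviance(fit4)
df_drop <- df.residual(fit5) - df.residual(fit4)
p_value <- pchisq(drop, df_drop, lower.tail = FALSE)
print(p_value)
```

Using lower.tail = FALSE avoids the subtraction from 1, which can lose precision when the p-value is very small.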
So the p-value indicates that at least one of logCK and H is useful. I'm stuck on the next part, which asks for the odds ratio for a woman with (CK, H) = (300, 100) relative to one with (CK, H) = (80, 85).
But how can I get a meaningful result when the coefficients in this model are so extreme? Is there anything wrong with the way I fit this logistic regression? Are there any remedial measures?
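In case it helps with diagnosis, here is one check I can run: whether the fitted linear predictor puts every non-carrier below every carrier. My guess is that the warning is related to this, but that is an assumption on my part and not something the exercise states. The names eta and separated are mine.

```r
# Data and full model as in my solution (repeated so the snippet is self-contained)
Carrier <- c(0,0,0,0,0,1,1,1,1,1)
CK <- c(52,20,28,30,40,167,104,30,65,440)
H <- c(83.5,77,86.5,104,83,89,81,108,87,107)
logCK <- log(CK)
fit4 <- suppressWarnings(
  glm(Carrier ~ logCK + H, family = "binomial", control = list(maxit = 100))
)

# Fitted values on the log-odds (link) scale
eta <- predict(fit4, type = "link")

# Range of the linear predictor within each group
print(range(eta[Carrier == 0]))
print(range(eta[Carrier == 1]))

# TRUE if the highest non-carrier score is below the lowest carrier score,
# i.e. a single cut-off on eta classifies all ten women correctly
separated <- max(eta[Carrier == 0]) < min(eta[Carrier == 1])
print(separated)
```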