1

I'm trying to perform a Hosmer-Lemeshow goodness of fit test on a bunch of logistic regression models but for some reason the p-value that is returned is 1 on all of the models. Does anybody have a clue on why that is happening?

Here's a piece of my R-code:

model1 <- glm(c(variable1,variable2)~x+y+z, family="binomial")
model2 <- glm(c(variable1,variable2)~x+y,   family="binomial")
model3 <- glm(c(variable1,variable2)~x,     family="binomial")

hoslem.test(model1$y, model1$fitted)
hoslem.test(model2$y, model1$fitted)
hoslem.test(model2$y, model1$fitted)

and the output of the tests are:

X-squared = 0.35263, df = 8, p-value = 1
X-squared = 0.035221, df = 8, p-value = 1
X-squared = 0.29097, df = 8, p-value = 1
gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
Janono
  • 423
  • 6
  • 15
  • 3
    We need more information for this to be answerable. What are your variables? How much data do you have? Etc. Can you post your data? – gung - Reinstate Monica Apr 01 '17 at 16:23
  • @gung my data is about 300 rows and 10 columns. The explanaotry variables are numerical data between 0 and 1, and the response variables are numerical data from the natural numbers. – Janono Apr 01 '17 at 16:32
  • @MichaelChernick hehe well the library im using in R calls it hoslem.test :P – Janono Apr 01 '17 at 16:34

2 Answers2

2

Does anybody have a clue on why that is happening?

I think the test is working as it should. The p values are just being rounded up to 1. E.g.:

> pchisq(0.35263, 8, lower.tail = F)
[1] 0.999965

Whether this test meets your needs is another matter.

Matthew
  • 420
  • 6
  • 15
1

@Matthew's answer is spot-on. Adding a bit of detail: I found this function in the ResourceSelection package (using library("sos"); findFn("hoslem.test")). The relevant bit of code reads

PVAL = 1 - pchisq(chisq, g - 2)

the authors should probably be using pchisq(chisq, g-2, lower.tail=FALSE) for greater precision, but that's not really the issue here (we would only really start to run into trouble when $1-p < 10^{-16}$).

The part of the stats:::print.htest function responsible for printing the results reads:

print.htest(..., digits = getOption("digits") , ...) {
  ## ...
  fp <- format.pval(x$p.value, digits = max(1L, digits - 3L))
  ## ...
}

The default value of the "digits" option is 7, so R is printing the result of

format.pval(0.999965, digits=4)

which is "1".

For your most extreme example ($\chi^2=0.035221$), $p=1-3.9 10^{-9}$, so you would need even more digits of precision to see the difference from 1.

Ben Bolker
  • 34,308
  • 2
  • 93
  • 126