In Hosmer and Lemeshow's 1980 paper, Theorem 2 states that the asymptotic distribution of $\hat{C}^*_g$ (the usual Hosmer-Lemeshow test statistic) is \begin{equation} \tag{1} \chi^2_{2g-g-(p+1)} + \sum_{i=1}^{p+1} \lambda_i \chi^2_i(1), \end{equation} where the $\lambda_i$ are eigenvalues of a matrix (specified in the paper, not relevant to this question). Then, they show through simulations that $\sum_{i=1}^{p+1} \lambda_i \chi^2_i(1)$ is approximately $\chi^2_{p-1}$, which leads to the usual $g-2$ degrees of freedom in the Hosmer-Lemeshow test.
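To spell out the degrees-of-freedom bookkeeping: if the weighted sum is treated as $\chi^2_{p-1}$, and if the two pieces are added as though they were independent chi-square variables (a simplifying assumption behind the approximation, not something I am quoting from the paper), then the degrees of freedom simply add:

$$ \underbrace{2g-g-(p+1)}_{=\,g-p-1} + (p-1) = g-2. $$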

This makes sense, except that, by Theorem 5 of Moore and Spruill (1975), shouldn't the distribution of the statistic be $$ \chi^2_{M-m-1} + \sum_{i=1}^{p+1} \lambda_i \chi^2_i(1),$$ where $M=2g$ and $m=p+1$? I don't see how Hosmer and Lemeshow got the $2g-g-(p+1)$ degrees of freedom on the first term in (1).
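Concretely, under my reading ($M=2g$, $m=p+1$) the first term would have

$$ M-m-1 = 2g-(p+1)-1 = 2g-p-2 $$

degrees of freedom, whereas (1) has $2g-g-(p+1) = g-p-1$. The two counts differ by $(2g-p-2)-(g-p-1) = g-1$.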

kjetil b halvorsen
Math321
  • Hosmer-Lemeshow is considered obsolete: https://stats.stackexchange.com/questions/273966/logistic-regression-with-poor-goodness-of-fit-hosmer-lemeshow – kjetil b halvorsen May 14 '20 at 11:30

1 Answer

Do Hosmer and Lemeshow say that $M=2g$?

In their paper I see that they replace $M-m-1$ by $2g-g-(p+1)$. There are at least two ways to read this:

  • The first possibility is the one you seem to assume, namely that HL take $M=2g$ and $m+1=g+(p+1)$. But HL do not say that this is what they do; I think you are assuming it?
  • I think it is more likely that $M=2g-g=g$ and $m=p$. If you cut the scores into $g$ groups, it looks as if you have $2g$ cells (one for the '0' outcomes and one for the '1' outcomes in each group), but these cells are not 'independent'. So if you cut the scores into $g$ groups, I think you really have $g$ groups and $M=g$, and with $m=p$ you find that $M-m-1=g-p-1$.
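To illustrate why the $2g$ cells are not independent, here is a sketch in notation that I am assuming for this purpose (it is not taken from either paper): let $n_k$ be the size of group $k$, $O_{1k}$ the observed number of '1' outcomes in it, and $\bar\pi_k$ the average fitted probability, so that the expected counts are $E_{1k}=n_k\bar\pi_k$ and $E_{0k}=n_k(1-\bar\pi_k)$. Because $O_{0k}=n_k-O_{1k}$, the deviation in the '0' cell is just minus the deviation in the '1' cell, and the two cells of a group collapse into a single term:

$$ \frac{(O_{1k}-E_{1k})^2}{E_{1k}} + \frac{(O_{0k}-E_{0k})^2}{E_{0k}} = (O_{1k}-n_k\bar\pi_k)^2\left[\frac{1}{n_k\bar\pi_k}+\frac{1}{n_k(1-\bar\pi_k)}\right] = \frac{(O_{1k}-n_k\bar\pi_k)^2}{n_k\bar\pi_k(1-\bar\pi_k)}. $$

Summing over $k=1,\dots,g$, the statistic written over $2g$ cells is the same number as a sum of only $g$ terms, one per group, which is consistent with counting $g$ groups rather than $2g$.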

But in order to be sure I would have to go through the Moore-Spruill paper in more detail.

EDIT 7/8/2017

You can find the Moore-Spruill paper via this link.

On page 601, under section 2, we see that $m$ is the number of estimated parameters, because it says that $\theta$ is in $\mathbb{R}^m$. That section also says that $M$ is the number of groups.

So it can be seen that $m=p$. For $M$, I think you have to look at the formula for $v_{n\sigma}(\theta,\eta,\varphi)$ on page 602 and compare it to formula 5.5 on page 148 of this link. Then you see that $\sigma$ runs from $1$ to $g$ in the book of HL, while in the paper of Moore-Spruill $\sigma$ runs from $1$ to $M$.

So this quick look makes me think that $M=g$ and $m=p$.
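As a quick check, under the assumption that this reading of $M$ and $m$ is right, substituting into the Moore-Spruill degrees of freedom gives

$$ M-m-1 = g-p-1 = 2g-g-(p+1), $$

which is exactly the subscript on the first term of (1) in the question.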

  • Right, the $2g$ groups are not independent, as you mention. What you are saying makes sense. Thanks. :) – Math321 Aug 05 '17 at 16:05
  • @Math321: I have taken a quick look at the Moore-Spruill paper and I edited my answer –  Aug 07 '17 at 14:41