
It is known that the Bernoulli distribution is a special case of the binomial distribution, and when a logistic regression model is fitted to both forms of the data, the difference between the null deviance and the residual deviance is the same. However, when calculating the percentage of deviance explained, the model fitted to the binomial data seems to explain far more than the model fitted to the Bernoulli data.
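A minimal sketch of the situation described above, in plain Python (standard library only, so no particular GLM package is assumed). The dataset `GROUPED` and the helpers `fit_logistic`, `deviance` and `summarize` are hypothetical and exist only for illustration. The script fits the same logit model by Newton-Raphson to grouped binomial counts and to the equivalent expanded 0/1 rows. It then reports the null deviance, the residual deviance, their difference, and the percentage of deviance explained.

```python
import math

# Hypothetical grouped data: (x, successes y, trials n) at four covariate values.
GROUPED = [(0.0, 1, 10), (1.0, 3, 10), (2.0, 6, 10), (3.0, 8, 10)]

# The same data expanded into one Bernoulli (0/1) row per trial, each with n = 1.
BERNOULLI = [
    (x, 1 if k < y else 0, 1)
    for x, y, n in GROUPED
    for k in range(n)
]


def prob(b0, b1, x):
    """Inverse logit of the linear predictor b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(rows, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson on binomial rows (x, y, n)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in rows:
            p = prob(b0, b1, x)
            r = y - n * p            # score contribution of this row
            w = n * p * (1.0 - p)    # Fisher information weight of this row
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(H) @ g for the 2x2 information matrix H.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def deviance(rows, b0, b1):
    """Deviance against the saturated model for these rows (0*log 0 taken as 0)."""
    d = 0.0
    for x, y, n in rows:
        p = prob(b0, b1, x)
        if y > 0:
            d += 2.0 * y * math.log(y / (n * p))
        if n - y > 0:
            d += 2.0 * (n - y) * math.log((n - y) / (n * (1.0 - p)))
    return d


def summarize(rows):
    """Null/residual deviance, their difference, and % deviance explained."""
    b0, b1 = fit_logistic(rows)
    p_bar = sum(y for _, y, _ in rows) / sum(n for _, _, n in rows)
    # The intercept-only (null) fit has p = overall success proportion.
    null_dev = deviance(rows, math.log(p_bar / (1.0 - p_bar)), 0.0)
    resid_dev = deviance(rows, b0, b1)
    drop = null_dev - resid_dev
    return {"coef": (b0, b1), "null_dev": null_dev, "resid_dev": resid_dev,
            "drop": drop, "pct_explained": 100.0 * drop / null_dev}


res_grouped = summarize(GROUPED)
res_bern = summarize(BERNOULLI)
for name, res in [("binomial", res_grouped), ("Bernoulli", res_bern)]:
    print(f"{name:9s} coef=({res['coef'][0]:.4f}, {res['coef'][1]:.4f}) "
          f"null={res['null_dev']:.3f} resid={res['resid_dev']:.3f} "
          f"drop={res['drop']:.3f} explained={res['pct_explained']:.1f}%")
```

Both fits should give the same coefficients and the same drop in deviance. The null deviance is smaller for the grouped data, so its percentage explained comes out higher. The difference in percentage reflects the different saturated models each deviance is measured against, not a difference in the fitted model.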

I would expect the two models to have the same goodness of fit, but I'm not sure which conclusion to draw:
A) The two models have the same goodness of fit, since they reduce the deviance by the same amount.
B) The model fitted to the binomial data is better, since it has a higher percentage of deviance explained than the model fitted to the Bernoulli data.

I would like to understand which conclusion is correct, and why the other interpretation is wrong. Additionally, are these two models equal? I don't think their response variables (data) are the same, since the binomial response deals with proportions and the Bernoulli response deals with binary outcomes. If that is the case, why do people say these two models are equal? (Unless that statement is wrong.)

I did look at this resource: Logistic Regression: Bernoulli vs. Binomial Response Variables, which discusses the difference in deviance being equal for both models.
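The point about equal deviance differences can be written out as a short derivation. This sketch assumes the data consist of covariate patterns i = 1, ..., k, each with y_i successes in n_i Bernoulli trials and fitted probability μ_i. The symbols ℓ, D and c are introduced here only for illustration, with the convention 0 log 0 = 0. The binomial log-likelihood adds the constant Σ log C(n_i, y_i) to ℓ below, but that constant also appears in its saturated model and cancels in every deviance.

$$
\begin{aligned}
\ell(\mu) &= \sum_{i=1}^{k}\bigl[\,y_i\log\mu_i + (n_i-y_i)\log(1-\mu_i)\,\bigr]\\
D &= 2\,\bigl[\ell_{\text{sat}} - \ell(\hat\mu)\bigr]\\
\ell_{\text{sat}}^{\text{Bern}} &= 0,
\qquad
\ell_{\text{sat}}^{\text{Bin}} = \sum_{i=1}^{k}\Bigl[\,y_i\log\frac{y_i}{n_i} + (n_i-y_i)\log\Bigl(1-\frac{y_i}{n_i}\Bigr)\Bigr] =: c \le 0\\
D^{\text{Bin}} &= D^{\text{Bern}} + 2c \quad\text{for every fitted model}\\
D_0^{\text{Bin}} - D^{\text{Bin}} &= D_0^{\text{Bern}} - D^{\text{Bern}} = 2\,\bigl[\ell(\hat\mu) - \ell(\hat\mu_0)\bigr]\\
\frac{D_0^{\text{Bin}} - D^{\text{Bin}}}{D_0^{\text{Bin}}} &\ge \frac{D_0^{\text{Bern}} - D^{\text{Bern}}}{D_0^{\text{Bern}}}
\quad\text{since } D_0^{\text{Bin}} = D_0^{\text{Bern}} + 2c \le D_0^{\text{Bern}}
\end{aligned}
$$

Under these assumptions, the deviance reduction (and any likelihood-ratio test built on it) is identical for the two layouts. The percentage explained changes only because its denominator, the null deviance, is measured against a different saturated model.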

kjetil b halvorsen
Soo Kyung Ahn
    The two models are equivalent and will lead to the same conclusions. The only difference of import is in the residuals and goodness-of-fit testing. All that should be clear from https://stats.stackexchange.com/questions/144121/logistic-regression-bernoulli-vs-binomial-response-variables, which you linked. Can you tell us what is not clear there? – kjetil b halvorsen May 11 '20 at 11:53
    Yes! You mention that goodness of fit is different for the two distributions. Does that mean logistic regression fitted to binomial data is somehow better than fitting it to Bernoulli data? If so, why is that? Also, is comparing the two models using the difference between the null and residual deviance a valid approach? If it is, the two models have the same goodness of fit, but by percentage deviance explained, using binomial data is far better. – Soo Kyung Ahn May 12 '20 at 11:31
    No, it is not better, but residuals and goodness-of-fit tests have different properties, so that part of the analysis differs; the final conclusions should not. With binomial data, residuals are less discrete (and fewer ...) and tests could have better distributional properties. I will try to write an answer along these lines. – kjetil b halvorsen May 12 '20 at 19:14
