
My name is Tuhin. I came up with a couple of questions when I was doing an analysis in R.

I did a logistic regression analysis in R and tried to check how well the model fits the data.

But I got stuck: I could not get a pseudo-R-squared value for the model, which would give me some idea of the variation explained by the model.

Could you please guide me on how to obtain this value (pseudo R squared for a logistic regression analysis)? It would also be helpful if you could show me a way to get the Hosmer-Lemeshow statistic for the model. I found a user-defined function to do it, but if there is a quicker way, that would be really helpful.

I would be very grateful if you can provide me the answers to my queries.

Eagerly waiting for your response.

Regards

    Possible duplicate of [Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?](https://stats.stackexchange.com/questions/3559/which-pseudo-r2-measure-is-the-one-to-report-for-logistic-regression-cox-s) – kjetil b halvorsen Apr 05 '19 at 20:29
  • I don't think this is a duplicate, but I do think it's off topic (and 9 years old!) This one is asking how to get R to do something, not which measure is best. – Peter Flom Apr 06 '19 at 11:06
  • Hosmer-Lemeshow is considered obsolete: https://stats.stackexchange.com/questions/273966/logistic-regression-with-poor-goodness-of-fit-hosmer-lemeshow – kjetil b halvorsen May 14 '20 at 12:16

2 Answers


Take a look at the lrm() function from the Design package (since superseded by the rms package). It provides everything you need for fitting binary logistic models. The Hosmer-Lemeshow test has limited power and depends on an arbitrary discretization of the predicted probabilities; it is discussed in Harrell, Regression Modeling Strategies (p. 231) and on the R-help mailing list. There is also a comparison of goodness-of-fit tests for logistic regression in "A comparison of goodness-of-fit tests for the logistic regression model", Statistics in Medicine 1997, 16(9):965.

Here is an example of use:

library(Design)  # depends on Hmisc; Design has since been superseded by rms
x1 <- rnorm(500)
x2 <- rnorm(500)
L  <- x1 + abs(x2)                           # linear predictor
y  <- ifelse(runif(500) <= plogis(L), 1, 0)  # simulated binary outcome
f  <- lrm(y ~ x1 + x2, x=TRUE, y=TRUE)       # x=TRUE, y=TRUE needed by resid()
resid(f, 'gof')  # le Cessie-van Houwelingen goodness-of-fit test

which yields something like

Sum of squared errors     Expected value|H0                    SD 
         100.33517914          100.37281429            0.37641975 
                    Z                     P 
          -0.09998187            0.92035872 

See help(residuals.lrm) for further details.
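If you nevertheless want the classical Hosmer-Lemeshow statistic without writing your own function, one option is hoslem.test() from the ResourceSelection package, applied to an ordinary glm() fit. A minimal sketch, assuming that package is installed (it is not part of base R):

```r
# Sketch: Hosmer-Lemeshow test via ResourceSelection (assumed installed)
library(ResourceSelection)

set.seed(42)
x1 <- rnorm(500)
x2 <- rnorm(500)
y  <- rbinom(500, 1, plogis(x1 + abs(x2)))  # simulated binary outcome

fit <- glm(y ~ x1 + x2, family = binomial)
hoslem.test(fit$y, fitted(fit), g = 10)  # g = number of groups
```

Keep in mind the caveats above: the result changes with the choice of g.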

The following thread contains critical discussions that might also be helpful: Logistic Regression: Which pseudo R-squared measure is the one to report (Cox & Snell or Nagelkerke)?
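As for the pseudo-$R^2$ itself, lrm() reports Nagelkerke's $R^2$ among its model statistics, so you can pull it straight from the fit object. A sketch with simulated data like the example above (using rms, the successor of Design):

```r
# Sketch: extracting Nagelkerke's pseudo-R^2 from an lrm() fit
library(rms)  # successor of the Design package

set.seed(101)
x1 <- rnorm(500)
x2 <- rnorm(500)
y  <- ifelse(runif(500) <= plogis(x1 + abs(x2)), 1, 0)

f <- lrm(y ~ x1 + x2)
f$stats["R2"]  # Nagelkerke pseudo-R^2, also shown by print(f)
```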

chl

Pseudo R square is easy to calculate manually. You just need the -2LL value for the baseline (null) model, which predicts the average probability of occurrence of the binomial event for every observation, and the -2LL value for the fitted logistic regression.

Say the -2LL value for the baseline is 10 and for the logistic regression model it is 5. Then the pseudo R square is (10 - 5)/10 = 50%.

This is McFadden's R square, which is usually written in the algebraically equivalent form 1 - (5/10) = 50%.

Pseudo R square measures tell you how much your logistic regression model reduces the error relative to simply guessing the average probability of occurrence for every observation.
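The arithmetic above maps directly onto a glm() fit in R: the null deviance plays the role of the baseline -2LL and the residual deviance that of the model's -2LL. A minimal sketch with simulated data:

```r
# McFadden's pseudo-R^2 from a standard glm() fit (simulated data)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(x))

fit <- glm(y ~ x, family = binomial)
mcfadden <- 1 - fit$deviance / fit$null.deviance  # 1 - (-2LL_model / -2LL_null)
mcfadden
```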

Sympa