29

I found a formula for pseudo $R^2$ in the book Extending the Linear Model with R by Julian J. Faraway (p. 59).

$$1-\frac{\text{ResidualDeviance}}{\text{NullDeviance}}$$.

Is this a common formula for pseudo $R^2$ for GLMs?

gung - Reinstate Monica
MarkDollar

4 Answers

23

There are a large number of pseudo-$R^2$s for GLiMs. The excellent UCLA statistics help site has a comprehensive overview of them here. The one you list is called McFadden's pseudo-$R^2$. Relative to UCLA's typology, it is like $R^2$ in the sense that it indexes the improvement of the fitted model over the null model. Some statistical software (notably SPSS, if I recall correctly) prints out McFadden's pseudo-$R^2$ by default with the results from some analyses like logistic regression, so I suspect it is quite common, although the Cox & Snell and Nagelkerke pseudo-$R^2$s may be even more so. However, McFadden's pseudo-$R^2$ does not have all of the properties of $R^2$ (no pseudo-$R^2$ does). If someone is interested in using a pseudo-$R^2$ to understand a model, I strongly recommend reading this excellent CV thread: Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)? (For what it's worth, $R^2$ itself is slipperier than people realize; a great demonstration of this can be seen in @whuber's answer here: Is $R^2$ useful or dangerous?)
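
For concreteness, here is a minimal R sketch of the three measures mentioned above (McFadden, Cox & Snell, Nagelkerke); the logistic fit on `mtcars` is only an illustrative placeholder, not something taken from the UCLA page or the linked threads.

fit  <- glm(am ~ mpg, data = mtcars, family = binomial)  # example model
fit0 <- update(fit, . ~ 1)                               # null (intercept-only) model
n    <- nobs(fit)
l1   <- as.numeric(logLik(fit))
l0   <- as.numeric(logLik(fit0))

1 - l1 / l0                                           # McFadden
1 - exp(2 * (l0 - l1) / n)                            # Cox & Snell
(1 - exp(2 * (l0 - l1) / n)) / (1 - exp(2 * l0 / n))  # Nagelkerke (rescaled Cox & Snell)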

gung - Reinstate Monica
  • I wonder if all these pseudo-R2s have been designed specifically for logistic regression only? Or do they generalize also for poisson and gamma-glms? I found a different R2 formula for each possible GLM in `Colin Cameron, A., & Windmeijer, F. A. (1997). An R-squared measure of goodness of fit for some common nonlinear regression models. Journal of Econometrics, 77(2), 329-342.` – Jens Nov 04 '14 at 11:30
  • @Jens, some of them certainly seem LR specific, but others use the deviance, which you could get from any GLiM. – gung - Reinstate Monica Nov 04 '14 at 14:22
  • Note that McFadden's $R^2$ is often defined in terms of the log-likelihood, which is only defined up to an additive constant, and not the deviance as in the OP's question. Without a specification of the additive constant, McFadden's $R^2$ is not well defined. The deviance is one unique choice of the additive constant, which in my mind is the most appropriate choice, if the generalisation should be comparable with $R^2$ from linear models. – NRH Feb 26 '16 at 07:57 (see the sketch after these comments)
  • Given that GLMs are fit using iteratively reweighted least squares, as in http://bwlewis.github.io/GLM/, what would actually be the objection to calculating a weighted R2 on the GLM link scale, using 1/variance weights as weights (which `glm` gives back in the `weights` slot of a glm fit)? – Tom Wenseleers Jun 11 '19 at 13:16
  • @TomWenseleers, you may do as you like, but the basic arguments are in the "Which pseudo-$R^2$... to report..." thread I linked, especially [probabilityislogic's answer](https://stats.stackexchange.com/a/18485/). – gung - Reinstate Monica Jun 11 '19 at 16:31
  • @gung Still, given that a GLM is just fitted using iteratively reweighted least squares, with each iteration being `g = family$linkinv(eta); gprime = family$mu.eta(eta); z = eta + (y - g) / gprime; weights = gprime^2 / family$variance(g)` and the final coefficients then being given by a weighted LS regression `summary(lm(z ~ X, weights = weights))`, I don't see what would be wrong with a weighted R2 value linked to the weighted LS fit of the last iteration... This would also take into account the expected mean/variance relationship, as that would be encapsulated by the weights. – Tom Wenseleers Jun 11 '19 at 19:36
  • @gung And for a gaussian identity link model that R2 value would also match the R2 of a regular OLS regression. So I don't entirely agree with that answer... – Tom Wenseleers Jun 11 '19 at 19:37
  • @TomWenseleers, as I said, you may do as you like. Unless I end up being the reviewer of your paper, my opinion doesn't matter much. – gung - Reinstate Monica Jun 11 '19 at 19:40
  • @gung In comment section of https://stats.stackexchange.com/questions/71720/error-metrics-for-cross-validating-poisson-models glen_b mentions that with large sample size the weighted MSE would match (by a constant factor) the deviance. I guess this is because with large sample sizes the estimated 1/variance weights would become near-exact. If one wanted to validate a model fit against a theoretical expectation (e.g. a fit vs a simulated model) this would not be a problem since in that case one could use the theoretical 1/variance weights. Need to think more about this... – Tom Wenseleers Jun 11 '19 at 19:49
  • @gung I put a reproducible example as a separate question here so you can see what I mean : https://stats.stackexchange.com/questions/412580/why-is-r2-not-reported-for-glms-based-on-last-iteration-of-weighted-least-square – Tom Wenseleers Jun 11 '19 at 22:10
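
To make NRH's point above concrete, here is a minimal sketch (the Poisson data are made up purely for illustration): for an ungrouped binary logistic model the saturated log-likelihood is zero, so the deviance-based ratio in the question coincides with the log-likelihood-based McFadden measure, but for other families the two generally differ.

set.seed(123)                                  # illustrative data
x <- runif(50)
y <- rpois(50, exp(1 + x))
fit0 <- glm(y ~ 1, family = poisson)           # null model
fit1 <- glm(y ~ x, family = poisson)           # fitted model

1 - deviance(fit1) / deviance(fit0)                        # deviance-based (the question's formula)
1 - as.numeric(logLik(fit1)) / as.numeric(logLik(fit0))    # log-likelihood-based McFadden
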
9

R reports the null and residual deviance in the output of `glm`, so you can make exactly this sort of comparison (see the last two lines below).

> x = log(1:10)
> y = 1:10
> glm(y ~ x, family = poisson)

Call:  glm(formula = y ~ x, family = poisson)

Coefficients:
(Intercept)            x  
  5.564e-13    1.000e+00  

Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
Null Deviance:      16.64 
Residual Deviance: 2.887e-15    AIC: 37.97

You can also pull these values out of the object with `model$null.deviance` and `model$deviance`.
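
For example, assuming the fit above is stored in a variable (the name `model` is mine; the call in the transcript was not assigned), the question's formula is then a one-liner:

model <- glm(y ~ x, family = poisson)
1 - model$deviance / model$null.deviance    # pseudo-R^2 from the question's formula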

David J. Harris
  • Ah, okay. I was just answering the question as written. I'd have added more, but I'm not 100% sure how the null deviance is calculated myself (it has something to do with a saturated model's log likelihood, but I don't remember enough of the details about saturation to be confident that I could give good intuitions) – David J. Harris Jun 08 '11 at 20:20
  • I don't have it in the glm output (family possion or quasipoisson). – Tomas Dec 10 '13 at 09:03
  • @Tomas see my edits. I don't know if I was mistaken 2 years ago or if the default output has changed since then. – David J. Harris Dec 10 '13 at 21:07
  • Tomas, the information is produced by `summary.glm`. As for whether that definition of an $R^2$ is common, that would require some kind of survey. I would say it's not especially rare, in that I've seen it before, but not something that is necessarily widely used. – Glen_b Dec 10 '13 at 22:09
  • How does this post answer the question? There's nothing about $R^2$! @Glen_b I was referring to $R^2$, as this is what the question was about. And this is not in the `summary` output. – Tomas Jan 22 '14 at 18:39
  • @Curious The question was about *pseudo*-$R^2$. The components of the formula that was posted in the question are all in the output. – David J. Harris Jan 22 '14 at 20:21
  • Read the question. Do you think you answer it? The question was not "where can I get the components of the formula?". – Tomas Jan 22 '14 at 21:24
6

The formula you posted was proposed by Maddala (1983) and Magee (1990) to estimate R squared for logistic models. Therefore I don't think it is applicable to every GLM (see the book Modern Regression Methods by Thomas P. Ryan, page 266).

If you make a fake data set, you will see that it underestimates the R squared... for a Gaussian GLM, for example.

I think for a gaussian glm you can use the basic (lm) R squared formula...

R2gauss <- function(y, model){
    moy    <- mean(y)                           # mean of the response
    N      <- length(y)
    p      <- length(model$coefficients) - 1    # number of predictors (excluding intercept)
    SSres  <- sum((y - predict(model))^2)       # residual sum of squares
    SStot  <- sum((y - moy)^2)                  # total sum of squares
    R2     <- 1 - (SSres / SStot)
    Rajust <- 1 - (((1 - R2) * (N - 1)) / (N - p - 1))   # adjusted R squared
    return(data.frame(R2, Rajust, SSres, SStot))
}
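
As a quick check (my example data, not part of the original answer), for a Gaussian identity-link GLM this should reproduce the ordinary lm R squared on the same data:

set.seed(42)                      # made-up data for illustration
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)
fit <- glm(y ~ x, family = gaussian)
R2gauss(y, fit)$R2                # SS-based R squared from the function above
summary(lm(y ~ x))$r.squared      # should give the same value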

And for logistic regression (or the binomial family in R) I would use the formula you proposed...

R2logit <- function(y, model){
    R2 <- 1 - (model$deviance / model$null.deviance)
    return(R2)
}

So far, for Poisson GLMs I have used the equation from this post:

https://stackoverflow.com/questions/23067475/how-do-i-obtain-pseudo-r2-measures-in-stata-when-using-glm-regression

There is also a great article on pseudo R2 available on ResearchGate; here is the link:

https://www.researchgate.net/publication/222802021_Pseudo_R-squared_measures_for_Poisson_regression_models_with_over-_or_underdispersion

I hope this helps.

Nico Coallier
  • Just fit a GLM model with family=gaussian(link=identity) and check the value of `1-summary(GLM)$deviance/summary(GLM)$null.deviance` and you will see that the R2 does match the R2 value of a regular OLS regression, so the above answer is correct! See also my post here - https://stats.stackexchange.com/questions/412580/why-is-r2-not-reported-for-glms-based-on-last-iteration-of-irls-weighted-least-s – Tom Wenseleers Jun 16 '19 at 06:07
3

The R package modEvA calculates D-squared as `1 - (mod$deviance/mod$null.deviance)`, as mentioned by David J. Harris.

set.seed(1)
x <- runif(n=10, min=0, max=1.5)                          # predictor (defined before use)
data <- data.frame(x=x, y=rpois(n=10, lambda=exp(1 + 0.2 * x)))

mod <- glm(y ~ x, data=data, family = poisson)

1 - (mod$deviance/mod$null.deviance)
library(modEvA)
modEvA::Dsquared(mod)   # returns the same value as the manual calculation above

The D-squared, or explained deviance, of the model is introduced in Guisan & Zimmermann (2000): https://doi.org/10.1016/S0304-3800(00)00354-9

Timo Kvamme