I found a formula for pseudo $R^2$ in the book Extending the Linear Model with R, Julian J. Faraway (p. 59).
$$1-\frac{\text{ResidualDeviance}}{\text{NullDeviance}}$$
Is this a common formula for pseudo $R^2$ for GLMs?
There are a large number of pseudo-$R^2$s for GLiMs. The excellent UCLA statistics help site has a comprehensive overview of them here. The one you list is called McFadden's pseudo-$R^2$. Relative to UCLA's typology, it is like $R^2$ in the sense that it indexes the improvement of the fitted model over the null model. Some statistical software, notably SPSS if I recall correctly, prints out McFadden's pseudo-$R^2$ by default with the results of some analyses such as logistic regression, so I suspect it is quite common, although the Cox & Snell and Nagelkerke pseudo-$R^2$s may be even more so.

However, McFadden's pseudo-$R^2$ does not have all of the properties of $R^2$ (no pseudo-$R^2$ does). If someone is interested in using a pseudo-$R^2$ to understand a model, I strongly recommend reading this excellent CV thread: Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)? (For what it's worth, $R^2$ itself is slipperier than people realize; a great demonstration of this can be seen in @whuber's answer here: Is $R^2$ useful or dangerous?)
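To make the comparison concrete, here is a minimal sketch (using the built-in mtcars data, my own choice of illustrative example) that computes McFadden's, Cox & Snell's, and Nagelkerke's pseudo-$R^2$ by hand from the model log-likelihoods:

```r
# Sketch: three common pseudo-R^2s for a logistic regression,
# computed by hand on the built-in mtcars data (illustrative example)
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
null <- glm(am ~ 1,  data = mtcars, family = binomial)
n    <- nrow(mtcars)

ll_fit  <- as.numeric(logLik(fit))
ll_null <- as.numeric(logLik(null))

# For ungrouped binary data the deviance is -2 * log-likelihood,
# so McFadden's measure equals 1 - ResidualDeviance/NullDeviance
mcfadden   <- 1 - ll_fit / ll_null
cox_snell  <- 1 - exp((2 / n) * (ll_null - ll_fit))
nagelkerke <- cox_snell / (1 - exp((2 / n) * ll_null))
c(McFadden = mcfadden, CoxSnell = cox_snell, Nagelkerke = nagelkerke)
```

Note that Nagelkerke's measure just rescales Cox & Snell's so that its maximum attainable value is 1.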
R gives null and residual deviance in the output of glm, so you can make exactly this sort of comparison (see the last two lines below).
> x = log(1:10)
> y = 1:10
> glm(y ~ x, family = poisson)

Call:  glm(formula = y ~ x, family = poisson)

Coefficients:
(Intercept)            x
  5.564e-13    1.000e+00

Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
Null Deviance:      16.64
Residual Deviance:  2.887e-15    AIC: 37.97
You can also pull these values out of the fitted object with model$null.deviance and model$deviance.
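Putting the two together, the pseudo-$R^2$ from the question can be computed directly from those stored values (a sketch using the same toy fit as above):

```r
# Same toy Poisson fit as above; the deviance-based pseudo-R^2
# is essentially 1, since this model fits the data almost perfectly
x <- log(1:10)
y <- 1:10
model <- glm(y ~ x, family = poisson)
1 - model$deviance / model$null.deviance
```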
The formula you quote was proposed by Maddala (1983) and Magee (1990) to estimate R squared for logistic models. Therefore I don't think it's applicable to every GLM (see the book Modern Regression Methods by Thomas P. Ryan, page 266).
If you make a fake data set, you will see that it underestimates the R squared for a Gaussian GLM, for example.
I think for a Gaussian GLM you can use the basic (lm) R squared formula:
R2gauss <- function(y, model){
  moy    <- mean(y)                      # mean of the response
  N      <- length(y)
  p      <- length(model$coefficients) - 1
  SSres  <- sum((y - predict(model))^2)  # residual sum of squares
  SStot  <- sum((y - moy)^2)             # total sum of squares
  R2     <- 1 - (SSres / SStot)
  Rajust <- 1 - (((1 - R2) * (N - 1)) / (N - p - 1))  # adjusted R squared
  return(data.frame(R2, Rajust, SSres, SStot))
}
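As a sanity check (my own example on the built-in mtcars data), for a Gaussian GLM this function should reproduce what lm() reports; the function is repeated here so the chunk runs on its own:

```r
# R2gauss() from above, repeated so this chunk is self-contained
R2gauss <- function(y, model){
  moy    <- mean(y)
  N      <- length(y)
  p      <- length(model$coefficients) - 1
  SSres  <- sum((y - predict(model))^2)
  SStot  <- sum((y - moy)^2)
  R2     <- 1 - (SSres / SStot)
  Rajust <- 1 - (((1 - R2) * (N - 1)) / (N - p - 1))
  return(data.frame(R2, Rajust, SSres, SStot))
}

# Fit the same model as a Gaussian GLM and as an ordinary lm
fit_glm <- glm(mpg ~ wt, data = mtcars, family = gaussian)
fit_lm  <- lm(mpg ~ wt, data = mtcars)
out <- R2gauss(mtcars$mpg, fit_glm)
out$R2      # matches summary(fit_lm)$r.squared
out$Rajust  # matches summary(fit_lm)$adj.r.squared
```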
And for the logistic model (the binomial family in R) I would use the formula you proposed:
R2logit <- function(y, model){
  R2 <- 1 - (model$deviance / model$null.deviance)
  return(R2)
}
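A usage sketch (again with mtcars as an illustrative data set of my choosing; note that the y argument is not actually used inside the function):

```r
# R2logit() from above, repeated so this chunk is self-contained
R2logit <- function(y, model){
  R2 <- 1 - (model$deviance / model$null.deviance)
  return(R2)
}

fit <- glm(am ~ hp, data = mtcars, family = binomial)
R2logit(mtcars$am, fit)  # McFadden's pseudo-R^2, between 0 and 1
```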
So far, for Poisson GLMs I have used the equation from this post.
There is also a great article on pseudo-$R^2$ available on ResearchGate; here is the link:
I hope this helps.
The R package modEvA calculates D-squared as 1 - (mod$deviance/mod$null.deviance), as mentioned by David J. Harris:
set.seed(1)
x <- runif(n = 10, min = 0, max = 1.5)  # define x before using it in lambda
data <- data.frame(x = x, y = rpois(n = 10, lambda = exp(1 + 0.2 * x)))
mod <- glm(y ~ x, data = data, family = poisson)
1 - (mod$deviance / mod$null.deviance)
library(modEvA); modEvA::Dsquared(mod)  # returns the same value
The D-squared, or explained deviance, of the model was introduced by Guisan & Zimmermann (2000): https://doi.org/10.1016/S0304-3800(00)00354-9