I would like recover the gamma distribution parameters from a model fit in R using glm(..., family = Gamma)
. The first step is trying to figure out which parameterization R's glm
function uses for the gamma distribution. Looking through ?glm
, ?family
, and ?Gamma
didn't help me. This post was helpful in understanding the parameterization, but doesn't say much about recovering the parameter estimates from the output. This post explains how to get the dispersion estimate from the fit model object, and I use that and the estimated mean values to calculate the shape and scale parameters.
# model from example in ?glm
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)
# mean = shape * scale (https://en.wikipedia.org/wiki/Gamma_distribution)
# mean can be directly obtained from model output
mu <- fit$fitted.values
# mu <- predict(fit, type = 'response')
# dispersion = residual deviance / residual degrees of freedom (https://stats.stackexchange.com/a/367563/45889)
dispersion <- (fit$deviance / fit$df.residual)
# and shape parameter is inverse of dispersion (https://stats.stackexchange.com/a/247631/45889)
shape <- 1 / dispersion
# scale = mu / shape (https://en.wikipedia.org/wiki/Gamma_distribution)
scale <- mu / shape
# variance = shape * scale^2 (https://en.wikipedia.org/wiki/Gamma_distribution)
var <- shape * scale^2
Is this correct? Because when I try to manually calculate Pearson residuals, my Pearson residuals differ from R's Pearson residuals by a factor of sqrt(shape)
.
# raw residuals
observed <- clotting$lot1
expected <- fit$fitted.values
raw_residuals <- observed - expected
# Pearson residuals = raw residuals / variance function (https://v8doc.sas.com/sashtml/insight/chap39/sect56.htm)
pearson_residuals <- raw_residuals / sqrt(var)
# but when I compare this to R's Pearson residuals, they're different
all.equal(resid(fit, 'pearson'), pearson_residuals)
# they're different by a factor of sqrt(shape)
pearson_residuals / resid(fit, 'pearson')
sqrt(shape)
var2 <- var * shape
pearson_residuals2 <- raw_residuals / sqrt(var2)
all.equal(resid(fit, 'pearson'), pearson_residuals2)
# TRUE
What am I missing? I have been assuming that I'm misunderstanding something with the parameterization of the gamma distribution, but perhaps am I doing something wrong with the Pearson residuals?