Help understanding parameterization of gamma distribution in R's glm()

Question

I would like recover the gamma distribution parameters from a model fit in R using glm(..., family = Gamma). The first step is trying to figure out which parameterization R's glm function uses for the gamma distribution. Looking through ?glm, ?family, and ?Gamma didn't help me. This post was helpful in understanding the parameterization, but doesn't say much about recovering the parameter estimates from the output. This post explains how to get the dispersion estimate from the fit model object, and I use that and the estimated mean values to calculate the shape and scale parameters.

# model from example in ?glm
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)


# mean = shape * scale (https://en.wikipedia.org/wiki/Gamma_distribution)
# mean can be directly obtained from model output
mu <- fit$fitted.values
# mu <- predict(fit, type = 'response')

# dispersion = residual deviance / residual degrees of freedom (https://stats.stackexchange.com/a/367563/45889)
dispersion <- (fit$deviance / fit$df.residual)
# and shape parameter is inverse of dispersion (https://stats.stackexchange.com/a/247631/45889)
shape <- 1 / dispersion
# scale = mu / shape (https://en.wikipedia.org/wiki/Gamma_distribution)
scale <- mu / shape

# variance = shape * scale^2 (https://en.wikipedia.org/wiki/Gamma_distribution)
var <- shape * scale^2

Is this correct? Because when I try to manually calculate Pearson residuals, my Pearson residuals differ from R's Pearson residuals by a factor of sqrt(shape).

# raw residuals
observed <- clotting$lot1
expected <- fit$fitted.values
raw_residuals <- observed - expected

# Pearson residuals = raw residuals / variance function (https://v8doc.sas.com/sashtml/insight/chap39/sect56.htm)
pearson_residuals <- raw_residuals / sqrt(var)

# but when I compare this to R's Pearson residuals, they're different
all.equal(resid(fit, 'pearson'), pearson_residuals)

# they're different by a factor of sqrt(shape)
pearson_residuals / resid(fit, 'pearson')
sqrt(shape)

var2 <- var * shape
pearson_residuals2 <- raw_residuals / sqrt(var2)
all.equal(resid(fit, 'pearson'), pearson_residuals2)
# TRUE

What am I missing? I have been assuming that I'm misunderstanding something with the parameterization of the gamma distribution, but perhaps am I doing something wrong with the Pearson residuals?

score 1 · Accepted Answer · answered Feb 14 '21 at 22:56

The Pearson residuals rescale using the variance function $V(\mu)$ but not using the estimated dispersion parameter. So if $\mathrm{var}[Y]=\sigma^2V(\mu)$, $$r_P = \frac{y-\mu}{\sqrt{V(\mu)}}$$ not $$r_P = \frac{y-\mu}{\hat\sigma\sqrt{V(\mu)}}$$

Obviously it doesn't matter for most graphical purposes whether the scale factor of $\hat\sigma$ is in there or not; it was at one point a free choice. The implementation in R followed that in S, which followed people who defined Pearson residuals without the $\hat\sigma$ term.

It's convenient to leave it out for a few reasons. First, you can define $\hat\sigma^2$ as just the variance of the Pearson residuals. Second, for linear/Gaussian glms you end up with the Pearson residuals being the same as the raw residuals (and the working residuals and the deviance residuals).

And since the variance function of the gamma distribution is `mu^2`, I divide the raw residuals by `mu` to get the Pearson residuals. — filups21, Feb 15 '21 at 00:31

Help understanding parameterization of gamma distribution in R's glm()

1 Answers1

Linked