
What are the assumptions when doing hypothesis testing with a Gamma GLM or GLMM? Are the residuals supposed to be normally distributed, and is heteroscedasticity a concern as it is for the Gaussian (normal) distribution? Do you test the assumptions in the same fashion: Levene's tests of individual fixed effects, residual and Q-Q plots, and the Shapiro-Wilk test?

Discussion of when to use Gamma GLMs and GLMMs is found here:

When to use gamma GLMs?

However, I found little to no discussion of the assumptions and how to test them.

Edit 1: Requested example residual plots. Are any of these a concern? Levene's test comes back significant.

Residuals are plotted using the code below. Time (Obs) holds the ordinal factor levels from the data.

modelresiduals = residuals(Gama_model)           # model residuals (default type)
plot(modelresiduals)                             # residuals in observation order
plot(y = modelresiduals, x = Model_data2$Obs)    # residuals by time (Obs)

1. All residuals [residual plot]

2. Residuals by time [residual plot]

Edit 2: Based on comment suggestions. The patterns look good.

ypred = predict(gama_model)                      # predictions on the link scale
res = residuals(gama_model, type = 'pearson')    # Pearson residuals
plot(ypred, res)                                 # Pearson residuals vs. predictions
hist(res)                                        # histogram of Pearson residuals
plot(res)                                        # Pearson residuals in observation order
plot(y = res, x = Model_data$Obs)                # Pearson residuals by time (Obs)

3. Pearson residual histogram (based on answer code) [plot]

4. Pearson residuals vs. predictions, link scale (based on answer code) [plot]

5. All Pearson residuals (based on answer code) [plot]

6. Pearson residuals by time (based on answer code) [plot]

OliverFishCode

3 Answers


A nice approach for checking the fit of your assumed model to the data, accounting for features such as over-dispersion, non-normality, and zero-inflation, is the simulated scaled residuals provided by the DHARMa package. If your assumed model is correct, these residuals should have a uniform distribution. You can find more details on how they are defined and used in the package's vignette.

As an example, using the same simulated data as in the other answer, I compare below the fit of the Gamma model to the fit of the (wrong) normal model:

set.seed(0)
N <- 250
x <- runif(N, -1, 1)
a <- 0.5
b <- 1.2
y_true <- exp(a + b * x)
shape <- 2
y <- rgamma(N, scale = y_true / shape, shape = shape)

gamma_model <- glm(y ~ x, family = Gamma(link = "log"))
normal_model <- glm(y ~ x, family = gaussian())

library("DHARMa")
check_gamma_model <- simulateResiduals(fittedModel = gamma_model, n = 500)
plot(check_gamma_model)

check_normal_model <- simulateResiduals(fittedModel = normal_model, n = 500)
plot(check_normal_model)
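
Beyond the summary plots, DHARMa also provides formal tests on the simulated residuals. A minimal sketch, assuming a reasonably recent version of the package (see its vignette for the exact functions available in your installation):

testUniformity(check_gamma_model)   # KS-type test: scaled residuals should be uniform
testDispersion(check_gamma_model)   # simulation-based check for over-/under-dispersion

For the correctly specified Gamma model these should come back unremarkable, while the misspecified normal model will typically show clear departures.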

Dimitris Rizopoulos
  • Note: if using glmmTMB for mixed models, random effects can make residual vs. predicted plots go haywire despite a good fit and a lack of overdispersion. Thanks for the addition – OliverFishCode Jan 31 '19 at 19:59
  • My **GLMMadaptive** package (https://drizopoulos.github.io/GLMMadaptive/) fits similar models to **glmmTMB** but uses the adaptive Gaussian quadrature instead of the Laplace approximation, and there it works. For an example you may check: https://drizopoulos.github.io/GLMMadaptive/articles/Goodness_of_Fit.html – Dimitris Rizopoulos Jan 31 '19 at 20:01

Are the residuals supposed to be normally distributed...

No. The distribution of $y\vert x$ is assumed to be gamma, and I don't think that $X - \mathbb{E}(X)$ follows a gamma distribution for $X\sim \text{Gamma}(k,\theta)$. The deviance residuals, on the other hand, should be approximately normal.

is heteroscedasticity a concern as it is for the Gaussian (normal) distribution?

Heteroscedasticity should be observed since the variance changes with the mean.
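
Concretely, a Gamma GLM with dispersion parameter $\phi$ assumes $\operatorname{Var}(Y_i) = \phi\,\mu_i^2$, so the spread of the raw residuals grows with the fitted mean. Pearson residuals, $r_i = (y_i - \hat\mu_i)/\hat\mu_i$ in the Gamma case, divide out this mean-variance relationship, which is why diagnostics are checked on Pearson or deviance residuals rather than on the raw residuals.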

Here is an example of Gamma regression in R.

library(tidyverse)

#Simulate
set.seed(0)
N <- 250
x <- runif(N, -1, 1)
a <- 0.5
b <- 1.2
y_true <- exp(a + b * x)
shape <- 2
y <- rgamma(N, scale = y_true/shape, shape = shape)

#Model
plot(x,y)
model <- glm(y ~ x, family = Gamma(link = "log"))
summary(model)

#Deviance residuals
ypred = predict(model)
res = residuals(model, type = 'deviance')
plot(ypred,res)
hist(res)

#Deviance GOF

deviance = model$deviance
p.value = pchisq(deviance, df = model$df.residual, lower.tail = F)
# Fail to reject; looks like a good fit.
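
To check visually that the deviance residuals are roughly normal, a Q-Q plot of the residuals computed above works; a minimal sketch continuing the same fitted model:

#Deviance residual Q-Q plot
qqnorm(res)   # points should track the reference line if roughly normal
qqline(res)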

For more on GLMs, I would recommend Extending the Linear Model with R by Faraway.

Demetri Pananos
  • So am I correct in saying that if my data is heteroscedastic in time (i.e., cone shaped), and I want to do hypothesis tests with pairwise comparisons using a Gamma GLMM, it's not a concern like it would be in a linear mixed model (Gaussian)? – OliverFishCode Jan 31 '19 at 01:23
  • That should be correct. The variance should change with the mean for a Gamma GLM – Demetri Pananos Jan 31 '19 at 01:37
  • "Heteroscedasticity should be observed since the variance changes with the mean." This is true for the modeled **response** variable but not for the residuals vs. the predicted values as part of the model validation process? Your `plot(ypred,res)` looks good and there is no residual pattern present. – Stefan Jan 31 '19 at 02:44
  • @Stefan, yes, thank you for clarifying. The response should be heteroskedastic, but the deviance residuals should not. – Demetri Pananos Jan 31 '19 at 02:45
  • OK! I wasn't exactly sure what @DevonOliver meant (modeled response or residual) when he said "so am I correct in saying that if my data is hetersocedastic in time...it's not a concern..." – Stefan Jan 31 '19 at 02:48
  • @DemetriPananos If a Levene's test of the residuals (without specifying deviance) for one of the fixed effects (time, since that's the example I've been using) came back significant, is that a concern? – OliverFishCode Jan 31 '19 at 03:01
  • It isn't normally recommended to use those sorts of tests for model diagnostics. Can you plot and post your residuals? – Demetri Pananos Jan 31 '19 at 03:02
  • I agree with Demetri. You shouldn't use those tests to assess whether assumptions are met. I mostly use Pearson residuals and not deviance residuals for model validation - although both (Pearson and deviance residuals) are defined in Faraway's book to do diagnostic plots. This paper might be helpful too (see Step 7): https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12577 – Stefan Jan 31 '19 at 03:10
  • @DemetriPananos added requested plots – OliverFishCode Jan 31 '19 at 03:31
  • @DevonOliver Tough for me to say without any context for the problem – Demetri Pananos Jan 31 '19 at 03:32
  • @DemetriPananos In general, I should plot deviance residuals against predictions, and if there is no pattern, consider it good to go? Other than the chi-squared test you listed, is there any formal test? – OliverFishCode Jan 31 '19 at 03:36
  • @DevonOliver Doesn't seem to be a pattern, but there is something up with the residuals. That test is called a "Deviance goodness of fit test" – Demetri Pananos Jan 31 '19 at 03:37
  • @DevonOliver I used a regular GLM. If you are using a mixed model, they are likely evaluated differently. I would suggest you talk to a statistician at your institution about this. I can't help without understanding what we are modelling. – Demetri Pananos Jan 31 '19 at 04:01
  • This really helped my understanding; I think I got in my own head and went down a wormhole. I generally live in the Poisson (Neg. Bin.) and Binomial world or have much cleaner data. I will be sticking to Pearson and deviance residual plots going forward. Thanks for the advice – OliverFishCode Jan 31 '19 at 04:32

For checking the deviance goodness-of-fit statistic for a generalized linear model such as a Gamma regression model, I recommend following the instructions on slides 18/19 of the Technical University of Munich notes, or on slide 104 of the Technical University of Graz notes.

That is, if the dispersion parameter cannot be assumed to be 1, the residual deviance test (i.e., computing the p-value) should be applied to the scaled deviance, since it is the scaled statistic that approximately follows a chi-squared distribution (as outlined in the cited slides), e.g.:

p.value = pchisq(myModel$deviance / summary(myModel)$dispersion, df = myModel$df.residual, lower.tail = F)
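
As a minimal self-contained sketch, reusing the gamma_model fitted to the simulated data in the answers above and taking the dispersion estimate from summary():

phi <- summary(gamma_model)$dispersion          # estimated dispersion (Pearson-based)
scaled_dev <- gamma_model$deviance / phi        # scaled deviance
p.value <- pchisq(scaled_dev, df = gamma_model$df.residual, lower.tail = FALSE)
p.value   # a large p-value suggests no evidence of lack of fit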