Choosing the best family and link function for a GLM

Question

I'm new to GLMs and I need to fit several GLMs using a kilometric abundance index (non-integer, but calculated from a count) as the dependent variable and several habitat traits as the independent variables.

Different people told me to try family=poisson(link="log") (although the response is not an integer) and family=gaussian(link="identity"). I tried both, but now I don't know how to choose the best one for each case. I plotted the Q-Q plots of the residuals and the residuals vs. fitted values, and I can only see very small differences. Are there more effective (or less subjective) methods to determine which GLM fits better in each case?

This was discussed earlier; see: https://stats.stackexchange.com/questions/530605/decide-which-distribution-function-to-use-in-glm-for-a-complex-response-variable/530695#530695 — msuzen, Aug 08 '21 at 18:44
Can you be more specific about the 'calculated from a count' part? — Glen_b, Aug 08 '21 at 22:38

score 1 · Answer 1 · answered Aug 08 '21 at 19:03

You don't want to divide through to turn your abundance data into non-integer data. Instead, you should use the denominator (e.g., time or area) as an offset (see my answer here: How to deal with "non-integer" warning from negative binomial GLM?). If you prefer, you can also use the denominator as a covariate (see my answer here: In a Poisson model, what is the difference between using time as a covariate or an offset?).
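
As a rough illustration of the offset approach, here is a minimal sketch in R. The data are simulated, and the names `count`, `km`, and `cover` are hypothetical stand-ins for your raw counts, transect lengths, and one habitat trait; nothing here comes from your actual data.

```r
# Hypothetical data: 'count' animals seen on transects of 'km' kilometres,
# with a single habitat trait 'cover'. All names and values are made up.
set.seed(1)
n     <- 200
km    <- runif(n, 0.5, 5)
cover <- runif(n)
count <- rpois(n, lambda = km * exp(0.2 + 1.0 * cover))
dat   <- data.frame(count, km, cover)

# Model the raw count; log(km) enters as an offset (coefficient fixed at 1),
# so the remaining coefficients describe the log rate per kilometre.
fit_offset <- glm(count ~ cover + offset(log(km)),
                  family = poisson(link = "log"), data = dat)

# Alternative: estimate the coefficient of log(km) instead of fixing it at 1.
fit_covar <- glm(count ~ cover + log(km),
                 family = poisson(link = "log"), data = dat)

coef(fit_offset)
coef(fit_covar)
```

With the offset, the response stays an integer count, so the "non-integer" warning never arises, and `exp()` of a coefficient can be read as a multiplicative effect on abundance per kilometre.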

Setting that aside, your base data are counts, so I would use a distribution for counts a priori. Using the Poisson distribution is fine, but it makes the very restrictive assumption that the conditional variance equals the conditional mean. That's often a dicey assumption. So it may be better to use the quasi-Poisson distribution, or to fit a negative binomial model, instead.
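
One way to check the equal-variance assumption is to look at the Pearson dispersion statistic of the Poisson fit and then compare it with the alternatives. The following is only a sketch on simulated, deliberately overdispersed data; the variable names are hypothetical, and `glm.nb()` comes from the MASS package, which ships with standard R installations.

```r
library(MASS)  # provides glm.nb()

# Hypothetical overdispersed counts (negative binomial with size = 2).
set.seed(2)
n     <- 200
km    <- runif(n, 0.5, 5)
cover <- runif(n)
count <- rnbinom(n, size = 2, mu = km * exp(0.2 + 1.0 * cover))
dat   <- data.frame(count, km, cover)

fit_pois <- glm(count ~ cover + offset(log(km)),
                family = poisson(link = "log"), data = dat)

# Pearson dispersion: values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) /
  df.residual(fit_pois)
dispersion

# Quasi-Poisson: same coefficients as Poisson, standard errors scaled
# by the square root of the estimated dispersion. It has no likelihood,
# so AIC is not available for it.
fit_qp <- glm(count ~ cover + offset(log(km)),
              family = quasipoisson(link = "log"), data = dat)

# Negative binomial: a full likelihood model, so AIC can be compared
# with the Poisson fit on the same response.
fit_nb <- glm.nb(count ~ cover + offset(log(km)), data = dat)
AIC(fit_pois, fit_nb)
```

Note that AIC is only comparable between models fitted to the same response, so it cannot be used to compare a Gaussian model of the index with a Poisson model of the raw count.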
