I'm new to GLMs and I need to fit several of them, using a kilometric abundance index (non-integer, but calculated from a count) as the dependent variable and several habitat traits as the independent variables.

Different people told me to try family=poisson(link="log") (although the response is not an integer) and family=gaussian(link="identity"). I tried both, but now I don't know how to choose the better one in each case. I plotted the Q-Q plots of the residuals and the residuals vs. fitted values, and I can only see very small differences. Are there more effective (or less subjective) methods to determine which GLM fits better in each case?

kjetil b halvorsen
arnausc
    This was discussed earlier; see https://stats.stackexchange.com/questions/530605/decide-which-distribution-function-to-use-in-glm-for-a-complex-response-variable/530695#530695 – msuzen Aug 08 '21 at 18:44
    Can you be more specific about the 'calculated from a count' part? – Glen_b Aug 08 '21 at 22:38

1 Answer

You don't want to divide your counts by a denominator, because that turns your abundance data into non-integer data. Instead, you should model the raw counts and use the denominator (e.g., time, area, or distance) as an offset (see my answer here: How to deal with "non-integer" warning from negative binomial GLM?). If you prefer, you can also use the denominator as a covariate (see my answer here: In a Poisson model, what is the difference between using time as a covariate or an offset?).
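
To make the offset idea concrete, here is a minimal sketch in R on simulated data. The data frame `d` and the columns `count`, `km`, and `cover` are hypothetical names made up for illustration; substitute your own raw counts, transect lengths, and habitat traits.

```r
# Simulated data: every name here (d, count, km, cover) is hypothetical.
set.seed(1)
n <- 200
d <- data.frame(
  km    = runif(n, 0.5, 5),  # transect length, i.e. the denominator
  cover = runif(n)           # an illustrative habitat trait
)
d$count <- rpois(n, lambda = d$km * exp(0.5 + 1.2 * d$cover))

# Model the raw counts. log(km) enters as an offset (coefficient fixed at 1),
# so the coefficients describe the log of the expected count per km.
fit_offset <- glm(count ~ cover + offset(log(km)),
                  family = poisson(link = "log"), data = d)

# Alternative: estimate the coefficient of log(km) instead of fixing it at 1.
fit_covar <- glm(count ~ cover + log(km),
                 family = poisson(link = "log"), data = d)

coef(fit_offset)
coef(fit_covar)
```

If the estimated coefficient of `log(km)` in the second model is close to 1, the two specifications tell essentially the same story.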

Setting that aside, your base data are counts, so I would use a distribution for counts a priori. Using the Poisson distribution is fine, but it makes the very restrictive assumption that the conditional variance equals the conditional mean. That assumption often fails in practice, so it may be better to use the quasi-Poisson distribution, or to fit a negative binomial model, instead.
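
As a rough sketch of how one might check the equal-variance assumption and fit the alternatives, the R code below again uses simulated data with hypothetical names (`d`, `count`, `km`, `cover`). The Pearson dispersion statistic is one common diagnostic, not the only one, and `glm.nb()` comes from the MASS package, which is assumed to be installed (it usually is, as a recommended package).

```r
library(MASS)  # provides glm.nb()

# Simulated overdispersed counts; all names are hypothetical.
set.seed(2)
n <- 200
d <- data.frame(km = runif(n, 0.5, 5), cover = runif(n))
d$count <- rnbinom(n, mu = d$km * exp(0.5 + 1.2 * d$cover), size = 2)

fit_pois <- glm(count ~ cover + offset(log(km)), family = poisson, data = d)

# Pearson dispersion estimate: values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Quasi-Poisson keeps the Poisson coefficients but scales the standard errors.
fit_quasi <- glm(count ~ cover + offset(log(km)), family = quasipoisson, data = d)

# Negative binomial: variance is allowed to grow faster than the mean.
fit_nb <- glm.nb(count ~ cover + offset(log(km)), data = d)

dispersion
summary(fit_quasi)$dispersion
AIC(fit_pois, fit_nb)  # quasi-Poisson has no likelihood, hence no AIC
```

Because the Poisson and negative binomial models are fitted to the same counts, their AIC values are comparable; comparing either against a Gaussian fit of the divided index would not be, since the response differs.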

gung - Reinstate Monica