GLM vs square root data transformation

Question

I am currently analysing some pretty awful/awkward data on the abundance of fish under three different "Hydro-Regimes" (Short/Medium/Long, with 5 abundance measurements for each regime). The current analysis plan had been a one-way ANOVA.

Looking at the residuals vs. fitted and normal Q-Q plots after fitting a linear model indicates right skew. To correct this, I tried a square root transformation of the outcome variable, which gives a reasonable answer, with "short" being significantly different from "long" and "medium" after a post-hoc Tukey test. However, plotting the data shows that the error bars of "long" and "medium" do not overlap.

I also looked at using a Poisson GLM, which I have read is good for count and skewed data, but I am not sure if this is the right way to go. Any suggestions?

I can't tell from your question: are your measurements within each regime independent? – gung - Reinstate Monica, Feb 02 '13 at 19:33
Well, I didn't collect the data myself, but each hydroperiod is the percentage of time a site is inundated with water, and a count of fish abundance was made during each. Logically, I would think that this would all have occurred at one site during different flooding events. – user2037072, Feb 03 '13 at 15:38

AdamO · Answer 1 · 2013-02-02T18:29:13.180

5

A Poisson model would most definitely be a sensible way to do this analysis.

Traditionally (before Poisson GLMs were available), such data were analyzed using square root transformations as a "variance stabilizing" transformation (i.e. so that the variance would be independent of the mean). The problem is that, when you transform the data, it becomes difficult to interpret the model coefficients. With a square root transformation of the data, the parameters estimate a difference in square roots of counts.
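To see why the square root is described as variance stabilizing, here is a first-order (delta method) sketch, assuming the counts are exactly Poisson with mean $\mu$ and that $\mu$ is not too small; the symbols $Y$ and $\mu$ are introduced only for this illustration:

$$
Y \sim \text{Poisson}(\mu), \qquad E(Y) = \operatorname{Var}(Y) = \mu
$$

$$
\operatorname{Var}\left(\sqrt{Y}\right) \approx \left(\frac{d}{d\mu}\sqrt{\mu}\right)^2 \operatorname{Var}(Y) = \left(\frac{1}{2\sqrt{\mu}}\right)^2 \mu = \frac{1}{4}
$$

So, to this approximation, the variance of $\sqrt{Y}$ no longer depends on the mean.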

A Poisson GLM allows you to exploit the mean-variance relationship in count data to get better inference. The parameters estimate ratios of rates between the various treatment levels. And, given the small sample size, a parametric modelling approach with reasonable assumptions will give you relatively efficient inference.
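As a rough sketch of what this looks like in R: the counts below are invented for illustration and are not the asker's data, and the names `fish`, `abund`, and `regime` are assumptions. Exponentiating the coefficients of a Poisson GLM gives the rate ratios relative to the reference regime.

```r
# Hypothetical counts: 5 abundance measurements per regime (NOT the asker's data).
fish <- data.frame(
  regime = factor(rep(c("Short", "Medium", "Long"), each = 5),
                  levels = c("Short", "Medium", "Long")),
  abund  = c(3, 5, 2, 4, 6,  10, 12, 9, 15, 14,  30, 25, 41, 28, 36)
)

# Poisson GLM with the default log link; "Short" is the reference level.
fit <- glm(abund ~ regime, family = poisson, data = fish)

# Exponentiated coefficients: the intercept is the estimated mean count for
# "Short"; the other two are rate ratios relative to "Short".
rate_ratios <- exp(coef(fit))
print(rate_ratios)

# In a one-way layout the fitted means equal the group means, so the rate
# ratios match the ratios of the raw group means.
group_means <- tapply(fish$abund, fish$regime, mean)
print(group_means / group_means["Short"])
```

Compare this with a model fitted to `sqrt(abund)`, whose coefficients are differences in square roots of counts and have no such direct reading.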

edited Feb 02 '13 at 18:29

answered Feb 02 '13 at 17:30


1

This was really helpful, thank you. Running the Poisson GLM and a post-hoc Tukey test revealed significant differences between all three hydro-regimes. However, I can't seem to get any overall significance value with `anova()` or `summary()`, and I think there is some serious overdispersion in the data (residual deviance: 1498.8 on 13 degrees of freedom). As such, I tried a quasi-Poisson model, but the post-hoc Tukey test then gave a very similar result to the square root transformation. I am so confused! Help me AdamO, you are my only hope! – user2037072 Feb 02 '13 at 18:17
2

Quasi-Poisson accounts for overdispersed data (the type of data that arises from slight unmeasured correlation between observational groups). You'd need a justification to motivate using quasi. Let's table that idea for now. Your syntax suggests you're using R. Load the package `lmtest`. You can perform the likelihood ratio test of the null hypothesis that all regimes have equal abundance by typing `lrtest(glm(abund ~ regime, family=poisson))`. – AdamO Feb 02 '13 at 18:25
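For anyone without `lmtest` installed, the same likelihood ratio test can be sketched in base R by comparing the regime model with an intercept-only model. The counts below are invented for illustration and are not the asker's data; `abund` and `regime` follow the names used in the comment above, while `full`, `null`, and `lrt` are hypothetical names.

```r
# Hypothetical counts, NOT the asker's data.
regime <- factor(rep(c("Short", "Medium", "Long"), each = 5))
abund  <- c(3, 5, 2, 4, 6,  10, 12, 9, 15, 14,  30, 25, 41, 28, 36)

full <- glm(abund ~ regime, family = poisson)  # separate mean per regime
null <- glm(abund ~ 1, family = poisson)       # one common mean

# Likelihood ratio test: the drop in deviance is referred to a chi-squared
# distribution with (number of regimes - 1) degrees of freedom.
lrt <- anova(null, full, test = "Chisq")
print(lrt)

# The same statistic and p-value computed by hand.
lr_stat <- deviance(null) - deviance(full)
p_value <- pchisq(lr_stat, df = df.residual(null) - df.residual(full),
                  lower.tail = FALSE)
print(c(statistic = lr_stat, p.value = p_value))
```

Note that this test takes the Poisson variance assumption at face value, so heavy overdispersion would make its p-value too small.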
1

lmtest is ace. Worked great, thanks AdamO! Out of interest, I ran AIC on the two models (square root transformed and GLM) to compare how they fit the data, with these results: `AIC(fish.glm)` gave 1630.778 and `AIC(lm.sqrt.fish)` gave 104.5409. – user2037072 Feb 03 '13 at 15:41
