Comparing linear mixed effects models using ANOVA - underlying assumptions

Question

I am trying to analyze some CPUE (Catch Per Unit Effort) data for a fisheries-related analysis with lme models in R. I have a total of six models that are tested in pairs to determine the effects of the two fixed variables and of their interaction separately. The models are:

M1 : CPUE = area + year + area*year + (1|location)

M2 : CPUE = area + year + (1|location)

M3 : CPUE = area + (1|location)

M4 : CPUE = 1 + (1|location)

M5 : CPUE = year + (1|location)

M6 : CPUE = 1 + (1|location)

Model 1 is tested against model 2 to test for an interaction between the fixed variables. Models 3 vs 4 and models 5 vs 6 are tested against each other for the effects of the fixed variables area and year, respectively.

The models are fitted with lme (from the nlme package) using 'ML' as the method, and the testing is done with anova.
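For readers who want to reproduce the setup, here is a minimal sketch of the M1 vs M2 comparison. The data frame `d`, its simulated values, and the number of locations, areas and years are all assumptions made for illustration; only the model formulas, `method = "ML"` and the use of `anova` come from the description above.

```r
library(nlme)

set.seed(1)
# Hypothetical stand-in data: 12 locations nested in 3 areas, 4 years, 3 hauls each
d <- expand.grid(location = factor(1:12), year = factor(2015:2018), rep = 1:3)
d$area <- factor(c("A", "B", "C")[(as.integer(d$location) - 1) %/% 4 + 1])
loc_eff <- rnorm(12, sd = 0.3)  # simulated random intercept per location
d$CPUE <- exp(1 + 0.2 * as.integer(d$area) +
              loc_eff[as.integer(d$location)] +
              rnorm(nrow(d), sd = 0.4))

# M1 (with interaction) and M2 (main effects only), both fitted by ML
m1 <- lme(CPUE ~ area * year, random = ~ 1 | location, data = d, method = "ML")
m2 <- lme(CPUE ~ area + year, random = ~ 1 | location, data = d, method = "ML")

# Likelihood ratio test of the interaction term
cmp <- anova(m2, m1)
print(cmp)
```

ML rather than REML is used here because the two models differ in their fixed effects, and REML likelihoods are not comparable across different fixed-effects structures.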

So my questions are:

1) The CPUE data is not normally distributed. As I fit the models I see that the residuals also deviate from normal distribution, as do the plotted random effects. Do I need to transform my data to produce models with normally distributed residuals and random effects to use ANOVA for model comparison?

2) If I should transform my data, can it be done simply by using, for example, log or sqrt, if this seems to produce models with residuals and random effects that are approximately normally distributed? Does using a Box-Cox transformation fitted on a linear model without random effects produce erroneous results if used for lme models in this context? What I mean is using the estimate of lambda with the maximum log-likelihood, obtained from the models without random effects, to transform the data.

I am not the person who designed this analysis and have no resources or competence to change it. Therefore I am only interested in producing the results without making errors. Any help is appreciated!

Look into the literature. A quick look shows that GLMs with a log link and normal distribution have been used to model CPUE, but I'm not sure that is the best distribution, since CPUE is an abundance measure, i.e., something similar to a count variable. – Roland, Dec 06 '18 at 14:11
Thanks for the input. Do you have any literature recommendations on GLMs for an ecologist with a very basic understanding of statistics? – mihy, Dec 06 '18 at 15:06

score 3 · Accepted Answer · edited Jun 11 '20 at 14:32

The assumption is that the residuals (and, in a mixed model, the random effects), not the raw data, are normally distributed. If the departure from normality is severe, then a transformation may indeed make sense.
You can use Box-Cox transformations; the main disadvantage is that the model is harder to interpret than if you use a "simple" transformation such as the positive square root or a logarithm.
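As a rough illustration of these checks, the sketch below compares residual normality before and after a log transformation and profiles a Box-Cox lambda on a fixed-effects-only model, as described in the question. The data are simulated (log-normal by construction) and the names `d`, `fit_raw`, `fit_log` and `lambda_hat` are hypothetical; it is not a statement about what the real CPUE data will show.

```r
library(nlme)
library(MASS)  # boxcox()

set.seed(2)
# Hypothetical data: 10 locations, 4 years, positive right-skewed response
d <- data.frame(location = factor(rep(1:10, each = 12)),
                year = factor(rep(2015:2018, times = 30)))
loc_eff <- rnorm(10, sd = 0.3)
d$CPUE <- exp(1 + loc_eff[as.integer(d$location)] + rnorm(nrow(d), sd = 0.5))

fit_raw <- lme(CPUE ~ year, random = ~ 1 | location, data = d, method = "ML")
fit_log <- lme(log(CPUE) ~ year, random = ~ 1 | location, data = d, method = "ML")

# Shapiro-Wilk on the within-group residuals (W closer to 1 = closer to normal)
sw_raw <- shapiro.test(resid(fit_raw, type = "normalized"))
sw_log <- shapiro.test(resid(fit_log, type = "normalized"))

# Predicted random intercepts, which can be inspected the same way
re_log <- ranef(fit_log)[["(Intercept)"]]

# Box-Cox profile likelihood from a model WITHOUT random effects
bc <- boxcox(CPUE ~ year, data = d, lambda = seq(-2, 2, by = 0.05),
             plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

print(c(W_raw = unname(sw_raw$statistic), W_log = unname(sw_log$statistic),
        lambda = lambda_hat))
```

A lambda near 0 points towards the log and a lambda near 0.5 towards the square root, so rounding to one of these keeps the model interpretable. Because the lambda here ignores the location effect, it seems sensible to treat it as a guide and to re-check the residuals and random effects of the refitted lme model rather than to rely on it directly.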

Your model selection procedure is a kind of backwards stepwise procedure. This is not a good idea. For example see here, here and here.

You may also wish to investigate generalised linear mixed models, depending on the nature of the outcome variable.
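One possible shape for such a model is sketched below, using `glmmPQL` from MASS with a Gamma family and log link. The choice of family is purely an assumption for a strictly positive, right-skewed response (it would not accommodate zero catches), and the data are simulated; which distribution suits real CPUE data depends on how it was measured.

```r
library(MASS)  # glmmPQL()
library(nlme)  # fixef()

set.seed(3)
# Hypothetical data: 10 locations in 2 areas, Gamma-distributed CPUE
d <- data.frame(location = factor(rep(1:10, each = 12)),
                area = factor(rep(c("A", "B"), each = 60)))
loc_eff <- rnorm(10, sd = 0.3)
mu <- exp(1 + 0.5 * (d$area == "B") + loc_eff[as.integer(d$location)])
d$CPUE <- rgamma(nrow(d), shape = 4, rate = 4 / mu)

# Random-intercept GLMM fitted by penalized quasi-likelihood
glmm <- glmmPQL(CPUE ~ area, random = ~ 1 | location,
                family = Gamma(link = "log"), data = d, verbose = FALSE)

est <- fixef(glmm)  # fixed effects on the log scale
print(est)
```

Note that a PQL fit does not provide a true likelihood, so the `anova`-based likelihood ratio comparison used for the lme models should not be carried over to it as-is; a likelihood-based GLMM fitter would be needed for that.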


answered Dec 06 '18 at 14:17

Robert Long


Thank you for your response! Yes, it was the residuals being non-normally distributed that led me to this question, not the actual data. The only thing this model testing is used for is to see whether a model with a given term fits the data significantly better than a model without it, e.g. does a model with area as a fixed effect fit the data significantly better than a model without it, and so on. I will be sure to pass the critique on to my superiors so that they hire an actual statistician for a new analysis design (: – mihy Dec 06 '18 at 15:04

@mihy You asked for a reference for ecologists above. Try [this](https://www.amazon.co.uk/Effects-Extensions-Ecology-Statistics-Biology/dp/0387874577), [this](http://www.highstat.com/index.php/beginner-s-guide-to-regression-models-with-spatial-and-temporal-correlation) and [this](https://ms.mcmaster.ca/~bolker/emdbook/book.pdf) – Robert Long Dec 06 '18 at 16:03
