
I ran a regression in R, and a Shapiro-Wilk test showed that some of my residuals are not normally distributed. I cannot transform the data to fit a normal distribution, and even when I remove outliers, my data still do not conform. I think this is because there are a lot of zeros and then occasional large numbers, which were schools of fish I counted, so my data are quite bizarre. I would love any suggestions.

My response variables are fish density and species richness. My predictor variables are all categorical: depth (5, 10 and 15 m), site (1, 2, 3) and sample method (1, 2).

I also considered using a Wilcoxon signed-rank test to compare density and richness, but I can only do this between the two sample methods, as my other predictors have 3 levels.

Thanks.

[Image: residual plots (not available)]

mdewey
Vivienne
  • If your outcome variable is a count then you could use a Poisson (or negative binomial) model, possibly with zero-inflation. – mdewey Sep 03 '16 at 13:28
  • It might help if you could show a plot of residuals against predicted values for each of your response variables. Also, please describe how these response variables are determined, in particular whether they are themselves raw observations or instead are some type of ratios or other functions of the raw observations. – EdM Sep 03 '16 at 14:11
  • Hi, my response variables were originally counts, but I converted them into fish density, i.e. the count divided by the area sampled, so they are not count data. Species richness is literally the number of different species seen in each sample. – Vivienne Sep 04 '16 at 02:46
  • Please see the original question for the residuals. I believe this is because many of my fish counts are zeros, apart from the schools of fish, which are the outliers. – Vivienne Sep 04 '16 at 02:50
  • It sounds like you might need some kind of hurdle or two-stage model for your data. That way you can model the appearance of the school of fish first, and then model fish counts conditional on whether a school is present or not. – shadowtalker Sep 04 '16 at 04:10
  • The residuals here cannot be normal; this whole model is incorrect. You should not use the densities (counts divided by area). See my [answer](http://stats.stackexchange.com/a/232872/7290) to your previous question. – gung - Reinstate Monica Sep 05 '16 at 14:47
  • With count data as primary observations, you are much better off using Poisson or similar modeling of the counts as the dependent variable, taking area into account as an offset or covariate, as recommended in answers to [your previous question](http://stats.stackexchange.com/q/232666/28500). It seems that you have 4 fish groups, perhaps some with schooling behavior (s,e) and others without (g,t). Those types of fish might need different models, as suggested by @ssdecontrol. – EdM Sep 05 '16 at 17:29
  • At the risk of sounding glib, the answer to this question is basically "use a better model." – shadowtalker Sep 05 '16 at 17:34
  • One useful source of information on modeling count data is [Regression Models for Count Data in R](https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf), a vignette in the [`pscl` package](https://cran.r-project.org/web/packages/pscl/). The CRAN task view on [Analysis of Ecological and Environmental Data](https://cran.r-project.org/web/views/Environmetrics.html) might also be helpful for your application. – EdM Sep 05 '16 at 19:03
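The offset and hurdle suggestions in the comments can be written out as follows. This is a sketch with notation introduced here for illustration, assuming a log link: $Y_i$ is the raw fish count in sample $i$, $A_i$ is the area sampled, and the $\beta$ terms are effects of the categorical predictors depth, site and sample method.

```latex
\begin{aligned}
Y_i &\sim \operatorname{Poisson}(\mu_i), \\
\log \mu_i &= \log A_i + \beta_0 + \beta_{\mathrm{depth}(i)} + \beta_{\mathrm{site}(i)} + \beta_{\mathrm{method}(i)}, \\
\frac{\mu_i}{A_i} &= \exp\!\bigl(\beta_0 + \beta_{\mathrm{depth}(i)} + \beta_{\mathrm{site}(i)} + \beta_{\mathrm{method}(i)}\bigr).
\end{aligned}

\begin{aligned}
P(Y_i = 0) &= 1 - p_i, \\
P(Y_i = k) &= p_i \, \frac{f(k;\mu_i)}{1 - f(0;\mu_i)}, \qquad k = 1, 2, \dots
\end{aligned}
```

Because the offset $\log A_i$ enters with its coefficient fixed at 1, the third line shows that the $\beta$ terms still describe expected density, while the likelihood is computed on the counts themselves. The second block is a hurdle model, one way to implement the two-stage idea: $p_i$ is the probability that any fish are counted (typically modelled by logistic regression) and $f$ is a Poisson or negative binomial probability mass function for the positive counts. In R, `pscl::hurdle` fits models of this form.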

1 Answer


Two paths:

  • Zero-inflated models

  • Non-parametric ANOVA: the Kruskal-Wallis test, based on ranks
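The rank-based path can be sketched without any libraries. This is a minimal illustration of the Kruskal-Wallis H statistic with the usual tie correction, which matters here because many zero counts tie with one another; the function names are hypothetical, and in R the same test is available as `kruskal.test` in base `stats`. For a factor with three levels (depth or site) the chi-square approximation has 2 degrees of freedom, whose upper tail is exactly exp(-H/2); that approximation is rough for very small groups.

```python
import math
from itertools import chain


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, corrected for ties (e.g. many zero counts)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum over groups of (rank sum)^2 / group size.
    total, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        start += len(g)
        total += sum(group_ranks) ** 2 / len(g)
    h = 12.0 * total / (n * (n + 1)) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over runs of ties.
    run_sizes = {}
    for v in pooled:
        run_sizes[v] = run_sizes.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in run_sizes.values())
    if ties == n ** 3 - n:
        raise ValueError("all observations are identical; H is undefined")
    return h / (1 - ties / (n ** 3 - n))


def p_value_three_groups(h):
    """Chi-square upper tail with 2 df, which is exactly exp(-h / 2)."""
    return math.exp(-h / 2)
```

Applied to this design, the function would be called once per factor, for example with the densities at 5, 10 and 15 m as three separate lists. Unlike the count regressions discussed in the comments, each such test ignores the other two factors.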

In the asker's place, I would run a Kruskal-Wallis test (a rank-based one-way ANOVA) on counts ~ category, one factor at a time; it does not require the two ANOVA assumptions of normally distributed residuals and homoscedasticity. Furthermore, if I wanted a regression model, I would use a zero-inflated Poisson (ZIP) model, because these are ecological count data with many zeros (sparsity). Finally, if beyond that the data are overdispersed, I would look at the zero-inflated negative binomial (ZINB) model. In any case, I would compare the ZIP and the ZINB based on their log-likelihoods (and AIC, BIC too).
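To make the ZIP versus ZINB comparison concrete, here is a minimal sketch of the two probability models and of AIC/BIC. It assumes intercept-only models (no depth, site or method effects) and parameters already estimated by maximum likelihood; the function names are hypothetical. `pi` is the zero-inflation probability, `lam` the Poisson mean, and `mu`/`size` the negative binomial mean and dispersion, matching the parameterisation of R's `dnbinom(size = , mu = )`. In R the fitting itself would normally be done with `pscl::zeroinfl(..., dist = "poisson")` versus `dist = "negbin"`, and the fitted models' log-likelihoods then compared.

```python
import math


def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson: structural zero with prob. pi, else Poisson(lam)."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return (pi if y == 0 else 0.0) + (1 - pi) * poisson


def nb_pmf(y, mu, size):
    """Negative binomial with mean mu and variance mu + mu**2 / size."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             - size * math.log1p(mu / size)      # size * log(size / (size + mu))
             + y * math.log(mu / (mu + size)))
    return math.exp(log_p)


def zinb_pmf(y, pi, mu, size):
    """Zero-inflated negative binomial: structural zero with prob. pi, else NB."""
    return (pi if y == 0 else 0.0) + (1 - pi) * nb_pmf(y, mu, size)


def log_likelihood(pmf, data, *params):
    """Sum of log-probabilities of the observed counts under `pmf`."""
    return sum(math.log(pmf(y, *params)) for y in data)


def aic(log_lik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; penalises parameters by log(n_obs)."""
    return n_params * math.log(n_obs) - 2 * log_lik
```

The intercept-only ZINB has one more parameter (`size`) than the ZIP, so AIC favours it only if its extra flexibility for overdispersed counts, such as occasional schools, raises the maximised log-likelihood by more than 1.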

Mr Micro
  • This needs much expansion to qualify as an answer. – mdewey Apr 07 '17 at 13:04
  • This is being automatically flagged as low quality, probably because it is so short. At present it is more of a comment than an answer by our standards. Can you expand on it? We can also turn it into a comment. – gung - Reinstate Monica Apr 07 '17 at 13:08
  • Turn it into a comment anyway, gung. – Mr Micro Apr 07 '17 at 13:16
  • Lol, that is too many constraints for a knowledge community forum. I'm just a visitor, maybe a potential new member, and I already have my little (very negative) impression. Just to recall, I've run up against 3 or 4 constraints in less than an hour. This is science, not blackjack. – Mr Micro Apr 07 '17 at 13:33
  • Mr Micro, I appreciate the insight that underlies your post. I believe the reason you are getting these comments (and, unfortunately, downvotes) might lie in the telegraphic nature of your reply. I suspect only someone who is already an expert would be able to interpret your advice as intended and act on it effectively. Others might only be confused or misinterpret it altogether. Another concern is that you supply minimal justification for your recommendations: they appear primarily to rely on your authority ("I would do this and that"). Science depends on rational argument instead. – whuber Apr 07 '17 at 14:50