
I am struggling to choose the correct test for count data in R. The dataset is the number of nymphs produced by each of three aphid species on wheat.

So this is count data with one response variable (number of nymphs) and one explanatory variable with three levels (the three species).

I want to test whether the three species differ in the number of nymphs they produce.

I think I should not use an ANOVA with count data. I have tried a GLM with a quasi-Poisson family, but I am struggling to interpret the output and to be sure that I have used the correct test. Essentially, I want the equivalent of a one-way ANOVA, but for count data.

kjetil b halvorsen
Sitobion
  • Do you have just 3 counts, or do you have 3 *sets* of multiple counts? If you are having trouble interpreting the output from a GLM, it might help you to read: [How to interpret coefficients in a Poisson regression?](http://stats.stackexchange.com/q/11096/) – gung - Reinstate Monica Feb 07 '16 at 15:42
  • Hi. I am having trouble adding a table to the post to show the data. I have sixty individual data points, twenty for each of the three species. I want to analyse any differences in the mean count between the three species. I have not seen any examples of Poisson regression with only one explanatory variable, which is why I am uncertain. – Sitobion Feb 08 '16 at 18:14

3 Answers


I suggest two considerations before modelling.

  1. Are you comparing the species' fecundities over their whole lifetimes? If so, a Poisson distribution may not be appropriate, because the species may have different lifespans on the same host plant, whereas the Poisson distribution describes the number of events occurring in a fixed time interval. A non-parametric comparison can be used here, but try to have enough replicates, because non-parametric methods are generally more conservative.

  2. If the Poisson assumption is reasonable, it is still worth checking whether the mean of the response variable is roughly equal to its variance, as a Poisson distribution requires. If the data are over-dispersed (variance larger than the mean), consider negative binomial regression.
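A minimal sketch of both checks, assuming the data are in long format with one row per aphid. The data frame `nymphs`, its column names and the simulated counts are hypothetical, not the asker's data. It compares group means with group variances, computes a Pearson dispersion statistic from a Poisson GLM, and, as one option when that statistic is well above 1, fits a negative binomial GLM with `MASS::glm.nb()`.

```r
library(MASS)  # glm.nb(); MASS is a recommended package shipped with R

## Hypothetical counts, 20 aphids per species (simulated, NOT the asker's data)
set.seed(1)
nymphs <- data.frame(
  species = factor(rep(c("A", "B", "C"), each = 20)),
  count   = rnbinom(60, mu = rep(c(8, 10, 14), each = 20), size = 2)
)

## Poisson counts should have a variance roughly equal to the mean in each group
tapply(nymphs$count, nymphs$species, mean)
tapply(nymphs$count, nymphs$species, var)

## Pearson dispersion statistic from a Poisson GLM: close to 1 if the
## Poisson variance assumption is reasonable, well above 1 if over-dispersed
pois_fit <- glm(count ~ species, family = poisson, data = nymphs)
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
dispersion

## Negative binomial alternative: likelihood-ratio test of the species
## effect, comparing the full model against an intercept-only model
nb_full <- glm.nb(count ~ species, data = nymphs)
nb_null <- glm.nb(count ~ 1, data = nymphs)
anova(nb_null, nb_full)
```

The rule of thumb "dispersion well above 1" is informal; how large is too large is a judgment call.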

kjetil b halvorsen
Fumin Wang

I would suggest a Kruskal-Wallis test followed by Dunn's test for multiple comparisons:

library(FSA) # provides dunnTest(); available on CRAN

## count data
df <- data.frame(group = factor(rep(LETTERS[1:3], times = 6)), count = c(1,3,5,4,4,6,1,3,5,2,3,5,1,5,3,4,2,5))

kruskal.test(count~group, data=df)

#Kruskal-Wallis rank sum test
#
#data:  count by group
#Kruskal-Wallis chi-squared = 8.7041, df = 2, p-value = 0.01288

# non-formula usage (default "holm" method)
dunnTest(df$count,df$group)

#Dunn (1964) Kruskal-Wallis multiple comparison
#  p-values adjusted with the Holm method.
#
#  Comparison         Z     P.unadj      P.adj
#1      A - B -1.131517 0.257837400 0.25783740
#2      A - C -2.925386 0.003440288 0.01032086
#3      B - C -1.793869 0.072834082 0.14566816
jalapic
  • The dunnTest function is in the FSA package. This is a good implementation to use: first, I like the format of the output, and second, some of the other implementations in R report one-sided p-values, which you don't want by default. – Sal Mangiafico Sep 16 '17 at 18:19

I like your original approach. There is a worked example here. I have never done Poisson regression, but that seems to be what you have: the number of nymphs is your Poisson-distributed y and species is your factor x.

glm(y~x, family="poisson")

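You would interpret the output much as you would an ordinary one-way ANOVA in R, except that the coefficients are on the log scale: each one is the log of the ratio of a species' mean count to that of the reference species.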
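A minimal sketch of that approach, assuming long-format data with one row per aphid; the data frame `nymphs`, its columns and the simulated counts are hypothetical, not the asker's data. The likelihood-ratio test from `anova(..., test = "Chisq")` plays the role of the one-way ANOVA F-test, and exponentiating the coefficients turns them into ratios of mean counts.

```r
## Hypothetical long-format data: 20 aphids per species (simulated, NOT real data)
set.seed(42)
nymphs <- data.frame(
  species = factor(rep(c("A", "B", "C"), each = 20)),
  count   = rpois(60, lambda = rep(c(8, 10, 14), each = 20))
)

## Poisson GLM with species as the single explanatory factor
fit <- glm(count ~ species, family = poisson, data = nymphs)

## Analysis-of-deviance table: likelihood-ratio test of "no species effect",
## the GLM analogue of the one-way ANOVA F-test
anova(fit, test = "Chisq")

## Coefficients are on the log scale; exp() gives the mean count of the
## reference species (A) and the ratios of B and C relative to A
exp(coef(fit))
```

With a single factor, a Poisson GLM reproduces the group means exactly, so `exp(coef(fit))[1]` is the mean count of species A.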

stan
  • Would a Poisson model be appropriate when there is only one explanatory variable? I have only seen examples with multiple explanatory variables. – Sitobion Feb 08 '16 at 18:16
  • Yes, of course one variable is fine. If your y variable were normal, in R you would do `lm(y~x)`; for a normal y and one categorical x, the usual name is one-way ANOVA. By the way, you could also try quasi-Poisson, as you did at the start; it deals with over-dispersion. But I wouldn't lose sleep over it. – stan Feb 08 '16 at 18:42
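A minimal quasi-Poisson sketch, assuming the same long-format layout; the data frame `nymphs` and its deliberately over-dispersed simulated counts are hypothetical. The quasi-Poisson family keeps the Poisson mean model but lets the variance be a multiple of the mean, with that multiplier (the dispersion) estimated from the data, so an F test is the usual choice in the analysis-of-deviance table.

```r
## Hypothetical over-dispersed counts (simulated, NOT the asker's data)
set.seed(7)
nymphs <- data.frame(
  species = factor(rep(c("A", "B", "C"), each = 20)),
  count   = rnbinom(60, mu = rep(c(8, 10, 14), each = 20), size = 3)
)

## Quasi-Poisson GLM: variance = phi * mean, with phi estimated from the data
qfit <- glm(count ~ species, family = quasipoisson, data = nymphs)

## Estimated dispersion phi (a value near 1 means plain Poisson variance)
summary(qfit)$dispersion

## With an estimated dispersion, use an F test rather than a chi-square test
anova(qfit, test = "F")
```

The coefficients are identical to those of the Poisson fit; only the standard errors and tests are scaled by the estimated dispersion.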
  • Sometimes people get confused by the `summary` output in R. With `glm` objects, it is sometimes helpful to use the `Anova` function in the `car` package to get an ANOVA table, and the `lsmeans` package for post-hoc comparisons. – Sal Mangiafico Sep 16 '17 at 18:24
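If installing `car` and `lsmeans` is not an option, here is a dependency-free base-R sketch of pairwise post-hoc comparisons. It is not the `lsmeans` implementation: it builds Wald z-tests of each pairwise difference in log mean count directly from the fitted coefficients and their covariance matrix, then applies a Holm adjustment with `p.adjust()`. The data frame `nymphs` and the contrast matrix `L` are hypothetical illustrations.

```r
## Hypothetical data (simulated, NOT the asker's data)
set.seed(42)
nymphs <- data.frame(
  species = factor(rep(c("A", "B", "C"), each = 20)),
  count   = rpois(60, lambda = rep(c(8, 10, 14), each = 20))
)
fit <- glm(count ~ species, family = poisson, data = nymphs)

## Coefficients: (Intercept) = log mean of A; speciesB, speciesC = differences from A
b <- coef(fit)
V <- vcov(fit)

## Contrast matrix for all pairwise differences in log mean count
L <- rbind("B - A" = c(0,  1, 0),
           "C - A" = c(0,  0, 1),
           "C - B" = c(0, -1, 1))

est <- drop(L %*% b)                 # differences in log mean count
se  <- sqrt(diag(L %*% V %*% t(L)))  # their standard errors
z   <- est / se
p   <- 2 * pnorm(-abs(z))            # two-sided Wald p-values

data.frame(contrast = rownames(L),
           ratio    = exp(est),      # ratio of mean counts
           z        = z,
           p.holm   = p.adjust(p, method = "holm"))
```

For over-dispersed data the same steps could be applied to a quasi-Poisson or negative binomial fit, since `vcov()` then reflects the extra variance.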