I have two groups of species: the groups have different numbers of species.

For each species, a different number of individuals have been screened for a gene.

I want to describe the difference for the gene between the two groups.

I did a Welch's t-test, as this deals with the different numbers of species within the groups, but it is the different number of individuals of each species that concerns me.

Any tips or ideas on what I can do to overcome this pooling issue?

Nick Cox

2 Answers

It seems you want a model that better captures the stratification in your data, such as a multi-level model. This would capture not only that the gene differs between the two groups of species but also that each group contains different species. An example of such a model, expressed with glmer from R's lme4 package (calling lmer with a family argument is deprecated in current versions of lme4), would be:

glmer( presence ~ group + (1|species), family = binomial )

In this case you're looking for a fixed effect of group while recognizing that there is an additional random effect of species.
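As a rough illustration of the model described above, here is a minimal sketch on simulated data. The data frame `d`, its columns (`species`, `group`, `n`, `carriers`), the group labels and all the numbers are hypothetical and only stand in for real screening counts. It assumes the lme4 package is installed and that the data are stored as one row per species with a count of carriers out of the individuals screened.

```r
library(lme4)

set.seed(1)

# Hypothetical data: 12 species in two groups of unequal size (5 and 7),
# each species screened in a different number of individuals (n).
d <- data.frame(
  species = factor(paste0("sp", 1:12)),
  group   = factor(rep(c("A", "B"), times = c(5, 7))),
  n       = c(4, 30, 12, 8, 50, 6, 22, 15, 40, 9, 18, 27)
)

# Simulate carrier counts: a group effect plus species-to-species
# variation, both on the logit scale (values chosen arbitrarily).
eta <- ifelse(d$group == "A", -1, 0.5) + rnorm(nrow(d), sd = 0.7)
d$carriers <- rbinom(nrow(d), size = d$n, prob = plogis(eta))

# Binomial mixed model: fixed effect of group, random intercept per species.
# The cbind(successes, failures) response carries each species' sample size,
# so species with few screened individuals contribute less information.
fit <- glmer(cbind(carriers, n - carriers) ~ group + (1 | species),
             data = d, family = binomial)

summary(fit)

# Estimated group difference on the log-odds scale.
group_logodds <- fixef(fit)["groupB"]
```

With so few species the fit may warn about a singular or poorly estimated random-effect variance, so treat the output as a demonstration of the model structure rather than a result.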

(BTW, you were testing a proportion with a t-test, which is a no-no for several reasons: the variances are known to be unequal, the data may not be normally distributed, and there are better methods that handle binomial data correctly, such as logistic regression, binomial tests, and binomial confidence intervals.)
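To make the point about binomial confidence intervals concrete, here is a small sketch using base R's `binom.test`. The counts are made up: two hypothetical species with the same observed proportion of carriers (75%) but very different numbers of screened individuals.

```r
# Hypothetical counts: carriers out of screened individuals for two species.
carriers <- c(3, 30)
screened <- c(4, 40)

# Exact (Clopper-Pearson) 95% confidence intervals for each proportion.
ci_small <- binom.test(carriers[1], screened[1])$conf.int
ci_large <- binom.test(carriers[2], screened[2])$conf.int

# Interval widths: the species with fewer screened individuals has the
# much less precise estimate, even though both proportions are 0.75.
width_small <- diff(ci_small)
width_large <- diff(ci_large)

print(ci_small)
print(ci_large)
```

This is why treating each species' proportion as an equally reliable data point, as a t-test on proportions does, ignores the differing numbers of individuals per species.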

John

I recommend this page, which has already discussed the question and gives a clear explanation: How should one interpret the comparison of means from different sample sizes? The one thing I would check is that differences between species within the same group have no effect in your study. Hope this helps.

RDGuida