Repeated-measures block design with significant block effect and block interactions; residuals not normally distributed in 2 of 5 blocks analysed separately

Question

I have per capita fecundity of females from two types of Drosophila population (evolved and ancestral; population type is a fixed factor) at 5 different age points (age is a fixed factor), where the same females were used for the fecundity measurement at every age. My unit of analysis, per capita fecundity at each age point (1, 5, 10, 15 and 20), is the number of eggs laid by a group of 10 females divided by the number of females alive at the start of that day point. So basically it is fraction-type data. The experiment was carried out with 5 independent replicate populations of each of the two population types. It is therefore a repeated-measures (female fecundity measured at different age points) block design with 5 statistical blocks.

However, I got a significant block effect and significant block interactions with the other fixed effects when I ran an LMM with `lmer` from the lme4 package, taking block as a random factor. So we analysed each block separately. We first checked the normality of the residuals within each block with the Shapiro-Wilk test; 2 out of 5 blocks were not normally distributed. Here's the qq-plot from one of those blocks (Block 1: W = 0.97244, p-value = 0.005062):

Other blocks showed better qq-plots, although the Shapiro-Wilk test still suggested non-normality (Block 5: W = 0.9795, p-value = 0.02518):

So can I still use a parametric test with this much deviation from normality, or should I use a non-parametric test for these 2 blocks? I thought of fitting a GLMM (`glmer`), but I don't know which distribution would fit my type of data. Is it Poisson, quasi-Poisson, or Gamma?

Could you show us some plots (or even share data)? You should only do normality testing once, for residuals, see https://stats.stackexchange.com/questions/224673/difference-between-normality-of-residuals-vs-normality-in-each-group (if at all), also see https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless — kjetil b halvorsen, Jun 14 '21 at 22:56
@kjetilbhalvorsen Thanks for your reply. I have tested the normality of the residuals only once for each block (each block here is one population, since I have 5 replicates each of the evolved and ancestral populations). Can you specify which plot I should share: the normality plot, or the mean and SE plot for the trait? Meanwhile, here are the Shapiro-Wilk results: Block 1: W = 0.97244, p-value = 0.005062; Block 5: W = 0.9795, p-value = 0.02518 (the two blocks where the p-value was significant). For the other 3 blocks it was not. — Tanya Verma, Jun 16 '21 at 05:20
@kjetilbhalvorsen I have added the qq-plots of both blocks that were not found to be normally distributed. Please guide further. — Tanya Verma, Jun 23 '21 at 05:19
Please edit the question to say more about the nature of the original data used to get your measure of fecundity. Was it number of progeny, a count? If so, what were typical values? The nature of the original data (count, binomial outcome, etc) is a major factor in determining the correct distribution family for a generalized linear model. More information about the overall experimental design and your call to `lmer` would also help. Please provide that information by editing your question, as comments are easy to overlook and can get deleted. — EdM, Jun 23 '21 at 13:04
@EdM Yeah what you did is correct. Thanks for your generous help. — Tanya Verma, Jun 24 '21 at 08:53

EdM · Answer 1 · 2021-06-25T02:10:02.740

With positive count data in both the numerator and the denominator of your fecundity measure, it might not be surprising that residuals don't follow a normal distribution. Poisson count data have a variance equal to the mean, necessarily smaller in magnitude at low counts and larger at high counts. That might explain the heavy-tailed nature of your qq plots. Here's a qq plot for a lm() fit of 450 Poisson-distributed Y values versus corresponding X mean values ranging from 1 to 15; code below. It has the same overall shape as yours.

Even if you eventually present the results in terms of that measure, your statistical analysis might best be done directly at the count level. That means starting with a Poisson generalized linear model (log link) and working from there, perhaps moving to a quasi-Poisson or negative binomial model.
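As a rough illustration of that progression, the sketch below fits a Poisson GLM to simulated overdispersed counts, estimates the dispersion from the Pearson residuals, and then refits with quasi-Poisson and negative binomial families. The data frame `sim`, its columns `age` and `eggs`, and all simulation settings are made up for illustration; they are not the fecundity data from the question.

```r
library(MASS)  # glm.nb() for the negative binomial fit

set.seed(2021)
# Hypothetical data: counts with more variance than a Poisson allows
sim <- data.frame(age = rep(c(1, 5, 10, 15, 20), each = 60))
sim$eggs <- rnbinom(nrow(sim), mu = exp(2.8 - 0.05 * sim$age), size = 2)

# Step 1: plain Poisson GLM with the default log link
fit_pois <- glm(eggs ~ age, family = poisson, data = sim)

# Pearson dispersion estimate: values well above 1 indicate overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)

# Step 2a: quasi-Poisson keeps the same coefficients but rescales the standard errors
fit_quasi <- glm(eggs ~ age, family = quasipoisson, data = sim)

# Step 2b: negative binomial models the extra variance explicitly
fit_nb <- glm.nb(eggs ~ age, data = sim)

print(summary(fit_quasi)$coefficients)
print(summary(fit_nb)$coefficients)
```

If the dispersion estimate is close to 1, the plain Poisson fit may already be adequate; the two alternatives matter mainly when it is clearly larger.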

You would model the actual egg counts, using the log of the number of females as an offset for this rate-type analysis. Time and population type would still be fixed effects (with their interaction, which seems to be of interest), and the "blocks" (as I understand, 10 total, 5 for each population type) treated as random effects. The R DHARMa package provides useful tools for residual diagnostics.
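A minimal sketch of such a model on simulated data might look like the following. Everything about the data is an assumption made for illustration: the column names (`eggs`, `n_alive`, `age`, `pop`, `block`, `vial_id`), the layout of 5 blocks that each hold one population of each type with 6 vials per population, and the effect sizes. The vial-level random intercept is one possible way to account for repeated measurements on the same group of females; the right random-effects structure depends on the actual design.

```r
library(lme4)

set.seed(123)
# Hypothetical layout: 5 blocks x 2 population types x 6 vials, each vial scored at 5 ages
dat <- expand.grid(age = factor(c(1, 5, 10, 15, 20)),
                   vial = 1:6,
                   pop = c("ancestral", "evolved"),
                   block = factor(1:5))
dat$vial_id <- interaction(dat$block, dat$pop, dat$vial, drop = TRUE)
dat$n_alive <- sample(7:10, nrow(dat), replace = TRUE)

# Simulated egg counts with block- and vial-level variation on the log scale
block_eff <- rnorm(nlevels(dat$block), sd = 0.2)
vial_eff <- rnorm(nlevels(dat$vial_id), sd = 0.15)
age_num <- as.numeric(as.character(dat$age))
log_rate <- log(12) - 0.04 * age_num + 0.2 * (dat$pop == "evolved") +
  block_eff[dat$block] + vial_eff[dat$vial_id]
dat$eggs <- rpois(nrow(dat), lambda = dat$n_alive * exp(log_rate))

# Poisson GLMM on counts; offset(log(n_alive)) turns it into a per-female rate model
fit <- glmer(eggs ~ age * pop + offset(log(n_alive)) + (1 | block) + (1 | vial_id),
             family = poisson, data = dat)
print(summary(fit))

# Simulation-based residual checks, run only if DHARMa is installed
if (requireNamespace("DHARMa", quietly = TRUE)) {
  res <- DHARMa::simulateResiduals(fit)
  print(DHARMa::testDispersion(res, plot = FALSE))
  # plot(res) would draw the DHARMa Q-Q and residual-vs-predicted panels
}
```

The fixed-effect coefficients are on the log scale, so exponentiating them gives multiplicative effects on eggs per female.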

One potential problem in the design: this assumes that per-capita fecundity is independent of the number of females alive at the start of each time point. If crowding affects fertility, then neither your fecundity index nor using the log of the number of females as an offset (fixed regression coefficient of 1) would be valid. Check that mortality is the same for the 2 population types at a minimum, and see if there's evidence of non-proportionality of egg counts against number of females under otherwise similar circumstances.
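One way to look for non-proportionality, sketched here on simulated data, is to let `log(n_alive)` enter as an ordinary covariate and see whether its coefficient is compatible with 1, the value an offset forces. The variable names and the simulated rate of 10 eggs per female are assumptions for illustration only; in this simulation proportionality holds by construction.

```r
set.seed(42)
# Hypothetical data: egg counts exactly proportional to the number of females
n_alive <- sample(2:10, 200, replace = TRUE)
eggs <- rpois(200, lambda = 10 * n_alive)

# Free slope on log(n_alive): a value near 1 supports proportionality
free <- glm(eggs ~ log(n_alive), family = poisson)

# Same model with the slope fixed at 1 through an offset
fixed <- glm(eggs ~ 1 + offset(log(n_alive)), family = poisson)

slope <- coef(free)["log(n_alive)"]
ci <- confint.default(free)["log(n_alive)", ]  # Wald interval for the slope
print(slope)
print(ci)

# Likelihood-ratio comparison of the fixed-slope and free-slope models
lrt <- anova(fixed, free, test = "Chisq")
print(lrt)
```

A slope clearly below 1 in real data would suggest that females lay fewer eggs each when more of them share a vial, which is the crowding scenario described above.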

Code for the plot

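    # Simulate 450 Poisson counts whose means run from 1 to 15 (30 replicates each)
    myX <- rep(1:15, 30)
    length(myX)  # 450
    set.seed(1234)
    myY <- rpois(450, myX)
    poisDF <- data.frame(x = myX, y = myY)
    # Fit an ordinary linear model and show its diagnostic plots, including the normal Q-Q plot
    plot(lm(y ~ x, poisDF))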

One thing I want to clarify from your reply: I got a normal distribution for 3 blocks (when I say block, the same replicate population of both the evolved and ancestral types forms one block, i.e. EvolPop1 is derived from AncesPop1 and both form one block of the experiment). So I have only 2 blocks that are not normally distributed. Should I carry out this analysis for those two blocks or for all 5 blocks? The other thing is that per capita fecundity is not independent of the number of females alive: it is calculated by dividing the total eggs laid by the number of females alive at that age point. — Tanya Verma, Jun 25 '21 at 04:20
With high counts/small range, Poisson-distributed values come closer to normal. Analyze counts for all 5 blocks (thanks for the clarification) in a single model, with blocks probably treated as random effects. That gets the most information out of your data. I understand that per capita fertility divides total eggs by starting females, but I worry about the following: could effects of crowding (competition for resources, higher pheromone levels, etc) affect the number of eggs _each_ female produces on average? If 2 females in a jar lay 20 eggs, will 10 in the same size jar necessarily lay 100? — EdM, Jun 25 '21 at 12:55
