What methods are appropriate for testing correlations between binomial count (but not presence/absence) data?

Question

Is it appropriate to analyze the relationship between a proportional explanatory variable and a proportional response variable (as cbind(option 1, option 2), or as a proportion with weights) using logistic regression? If so, is there a way to weight the proportional explanatory variable to account for different sample sizes for each trial?

Here are the details of my study:

I would like to evaluate whether individual insect preferences for pairs of host plants are correlated. Insect preference was tested using three separate choice assays:

1: plant A vs. plant B, 2: plant A vs. plant C, and 3: plant A vs. plant D.

In each assay, we recorded the number of eggs laid on the two available plants. Each individual was tested on all three assays. Assay 1 was diagnostic, so all insects started on that assay, after which they were moved onto assays 2 and 3 in random order. A total of 79 individuals laid eggs in all three assays. The total number of eggs laid in assay 1 ranged from 2 to 126; in assays 2 and 3, from 2 eggs to 178.

My main question is: is preference (proportion of eggs laid on plant A) in assay 1 correlated with preference in assay 2 and/or with preference in assay 3?

Possible solutions (and associated concerns):

1. Use logistic regression with proportion of eggs laid on plant A in assay 1 as an explanatory variable, including only those females that meet a minimum threshold for total eggs laid in assay 1 (>15). This both reduces my total sample size (down to n = 53), and does not take into account the greater confidence I have in the preference of an insect that lays 30/75 eggs on plant A in assay 1 compared to an insect that lays 6/15 eggs on plant A in assay 1.
2. Use a negative binomial mixed model (glmer.nb) with the rough form: plant A eggs ~ log(total eggs) * Assay + (1|insect ID). While this can tell me whether the eggs laid on plant A as a function of total eggs laid differs between the three assays, I don't believe it is informative about how similar the preferences are. I also looked at Poisson and quasi-Poisson, but these models had worse fits than the negative binomial model.

Welcome to the site, @rstewa03. Please could you elaborate a little on your concern about the sample sizes - what were your sample sizes for each assay? Were different insects used for each assay? — Izy, Jun 03 '19 at 22:38
Thank you! Each individual was tested on all three assays: assay 1 was diagnostic, so all insects started on that assay, after which they were moved onto assays 2 and 3 in random order. A total of 79 individuals laid eggs in all three assays. The total number of eggs laid in assay 1 ranged from 2 to 126; in assays 2 and 3, from 2 eggs to 178. — rstewa03, Jun 04 '19 at 14:09
Thanks, that's clearer. I think that structure should be reflected in your analysis, so suggest you edit your question to include that information. Hopefully someone will have a good suggestion for how to analyse this sort of data. — Izy, Jun 04 '19 at 14:29
You could also search for previously published papers that had this kind of experimental structure and see what methods they used. If you find a good answer, please come back and explain it here (you are allowed to answer your own question). — Izy, Jun 04 '19 at 14:34
Thanks @Izy, I will continue to look for similar experimental designs described in the literature and on different forums. — rstewa03, Jun 05 '19 at 20:16
Can you explain why you have greater confidence in the preference of females that have laid more eggs? Is this from biological knowledge, or do you mean as a statistical concept? — Izy, Jun 05 '19 at 20:41
I wonder if you might want to look into using Poisson regression (e.g. using glm in R) - as you have counts data. Have you considered it? — Izy, Jun 05 '19 at 20:43
I mean confidence as a statistical concept, considering each egg laid as an event and the total number of eggs laid as the sample size, the binomial confidence interval narrows as the number of eggs laid increases. This doesn't completely align with the biology, because the egg-laying events are not completely independent. — rstewa03, Jun 06 '19 at 15:03
The data are overdispersed, so a poisson regression fit very poorly. Instead, I have tried using a negative binomial mixed model (glmer.nb) with the rough form: plant A eggs ~ log(total eggs) * Assay + (1|insect ID). While this can tell me whether the eggs laid on plant A as a function of total eggs laid differs between the three assays, I don't believe it is informative about how similar the preferences are. — rstewa03, Jun 06 '19 at 15:12
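
The statistical point made in the comments above, that the binomial confidence interval narrows as more eggs are laid, can be illustrated with the two clutches from the question (6/15 and 30/75). The following is only a sketch in Python: the helper name `wilson_interval` is hypothetical, the Wilson score interval is one of several reasonable choices, and it treats eggs as independent events, which the comments note is a simplification.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return centre - half_width, centre + half_width

# Same observed preference (0.4), different total egg counts.
small = wilson_interval(6, 15)
large = wilson_interval(30, 75)
small_width = small[1] - small[0]
large_width = large[1] - large[0]

print(f"6/15  -> ({small[0]:.3f}, {small[1]:.3f}), width {small_width:.3f}")
print(f"30/75 -> ({large[0]:.3f}, {large[1]:.3f}), width {large_width:.3f}")
```

Both intervals cover 0.4, but the one for the larger clutch is narrower, which is the information lost when assay 1 preference is entered as a bare proportion.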

score 1 · Answer 1 · answered Jun 09 '19 at 15:06

One approach could be to treat individual eggs, instead of entire "clutches", as your units of observation, with the dependent variable coding for whether an egg has been laid on plant A ("success" = 1) or not (0). Such data is appropriately modelled by Binomial logistic regression. In addition, it will reflect the fact that you have intuitively more confidence in data coming from bigger clutches, because there will be as many data points as eggs per insect and assay.
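
As a rough check on this idea, the sketch below (Python, standard library only; the function names are hypothetical) expands one clutch into per-egg 0/1 rows and compares the Bernoulli log-likelihood with the aggregated binomial log-likelihood. It assumes a single fixed probability per clutch and ignores the random effects, so it illustrates the data layout rather than the full model.

```python
import math

def bernoulli_loglik(outcomes, p):
    """Log-likelihood of per-egg 0/1 outcomes under a common probability p."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in outcomes)

def binomial_loglik(k, n, p):
    """Log-likelihood of k successes out of n under a binomial with probability p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

# One clutch from the question: 6 of 15 eggs on plant A.
eggs_on_a, total_eggs = 6, 15
per_egg_rows = [1] * eggs_on_a + [0] * (total_eggs - eggs_on_a)

# The gap between the two log-likelihoods should not depend on p.
constant = math.log(math.comb(total_eggs, eggs_on_a))
gaps = [
    binomial_loglik(eggs_on_a, total_eggs, p) - bernoulli_loglik(per_egg_rows, p)
    for p in (0.2, 0.4, 0.7)
]
print(gaps, constant)
```

Because the two forms differ only by a constant that does not involve p, fitting per-egg rows or fitting counts of successes and failures per insect and assay should lead to the same estimates; a clutch of 75 simply contributes five times as many rows as a clutch of 15.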

Regarding the correlation between insect preferences across assays, it is reflected by the variance of the random intercept for insects. The role of the random intercept is to model the lack of independence between observations (eggs in assays) in a level of the grouping factor (individual insects), i.e. their general idiosyncratic preference for, or reluctance towards, plant A across assays.
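
A small simulation can make this link concrete. The sketch below is purely illustrative: the number of insects, eggs per assay, and intercept standard deviations are made-up settings rather than the study's data, the function names are hypothetical, and it assumes no assay or order effect. Each insect gets one random intercept on the logit scale that is shared across two assays.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_correlation(intercept_sd, n_insects=2000, eggs_per_assay=50, seed=1):
    """Correlation of per-insect proportions between two simulated assays."""
    rng = random.Random(seed)
    props_1, props_2 = [], []
    for _ in range(n_insects):
        u = rng.gauss(0.0, intercept_sd)   # insect-level random intercept (logit scale)
        p = 1 / (1 + math.exp(-u))         # probability of laying an egg on plant A
        a1 = sum(rng.random() < p for _ in range(eggs_per_assay))
        a2 = sum(rng.random() < p for _ in range(eggs_per_assay))
        props_1.append(a1 / eggs_per_assay)
        props_2.append(a2 / eggs_per_assay)
    return pearson(props_1, props_2)

r_none = simulate_correlation(0.0)    # no between-insect variance
r_strong = simulate_correlation(1.5)  # substantial between-insect variance
print(f"sd = 0.0: r = {r_none:.2f}; sd = 1.5: r = {r_strong:.2f}")
```

With zero intercept variance the proportions in the two assays should be essentially uncorrelated, while a large variance should produce a clearly positive correlation, which is the sense in which the random-intercept variance carries the information about correlated preferences.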

answered by Ous


+1, good suggestions. This (and @rstewa03 's suggestions too I believe) rely on the simplifying assumption that each egg-laying event is independent. Any thoughts on how/whether to reflect the ordering of the assays in the analysis? – Izy Jun 10 '19 at 11:34
@Izy Actually, the random intercept per insect models the non-independence of individual eggs' probability of being laid on one plant or the other. Are you referring to another kind of non-independence that I have overlooked? – Ous Jun 10 '19 at 13:11
Regarding ordering, I would not bother with it as long as the 6 different possible orderings were balanced across the sample. One could model it as a random effect (because Insects are nested within Ordering) but it probably won't be helpful given the small number of levels of Ordering (6). If Ordering is an effect of interest, then it could be modelled as a between-group fixed effect. – Ous Jun 10 '19 at 13:17
I was not referring to the random effect of 'individual', but rather that eggs may be laid in clumps (i.e. all at once). For example, a female laying 20 eggs in clumps of 5 might be said to have made four 'decisions' on where to lay her eggs. Additionally, there may be some auto-correlation with the previous decision, e.g. if she only moves a certain distance before her next egg-laying decision. I expect it wouldn't be easy to measure this, and the assumption that individual eggs are independent events (as a measure of an individual's preference) may be acceptable for this case. – Izy Jun 10 '19 at 14:27
I think it would be sensible to consider ordering either as a random or fixed effect - the fact that it has been included in the experimental design suggests that it is considered as something that may be an issue. I don't think the complete factorial of ordering was carried out though - it was either 1-2-3 or 1-3-2. As you note, individual is nested within order, so I think the random intercept would be (1|Ordering/Individual) in R (lme4) parlance? – Izy Jun 10 '19 at 14:37
Order is of considerable interest, and I have included it in my models thus far as a fixed effect with two groups (as @Izy noted). – rstewa03 Jun 12 '19 at 15:55
@rstewa03, does Ous' response answer your question? If so, consider marking it as accepted (if not, can you explain why?). – Izy Jun 12 '19 at 17:27
@Ous, can you advise how to interpret the variance of the random intercept for insect ID so that it is directly informative about my original question? Are you suggesting that I can use this to understand the proportion of variance explained by the random effects? Such methods exist for the fixed effects alone and for the fixed and random effects together (as described in https://stats.stackexchange.com/questions/111150/; MuMIn package, Nakagawa & Schielzeth 2013). Ben Bolker suggested these can be extended to random effects (https://stats.stackexchange.com/questions/153611/), but I'm not sure how. – rstewa03 Jun 12 '19 at 21:56
@rstewa03, it might be worth turning your question about how to extend the methods to random effects into a separate question. – Izy Jun 13 '19 at 13:03
@Izy To add Ordering as a random effect with Individuals nested within it, (1|Ordering/Individual) is indeed correct. It expands to (1|Ordering) + (1|Ordering:Individual). The latter gives each Individual within an Ordering its own intercept, with a single variance estimated for those intercepts. – Ous Jun 13 '19 at 21:06
@rstewa03 The variance of the random intercept for ID is not directly linked to variance explained, and I would not know how to get that. If you only need to convince yourself (or someone else) that preferences are correlated across assays, you can qualitatively compare the value of the variance to the estimate of the Assay fixed effect, just as Ben Bolker suggested in your second link. If you need numbers and a more rigorous criterion, Bayesian estimation of the mixed model provides credible intervals (the Bayesian equivalent of confidence intervals) for the random-effect parameters. – Ous Jun 13 '19 at 21:13
