
I have a dataset composed of proportions that measure the "activity level" of individual tadpoles, so the values are bounded between 0 and 1. The data were collected by recording whether the individual moved within each time interval (1 for movement, 0 for no movement) and then averaging these scores to create one value per individual. My main fixed effect is "density level".

The issue I am facing is that I have a factor variable, "pond" that I would like to include as a random effect - I do not care about differences between ponds, but would like to account for them statistically. One important point about the ponds is that I only have 3 of them, and I understand it is ideal to have more factor levels (5+) when dealing with random effects.

If it is possible to do, I would like some advice on how to implement a mixed model using betareg() or betamix() in R. I have read the R help files, but I usually find them difficult to understand (what each argument parameter really means in the context of my own data AND what the output values mean in ecological terms) and so I tend to work better via examples.

On a related note, I was wondering whether I could instead use glm() with a binomial family and logit link to account for random effects with this kind of data.

Ferdi
Kat Y
  • No, you cannot include random-effect (error) terms in glm(). What about logit-transforming your response and fitting a linear mixed model instead? – utobi Nov 26 '16 at 19:58
  • @utobi Thank you, I will try this. So, you do not have concerns having a random effect with only 3 levels? – Kat Y Nov 26 '16 at 21:00
  • I don't know the meaning of your variable "pond", but if you have repeated measures, random effects are almost a must. If you don't have repeated measures, random vs. fixed is an open debate. Three levels of a random effect may be OK; in principle its variance is estimable. I suggest you check the literature in your field. A nice book that discusses random vs. fixed effects is http://www.stat.columbia.edu/~gelman/arm/. – utobi Nov 27 '16 at 08:04
  • @utobi Thank you for your advice. It was helpful. I will look at that book! I ended up doing logit transformations and used lmer(). – Kat Y Dec 04 '16 at 03:52
  • Check out this answer https://stats.stackexchange.com/questions/167340/beta-regression-with-random-effect-of-source-plot-in-two-seasons – Diogo B Provete Apr 25 '17 at 18:56
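A minimal sketch of the approach utobi suggested and Kat Y ended up using: logit-transform the per-tadpole proportion with qlogis() and fit a linear mixed model with lme4's lmer(), with pond as a random intercept. The data are simulated, and the column names (activity, density, pond) and the low/high density levels are hypothetical stand-ins for the real dataset.

```r
library(lme4)

set.seed(42)
# Simulated stand-in for the real data: 3 ponds x 20 tadpoles, two density levels
df <- data.frame(
  pond    = factor(rep(c("A", "B", "C"), each = 20)),
  density = factor(rep(c("low", "high"), times = 30), levels = c("low", "high"))
)
eta <- -0.5 + 0.8 * (df$density == "high") + c(-0.3, 0, 0.3)[as.integer(df$pond)]
df$activity <- plogis(eta + rnorm(nrow(df), sd = 0.4))  # proportions strictly inside (0, 1)

# qlogis() is the logit, log(p / (1 - p)); it returns -Inf/Inf at exactly 0 or 1,
# so such values would have to be adjusted before this step
df$logit_activity <- qlogis(df$activity)

# Linear mixed model on the logit scale: density fixed, pond as a random intercept
fit_lmm <- lmer(logit_activity ~ density + (1 | pond), data = df)
summary(fit_lmm)
```

With only three ponds, lmer() may report a singular fit (pond variance estimated at zero), which is one symptom of the concern raised in the comments.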

4 Answers


The package glmmTMB may be helpful for anyone with a similar question. For example, if you wanted to include pond from the above question as a random effect, the following code would do the trick:

library(glmmTMB)
glmmTMB(y ~ 1 + (1 | pond), data = df, family = beta_family(link = "logit"))
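Extending that call to the question's setting, here is a sketch with density as a fixed effect and pond as a random intercept. It assumes simulated data with hypothetical column names. Because averaged 0/1 scores can be exactly 0 or 1, which a beta likelihood cannot handle, the sketch applies the commonly used squeeze (y(n - 1) + 0.5)/n (Smithson and Verkuilen) before fitting. Whether that adjustment suits the real data is an assumption.

```r
library(glmmTMB)

set.seed(1)
# Simulated stand-in: per-tadpole activity proportions in 3 ponds
df <- data.frame(
  pond    = factor(rep(c("A", "B", "C"), each = 20)),
  density = factor(rep(c("low", "high"), times = 30), levels = c("low", "high"))
)
mu  <- plogis(-0.5 + 0.8 * (df$density == "high") + c(-0.3, 0, 0.3)[as.integer(df$pond)])
phi <- 15  # beta precision parameter (hypothetical value)
df$y <- rbeta(nrow(df), shape1 = mu * phi, shape2 = (1 - mu) * phi)

# Squeeze responses into (0, 1) so exact 0s or 1s would not break the beta likelihood
n <- nrow(df)
df$y <- (df$y * (n - 1) + 0.5) / n

# Beta GLMM: density fixed, pond as a random intercept, logit link for the mean
fit_tmb <- glmmTMB(y ~ density + (1 | pond), data = df,
                   family = beta_family(link = "logit"))
summary(fit_tmb)
```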
amoeba
Kori K
  • Welcome to CV. Thank you for your contribution. This is more of a comment than an answer. Can you extend your answer, please? – Ferdi Oct 02 '17 at 18:17
  • Sorry for the delay, I didn't see the comment immediately. Hope that helps. – Kori K Nov 01 '17 at 00:29

The current capabilities of betareg do not include random/mixed effects. In betareg() you can only include fixed effects, e.g., for your three-level pond variable. The betamix() function implements a finite mixture beta regression, not a mixed effects beta regression.

In your case, I would first try to see what effect a fixed pond factor effect has. This "costs" you two degrees of freedom while a random effect would be slightly cheaper with only one additional degree of freedom. But I would be surprised if the two approaches lead to very different qualitative insights.
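A sketch of the fixed-effect alternative: pond enters betareg() as an ordinary three-level factor, which adds two mean-model coefficients. The data are simulated and the column names are hypothetical. The df attribute of logLik() counts all estimated parameters: intercept, density, two pond contrasts, and the precision parameter.

```r
library(betareg)

set.seed(2)
# Simulated stand-in: per-tadpole activity proportions strictly inside (0, 1)
df <- data.frame(
  pond    = factor(rep(c("A", "B", "C"), each = 20)),
  density = factor(rep(c("low", "high"), times = 30), levels = c("low", "high"))
)
mu   <- plogis(-0.5 + 0.8 * (df$density == "high") + c(-0.3, 0, 0.3)[as.integer(df$pond)])
df$y <- rbeta(nrow(df), shape1 = mu * 15, shape2 = (1 - mu) * 15)

# Pond as a fixed factor: 3 levels -> 2 extra coefficients (pondB, pondC)
fit_br <- betareg(y ~ density + pond, data = df, link = "logit")
summary(fit_br)

# Number of estimated parameters: 4 mean coefficients + 1 precision parameter
attr(logLik(fit_br), "df")
```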

Finally, glm() does not support beta regression, but the mgcv package provides the betar() family, which can be used with the gam() function.
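A sketch of the mgcv route, assuming simulated data and hypothetical column names. The betar() family supplies the beta likelihood, and a random intercept for pond can be written as a smooth with bs = "re", which mgcv estimates as a variance component. With only three ponds, that variance estimate will be imprecise.

```r
library(mgcv)

set.seed(3)
# Simulated stand-in: per-tadpole activity proportions strictly inside (0, 1)
df <- data.frame(
  pond    = factor(rep(c("A", "B", "C"), each = 20)),
  density = factor(rep(c("low", "high"), times = 30), levels = c("low", "high"))
)
mu   <- plogis(-0.5 + 0.8 * (df$density == "high") + c(-0.3, 0, 0.3)[as.integer(df$pond)])
df$y <- rbeta(nrow(df), shape1 = mu * 15, shape2 = (1 - mu) * 15)

# betar(): beta likelihood with logit link; s(pond, bs = "re") is a random intercept
fit_gam <- gam(y ~ density + s(pond, bs = "re"),
               family = betar(link = "logit"), data = df, method = "REML")
summary(fit_gam)
```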

Achim Zeileis
  • Thank you for your input. You clarified some aspects of the betareg functions. At this point I have taken @utobi 's advice and did logit transformations so I can use lmer(). I will look into gam() since my next dataset is also bound between 0 and 1, and I cannot normalize the distributions via transformations :) – Kat Y Dec 04 '16 at 03:50
  • I would expect the approaches to return similar results, but also some differences from which you could learn something. So I would recommend trying all three, i.e., `betareg` with fixed effects, logit-transformed `lmer` with random effects, and `gam` with `betar`. (And also: if the answer was useful, consider upvoting or accepting it.) – Achim Zeileis Dec 04 '16 at 08:01

This started as a comment, but went long. I don't think a random effects model is appropriate here. There are only 3 ponds -- do you want to estimate a variance from 3 numbers? That's essentially what's going on with a random effects model. I'm guessing the ponds were chosen for the researcher's convenience, and not as a random sample of "Ponds of the Americas".

The advantage of a random effects model is that it allows you to construct a confidence interval on the response (activity level) that takes pond-to-pond variation into account. A fixed effects model -- in other words, treating pond like a block -- adjusts the response for the pond effect. If there were some additional treatment effect -- say two species of frog in each pond -- blocking reduces the mean square error (denominator of the F test) and allows the effect of the treatment to shine forth.

In this example, there is no treatment effect and the number of ponds is too small for a random effects model (and probably too "non-random"), so I'm not sure what conclusions can be drawn from this study. One could get a nice estimate of the difference between the ponds, but that's about it. I don't see inferences being drawn to the wider population of frogs in other pond settings. One could frame it as a pilot study, I suppose.

Bear in mind that any use of a random effects model here is going to give a very unreliable estimate for the pond variance and must be Used With Caution.

But as to your original question -- isn't this more of a rate problem? The go-to distribution for events-per-unit-time is the Poisson. So you could do Poisson regression using the counts with the time interval as an offset.
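A sketch of the Poisson rate model, assuming the raw data can be expressed as a movement count per tadpole together with an observation time (both simulated here; moves and time_min are hypothetical names). Pond is treated as a fixed block, in line with this answer's argument, and log(time) enters as an offset so the coefficients describe movements per unit time.

```r
set.seed(4)
# Simulated stand-in: movement counts per tadpole over varying observation times
df <- data.frame(
  pond    = factor(rep(c("A", "B", "C"), each = 20)),
  density = factor(rep(c("low", "high"), times = 30), levels = c("low", "high"))
)
df$time_min <- sample(c(5, 10), nrow(df), replace = TRUE)  # observation length (hypothetical)
rate <- exp(log(0.6) + 0.4 * (df$density == "high") + c(-0.2, 0, 0.2)[as.integer(df$pond)])
df$moves <- rpois(nrow(df), lambda = rate * df$time_min)

# Poisson rate model: pond as a fixed block, log(time) as an offset
fit_pois <- glm(moves ~ density + pond + offset(log(time_min)),
                family = poisson(link = "log"), data = df)
summary(fit_pois)
exp(coef(fit_pois))  # multiplicative effects on movements per minute
```

Since the question scores each interval only as 0 or 1, the counts are capped at the number of intervals; that is worth keeping in mind when choosing between this and a binomial model.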

Placidia

I think you were right in guessing that you could use a glm binomial model.

No movement = failure (0), movement = success (1).
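A sketch of the binomial GLM, assuming each tadpole was scored over a fixed number of intervals (20 here, a hypothetical value) and that the per-tadpole counts of 1s and 0s are available. cbind(successes, failures) gives glm() the counts directly instead of the averaged proportion; pond is a fixed block. All data are simulated.

```r
set.seed(5)
# Simulated stand-in: per-tadpole counts of intervals with and without movement
df <- data.frame(
  pond    = factor(rep(c("A", "B", "C"), each = 20)),
  density = factor(rep(c("low", "high"), times = 30), levels = c("low", "high"))
)
n_int <- 20  # number of scored intervals per tadpole (hypothetical)
p <- plogis(-0.5 + 0.8 * (df$density == "high") + c(-0.3, 0, 0.3)[as.integer(df$pond)])
df$moved     <- rbinom(nrow(df), size = n_int, prob = p)  # intervals scored 1
df$not_moved <- n_int - df$moved                          # intervals scored 0

# Binomial GLM on the counts: density effect, pond as a fixed block
fit_bin <- glm(cbind(moved, not_moved) ~ density + pond,
               family = binomial(link = "logit"), data = df)
summary(fit_bin)
```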

I agree with @Placidia that you do not have enough ponds (3) to justify treating pond as a random effect.

The advantage of using a binomial model comes from using more of your original data to answer your hypothesis (more statistical power). The possibilities are:

  1. using a mixed-effects extension (binomial glmer) with tadpole identity as a random intercept (or slope)
  2. looking into the effect of different time intervals (categorically).
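A sketch of option 1, assuming the raw 0/1 scores are kept in long format with one row per tadpole and interval (simulated here; tadpole, interval and moved are hypothetical names). glmer() from lme4 fits a binomial GLMM with a random intercept per tadpole, which models the repeated scores on each individual; pond stays a fixed block.

```r
library(lme4)

set.seed(6)
n_tad <- 60  # tadpoles (hypothetical)
n_int <- 20  # scored intervals per tadpole (hypothetical)
tad <- data.frame(
  tadpole = factor(seq_len(n_tad)),
  pond    = factor(rep(c("A", "B", "C"), each = 20)),
  density = factor(rep(c("low", "high"), times = 30), levels = c("low", "high"))
)
tad$ind_eff <- rnorm(n_tad, sd = 0.5)  # simulated between-tadpole variation

# Long format: one row per tadpole x interval, response is the raw 0/1 score
long <- tad[rep(seq_len(n_tad), each = n_int), ]
long$interval <- rep(seq_len(n_int), times = n_tad)
eta <- -0.5 + 0.8 * (long$density == "high") +
  c(-0.3, 0, 0.3)[as.integer(long$pond)] + long$ind_eff
long$moved <- rbinom(nrow(long), size = 1, prob = plogis(eta))

# Binomial GLMM: density and pond fixed, random intercept for each tadpole
fit_glmm <- glmer(moved ~ density + pond + (1 | tadpole),
                  family = binomial(link = "logit"), data = long)
summary(fit_glmm)
```

For option 2, a term such as factor(interval) could be added to the fixed part of the formula.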

It all depends on your hypothesis. Maybe you do not need anything more complex than a simple beta regression. It is not clear from your original question what your hypothesis is (but it is certainly not about comparing activity between ponds, as you said that you are not interested in that).

kdarras