I'm looking to run a linear mixed-effects model using lme4, where my dependent variable, one_syllable_words / total_words_generated, is a proportion and my random effect, (1 | participant_ID), reflects the longitudinal nature of the design. Independent, fixed-effect variables of interest include age, group, timepoint, and interactions between them.
I've come across two main ways to deal with the proportional nature of the DV:
- Standard logistic regression / binomial GLM
In my scenario, I envision the lme4 equation looking like this:
glmer(one_syllable_words / total_words_generated ~ age + group + timepoint + age:timepoint + age:group + timepoint:group + (1 | participant_ID), family = "binomial", weights = total_words_generated, data = mydat)
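As far as I understand, the same binomial model can also be written with a two-column cbind(successes, failures) response, which treats each generated word as a trial and a one-syllable word as a "success". Below is a minimal sketch on simulated data. The data and the true effect sizes are hypothetical, the interaction terms are dropped for brevity, and age is assumed to be standardized.

```r
library(lme4)

# Simulated toy data (hypothetical, only so the sketch runs)
set.seed(1)
n_id <- 30
mydat <- expand.grid(participant_ID = factor(seq_len(n_id)), timepoint = 0:2)
mydat$group <- factor(ifelse(as.integer(mydat$participant_ID) <= n_id / 2, "A", "B"))
mydat$age <- rep(rnorm(n_id), times = 3)   # standardized age, one value per participant
u <- rnorm(n_id, sd = 0.5)                 # participant-level random intercepts
mydat$total_words_generated <- rpois(nrow(mydat), 40) + 1
mydat$one_syllable_words <- rbinom(
  nrow(mydat),
  size = mydat$total_words_generated,
  prob = plogis(0.4 + u[as.integer(mydat$participant_ID)])
)

# Proportion response, with the number of trials supplied as weights
m_prop <- glmer(
  one_syllable_words / total_words_generated ~ age + group + timepoint +
    (1 | participant_ID),
  family = binomial, weights = total_words_generated, data = mydat
)

# Two-column response: one-syllable words vs. all other words
m_cbind <- glmer(
  cbind(one_syllable_words, total_words_generated - one_syllable_words) ~
    age + group + timepoint + (1 | participant_ID),
  family = binomial, data = mydat
)

# The fixed-effect estimates should agree up to numerical tolerance
all.equal(fixef(m_prop), fixef(m_cbind), tolerance = 1e-4)
```

Both calls describe the same likelihood, so the choice between them is mostly a matter of readability.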
- Beta regression
I would apply the transformation (DV * (n - 1) + 0.5) / n to my DV, where n is the sample size, so that it cannot equal 0 or 1. (There are a few instances where it equals zero and none where it equals one.)
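For concreteness, here is a minimal sketch of that transformation in base R. The DV values are made up for illustration, and n is assumed to be the number of observations.

```r
# Hypothetical proportions, including an exact zero as in my data
DV <- c(0, 0.12, 0.5, 0.87)

# Assumption: n is the number of observations
n <- length(DV)

# Squeeze the values away from the boundaries 0 and 1
DV_squeezed <- (DV * (n - 1) + 0.5) / n

# Check that every transformed value lies strictly between 0 and 1
range(DV_squeezed)
```

With a larger n the shift becomes smaller, so interior values barely change while exact zeros move just above 0.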
I'm unsure whether logistic regression or beta regression is preferred in this example. My DV isn't a clear-cut case of successes and failures (unless we stretch the definition of "success"), so I'm worried logistic regression might not be appropriate. However, I'm having trouble getting a firm grasp on beta regression and all it entails. If beta regression is preferred:
- Why is it preferred?
- What is it doing "behind the scenes" to the data?
- How can it be applied in R?