My study includes a response variable that follows a beta distribution: the ratio of thorax length to wing length in a Drosophila species.

My model involves two crossed fixed effects and one random effect (a random intercept):

library(glmmTMB)
# Beta mixed model with a logit link and a random intercept for RANDOM
model <- glmmTMB(RV ~ V1 * V2 + (1 | RANDOM), data = data,
  family = beta_family(link = "logit"))

Because I do not know the assumptions this distribution implies, I would like to know whether my model is valid. How can I check diagnostics for this model, such as by inspecting residuals? Is there some other procedure to check whether the model is valid (that is, whether it meets its assumptions)?

If other assumptions need to be evaluated, I would like to know what they are, how to check them, and the script I would use to do so.

Mark White
  • Why would you use a model that you don't understand? – prince_of_pears Mar 09 '18 at 23:40
  • Because my response variable follows that distribution, I cannot model it as normal. I read that this is the distribution suited to my variable, and I need to know how to validate my model. – Momo Afarensis Mar 09 '18 at 23:45
  • I edited your question to make clearer what I think you're asking. Did I get it right that you are looking to perform diagnostic checks on this model? – Mark White Mar 09 '18 at 23:58

1 Answer

I am afraid I have a relatively unsatisfying answer, but I have included a number of references you may explore.

Beta regression models are relatively new compared to the other common generalized linear models. Ferrari & Cribari-Neto (2004) introduced the parameterization used by most statistical packages, so it is not even 15 years old yet. I mention this because diagnostics for beta regression are still an active area of research, with different authors proposing different techniques for diagnosing problems with these models (Espinheira, Ferrari, & Cribari-Neto, 2008a, 2008b; Espinheira, Santos, & Cribari-Neto, 2017; Pereira, 2017). There does not yet seem to be an agreed-upon method for performing diagnostic checks on these models, but those papers should give you an idea of the options.
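
For reference, the Ferrari & Cribari-Neto (2004) parameterization writes the beta density in terms of a mean $\mu$ and a precision $\phi$. The notation below ($y$ for the response, $x_i$ and $\beta$ for the covariates and coefficients) is my own shorthand rather than anything taken from the question, and the random intercept is left out for simplicity:

$$
f(y;\mu,\phi)=\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\;y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1},\qquad 0<y<1,\quad 0<\mu<1,\quad \phi>0,
$$

$$
\operatorname{logit}(\mu_i)=\log\frac{\mu_i}{1-\mu_i}=x_i^{\top}\beta .
$$

This is the usual beta distribution with shape parameters $\mu\phi$ and $(1-\mu)\phi$, so the model assumes the response lies strictly between 0 and 1.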

Unlike in ordinary least squares (OLS) regression, residuals should not necessarily be normally distributed (see the papers above). Moreover, because the variance of a beta distribution is a function of its mean (scaled by the precision parameter $\phi$, which you are also estimating), the model is naturally heteroskedastic. So you would not run the same diagnostic checks as you would for an OLS regression.
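
To make the heteroskedasticity concrete: under the mean and precision parameterization of the beta distribution, with mean $\mu$ and precision $\phi$, the first two moments are as follows (the numerical values of $\mu$ and $\phi$ are made up purely for illustration):

$$
\operatorname{E}(Y)=\mu,\qquad \operatorname{Var}(Y)=\frac{\mu(1-\mu)}{1+\phi}.
$$

With $\phi = 10$, an observation with $\mu = 0.5$ has variance $0.25/11 \approx 0.0227$, while one with $\mu = 0.9$ has variance $0.09/11 \approx 0.0082$. The spread of the response therefore changes with the fitted mean even when $\phi$ is constant, which is why a plot of raw residuals against fitted values is not expected to show constant variance.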

You are also estimating random intercepts in your model. With a Gaussian family and an identity link (i.e., a linear mixed model), you would expect these intercepts to be normally distributed as well. I do not know what the expectation is in beta regression, and I suspect the package documentation may not give satisfactory detail on how the models are specified (in my experience, beta regression is often included in packages that support a wide range of distributions, but it is rarely explained in much statistical detail). Hopefully the documentation points to a reference you can check; I am less familiar with mixed-effects models in beta regression.

References

Espinheira, P. L., Ferrari, S. L., & Cribari-Neto, F. (2008a). Influence diagnostics in beta regression. Computational Statistics & Data Analysis, 52(9), 4417–4431.

Espinheira, P. L., Ferrari, S. L., & Cribari-Neto, F. (2008b). On beta regression residuals. Journal of Applied Statistics, 35(4), 407–419.

Espinheira, P. L., Santos, E. G., & Cribari-Neto, F. (2017). On nonlinear beta regression residuals. Biometrical Journal, 59(3), 445–461.

Ferrari, S. L., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799–815.

Pereira, G. H. (2017). On quantile residuals in beta regression. Communications in Statistics - Simulation and Computation, 1–15.

Mark White
  • The residuals do not have a Beta distribution. Note, for example, that probably roughly half of the residuals will be negative, but Beta distributions do not contain negative values. The distribution of the errors is actually not important here; the model makes assumptions about the conditional distribution of Y given X, not about the distribution of the errors. These two things only coincide in special cases like a Normal response. – Jake Westfall Mar 10 '18 at 01:45
  • +1 You are correct—thanks, Jake. Dumb mistake on my part: I misspoke and meant to say that conditional Y|X will be beta, and got confused because they coincide with OLS. – Mark White Mar 10 '18 at 01:49
  • Thank you all very much for answering, and thank you, Mark White, for editing the question and making it clearer. I will review the suggested references, and I hope to find a way to validate the model soon. What I want to know is whether, because the parameter φ is estimated, it is necessary to calculate an overdispersion factor or something similar to validate the model, and if so, how to do it. – Momo Afarensis Mar 12 '18 at 17:29