Fit a Beta Regression model to a U-shaped Dependent Variable

Question

I have a dependent variable, y, which appears to show a U-shaped pattern. I want to show that a U-shape is more likely than any other model type.

Since my y variable is a percentage, a beta regression seems appropriate for modelling my data. However, I am now unsure how to model the U-shape.

Reading this post that talks about Gaussian mixture models and U-shaped data, I was wondering if it is possible to implement a similar method with beta models?

I have been using betareg in R to implement my models.

Some simulated data would be something like:

set.seed(0)
y <- rbeta(1000, shape1 = 0.5, shape2 = 0.5)

You must be careful; in a beta regression, the model is for the *conditional* distribution of the response. If you just look at say a histogram of the dependent variable, you're looking at the marginal (/unconditional) distribution of the response, which will be a mixture over the conditionals. The apparent shape may tell you almost nothing about the conditional distribution you're actually modelling. It's quite possible, for example, to have a J shaped distribution at some values, and a reverse-J at other values, and nowhere a U-shape, yet have a U-shaped mixture. — Glen_b, Oct 18 '15 at 05:52
Regression is about modeling a DV in terms of one or more IVs. It's not about showing the shape of the DV itself (that is, the unconditional shape). You could look into curve fitting. — Peter Flom, Oct 18 '15 at 11:32
Glen_b & Peter Flom Thank you for your advice. Perhaps I need to consider this further before progressing. If you could recommend any resources that might aid me further they would be greatly appreciated. — SamPassmore, Oct 19 '15 at 06:01

score 2 · Accepted Answer · answered Oct 18 '15 at 16:01

If you simulate from a U-shaped (bimodal) beta distribution, as you do in your example, then you can (not surprisingly) recover the true parameters using betareg:

library(betareg)
betareg(y ~ 1)
## Call:
## betareg(formula = y ~ 1)
## 
## Coefficients (mean model with logit link):
## (Intercept)  
##     0.06873  
## 
## Phi coefficients (precision model with identity link):
##  (phi)  
## 0.9983

betareg does not estimate the original shape parameters $p = 0.5$ and $q = 0.5$ directly but a reparameterization of them. Specifically, the mean $\mu = p / (p + q) = 0.5$ is estimated on the logit scale by default, so the estimated mean is plogis(0.06873) = 0.5171757. The precision parameter $\phi = p + q = 1$ is estimated as 0.9983, so it is also recovered almost exactly.
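
As a rough sketch of the back-transformation (the names mu_hat, phi_hat, p_hat and q_hat are mine, not part of betareg), the printed coefficients can be mapped back to the shape parameters by inverting the two definitions, which gives $p = \mu\phi$ and $q = (1 - \mu)\phi$:

```r
# Estimates copied from the betareg output above
mu_hat  <- plogis(0.06873)   # inverse logit of the intercept, about 0.517
phi_hat <- 0.9983            # precision, identity link

# Invert mu = p / (p + q) and phi = p + q
p_hat <- mu_hat * phi_hat
q_hat <- (1 - mu_hat) * phi_hat
round(c(p = p_hat, q = q_hat), 3)   # about 0.516 and 0.482

# A beta density is U-shaped exactly when both shape parameters are below 1
p_hat < 1 && q_hat < 1
```

Both back-transformed shape parameters are below 1, so the fitted intercept-only model is itself a U-shaped beta distribution.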

Whether or not you need a model that is able to encompass a bimodal distribution is another question, though. As pointed out by @Glen_b and @PeterFlom above, it is possible that the marginal distribution is bimodal while the conditional distribution is not.
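
To illustrate that last point, here is a small simulation under assumptions of my own choosing: a hypothetical binary covariate x, with conditional distributions Beta(0.5, 2) and Beta(2, 0.5), i.e., conditional means 0.2 and 0.8 with a common precision of 2.5. Neither conditional distribution is U-shaped, but their mixture is:

```r
set.seed(1)
n <- 2000
x <- rep(0:1, each = n / 2)            # hypothetical binary covariate

# Conditional mean and a common precision, in betareg's parameterization
mu  <- ifelse(x == 1, 0.8, 0.2)
phi <- 2.5

# Shape parameters: p = mu * phi, q = (1 - mu) * phi
# x = 0 gives Beta(0.5, 2) (reverse J), x = 1 gives Beta(2, 0.5) (J)
y2 <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)

breaks   <- seq(0, 1, by = 0.1)
marginal <- hist(y2, breaks = breaks, plot = FALSE)$counts
cond0    <- hist(y2[x == 0], breaks = breaks, plot = FALSE)$counts
cond1    <- hist(y2[x == 1], breaks = breaks, plot = FALSE)$counts

# Marginal counts are high at both ends and low in the middle (U-shape),
# while each conditional piles up at one end only
rbind(marginal, cond0, cond1)

# If betareg is installed, the conditional model is a plain beta regression
if (requireNamespace("betareg", quietly = TRUE)) {
  print(betareg::betareg(y2 ~ x))
}
```

In a case like this, an intercept-only fit would suggest a U-shaped beta distribution, whereas the model including x describes two one-sided conditional distributions.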
