
I have a dependent variable, y, which appears to show a U-shaped pattern. I want to prove that a U-shape is more likely than any other model type.

Since my y variable consists of percentages, a beta regression seems appropriate for modelling my data. However, I am now unsure how to model the U-shape.

After reading this post about Gaussian mixture models and U-shaped data, I was wondering whether it is possible to implement a similar method with beta models.

I have been using betareg in R to implement my models.

Simulated data would look something like this:

set.seed(0)
y <- rbeta(1000, shape1 = 0.5, shape2 = 0.5)
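For reference, Beta(0.5, 0.5) is itself U-shaped: a beta density is U-shaped whenever both shape parameters are below 1, and this particular case is the arcsine distribution with density 1 / (pi * sqrt(x * (1 - x))). A minimal base-R sketch that checks this for the simulated y; the evaluation grid and the 10 equal-width bins are arbitrary choices, not part of the question.

```r
# Simulated data as in the question
set.seed(0)
y <- rbeta(1000, shape1 = 0.5, shape2 = 0.5)

# Evaluate the Beta(0.5, 0.5) density on a few points across (0, 1)
x_grid <- c(0.05, 0.25, 0.5, 0.75, 0.95)
dens <- dbeta(x_grid, shape1 = 0.5, shape2 = 0.5)
print(round(dens, 3))  # large near 0 and 1, smallest at 0.5

# Closed form of the arcsine density; should agree with dbeta()
arcsine <- 1 / (pi * sqrt(x_grid * (1 - x_grid)))
print(all.equal(dens, arcsine))

# Empirical check: count observations in 10 equal-width bins on [0, 1];
# the outer bins should hold far more observations than the central ones
counts <- table(cut(y, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE))
print(counts)
```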
SamPassmore
    You must be careful: in a beta regression, the model is for the *conditional* distribution of the response. If you just look at, say, a histogram of the dependent variable, you're looking at the marginal (i.e., unconditional) distribution of the response, which will be a mixture over the conditionals. The apparent shape may tell you almost nothing about the conditional distribution you're actually modelling. It's quite possible, for example, to have a J-shaped distribution at some values of the predictors and a reverse-J at other values, and nowhere a U-shape, yet have a U-shaped mixture. – Glen_b Oct 18 '15 at 05:52
    Regression is about modeling a DV in terms of one or more IVs. It's not about showing the shape of the DV itself (that is, the unconditional shape). You could look into curve fitting. – Peter Flom Oct 18 '15 at 11:32
  • @Glen_b & @Peter Flom, thank you for your advice. Perhaps I need to consider this further before progressing. If you could recommend any resources that might help me further, they would be greatly appreciated. – SamPassmore Oct 19 '15 at 06:01

1 Answer


If you simulate from a bimodal beta distribution, as you do in your example, then you can (not surprisingly) recover the parameters using betareg:

library("betareg")
betareg(y ~ 1)
## Call:
## betareg(formula = y ~ 1)
## 
## Coefficients (mean model with logit link):
## (Intercept)  
##     0.06873  
## 
## Phi coefficients (precision model with identity link):
##  (phi)  
## 0.9983  

This estimates transformations of the original shape parameters $p = 0.5$ and $q = 0.5$. Specifically, the mean $\mu = p / (p + q) = 0.5$ is estimated on the logit scale by default, so the estimated mean is plogis(0.06873) = 0.5171757. The precision parameter $\phi = p + q = 1$ is also recovered almost exactly (0.9983).
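To go back from betareg's mean/precision parameterization to the usual shape parameters, invert $\mu = p / (p + q)$ and $\phi = p + q$, which gives $p = \mu \phi$ and $q = (1 - \mu) \phi$. A small base-R sketch using the coefficients printed in the output above; the helper `mu_phi_to_shapes()` is a hypothetical name for illustration, not a betareg function.

```r
# Hypothetical helper (not part of betareg): mean/precision -> beta shapes
mu_phi_to_shapes <- function(mu, phi) {
  c(p = mu * phi, q = (1 - mu) * phi)
}

# Estimates copied from the betareg(y ~ 1) output
mu_hat  <- plogis(0.06873)  # undo the logit link of the mean model
phi_hat <- 0.9983           # precision model uses the identity link

shapes <- mu_phi_to_shapes(mu_hat, phi_hat)
print(round(shapes, 4))     # roughly p = 0.516, q = 0.482

# Both shape parameters below 1 means the fitted density is U-shaped
print(all(shapes < 1))
```

Both estimates are close to the true value of 0.5 used in the simulation, and both are below 1, so the fitted intercept-only beta model is itself U-shaped.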

Whether or not you need a model that is able to encompass a bimodal distribution is another question, though. As pointed out by @Glen_b and @PeterFlom above, it is possible that the marginal distribution is bimodal while the conditional distribution is not.
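A small simulation can make this concrete, following the J / reverse-J example from Glen_b's comment. It assumes a hypothetical binary predictor `x`: when x = 0 the response is drawn from Beta(0.5, 3), whose density decreases on (0, 1); when x = 1 it is drawn from Beta(3, 0.5), whose density increases. Neither conditional density is U-shaped, yet their 50/50 mixture (the marginal) is. The betareg fit at the end only runs if the package is installed.

```r
set.seed(1)
n <- 1000
x <- rbinom(n, size = 1, prob = 0.5)              # hypothetical binary predictor
y <- ifelse(x == 0,
            rbeta(n, shape1 = 0.5, shape2 = 3),   # reverse-J when x = 0
            rbeta(n, shape1 = 3, shape2 = 0.5))   # J when x = 1
d <- data.frame(y = y, x = factor(x))

# Conditional densities on a grid: both are monotone, neither is U-shaped
grid <- seq(0.01, 0.99, by = 0.01)
f0 <- dbeta(grid, shape1 = 0.5, shape2 = 3)
f1 <- dbeta(grid, shape1 = 3, shape2 = 0.5)
print(c(decreasing_f0 = all(diff(f0) < 0), increasing_f1 = all(diff(f1) > 0)))

# Marginal density is the 50/50 mixture: high near 0 and 1, lowest in the middle
f_mix <- 0.5 * f0 + 0.5 * f1
print(round(f_mix[c(5, 50, 95)], 3))  # evaluated at 0.05, 0.50, 0.95

# The conditional model separates the two groups; the intercept should be
# roughly qlogis(1/7) (about -1.79) and the x1 coefficient roughly 3.58
if (requireNamespace("betareg", quietly = TRUE)) {
  fit <- betareg::betareg(y ~ x, data = d)
  print(coef(fit))
}
```

A histogram of `d$y` alone would look U-shaped here, even though the correctly specified conditional model never needs a bimodal beta distribution.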

Achim Zeileis