5

The formula (i.e. y~x) used for beta-regression is unclear to me. I would like to know what the resulting equation would be for a predicted beta-regression curve with the following summary output:

 Call:
 betareg(formula = DRatio ~ SVL1 + I(SVL1^2) | SVL1 + I(SVL1^2), data = sub.data, link = c("loglog"))

 Standardized weighted residuals 2:
     Min      1Q  Median      3Q     Max 
 -2.1507 -0.5259 -0.0668  0.8108  1.9400 

 Coefficients (mean model with loglog link):
               Estimate Std. Error z value Pr(>|z|)
 (Intercept) -0.9264491  0.9592569  -0.966    0.334
 SVL1         0.0464020  0.0768280   0.604    0.546
 I(SVL1^2)   -0.0004087  0.0014260  -0.287    0.774

 Phi coefficients (precision model with log link):
              Estimate Std. Error z value Pr(>|z|)   
 (Intercept)  6.122960   2.179827   2.809  0.00497 **
 SVL1        -0.400180   0.159235  -2.513  0.01197 * 
 I(SVL1^2)    0.006920   0.002816   2.458  0.01398 * 
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

 Type of estimator: ML (maximum likelihood)
 Log-likelihood: 7.927 on 6 Df
 Pseudo R-squared: 0.02014
 Number of iterations: 24 (BFGS) + 2 (Fisher scoring)

I assume I must use the coefficients of the model. However, is the mean model supposed to be used? or can the precision model be used instead? I understand beta-regressions are complicated because the y-values are bound between 0 and 1, however if I want to plot this predictive curve in R I need to know what equation and what coefficients to use in a y~x format. Please advise.

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Kat Y
  • 355
  • 1
  • 3
  • 13

1 Answers1

5

The answer depends on what exactly you want to predict and whether you want to do it by hand or using betareg. Generally, I would recommend to read vignette("betareg", package = "betareg") (also published in the Journal of Statistical Software at http://dx.doi.org/10.18637/jss.v034.i02) which provides more details on all aspects.

As for the theory: Beta regression is a probabilistic (or distributional) regression model where it is assumed that the response $y_i$ is distributed according to a beta distribution $\mathcal{B}(\mu_i, \phi_i)$ with observation-specific mean $\mu_i$ and precision parameter $\phi_i$. The parameters then depend on the regressors (aka covariates aka independent variables) through link functions and linear predictors. For the general notation see Section 2 of the vignette above. In your case, the equations are: $$ \begin{eqnarray*} \text{loglog}(\mu_i) & = & -0.926 + 0.0464 \cdot \text{SVL1}_i -0.00041 \cdot \text{SVL1}_i^2 \\ \log(\phi_i) & = & \phantom{--}6.123 - 0.4001 \cdot \text{SVL1}_i + 0.00692 \cdot \text{SVL1}_i^2 \end{eqnarray*} $$ where the log-log link is defined as $\text{loglog}(\mu) = -\log(-\log(\mu))$. Thus, in order to obtain the prediction for a given parameter you need to fix the value of $\text{SVL1}$ and evaluate the corresponding linear predictor on the right-hand side, and then use the corrsponding inverse link function. If you just want to predict the expected value, then just computing $\hat \mu_i$ is, of course, sufficient.

However, other quantities may be of interest as well, e.g., the variance (which depends on both $\mu$ and $\phi$) or certain quantiles of the distribution etc.

To compute these quantities in R using the betareg package you can simply use the predict() method. For example, you can easily obtain $\mu_i$ using predict(object, type = "mean"). But alternatively it's also straightforward to compute the median along with a 95% interval of the predicted distribution using predict(object, type = "quantile", at = c(0.05, 0.5, 0.95)).

Achim Zeileis
  • 13,510
  • 1
  • 29
  • 53
  • Thanks @Achim Zeileis. I am now wondering if you can tell me what it means exactly when the precision model is significant, and the mean model is not? I am under the impression the mean model is what we want to turn out significant in order to say anything about the relationship between x and y. Comments? – Kat Y Mar 14 '17 at 02:44
  • In the mean model the squared term is not significant and hence the linear term is not straightforward to judge. Using orthogonal polynomials via `poly()` would make this easier, see http://stackoverflow.com/questions/29999900/poly-in-lm-difference-between-raw-vs-orthogonal/30000214#30000214 My suspicion is that you get a significant linear term, though. The precision model means that with increasing SVL1 you get decreasing precision (and larger variance). For visualization I recommend using quantiles. If you can share your data (or a perturbed version of it) I can provide some example code. – Achim Zeileis Mar 14 '17 at 08:33