
Suppose I have two points $(p_1,x_1)$ and $(p_2,x_2)$, where $x_i$ is a value on a beta CDF and $p_i$ is the corresponding probability, $p_i = F(x_i)$. How would I go about determining the beta distribution's shape parameters $\alpha$ and $\beta$ with only these two points?

To clarify, I'm assuming I already know the lower and upper bounds of the distribution, $L$ and $U$.

gregor
  • By definition a Beta distribution has bounds $0$ and $1$, so apparently you are referring to a scaled, recentered Beta. No matter: what is of import is how you obtained the numbers $p_1, p_2, x_1, x_2$. Are these given theoretically (that is, *exactly*), or have one or more of them been measured with some possibility of error? In all cases you will need to resort to numerical methods, but *which* methods are appropriate to use and how the output is interpreted depends on these distinctions. – whuber Aug 20 '14 at 13:57
  • @whuber Apologies, I am referring to a scaled, recentered beta. The points would be specified exactly. I had thought this might require numerical methods, but I wasn't sure and wanted to check whether there might be an analytic solution. – gregor Aug 20 '14 at 14:06

1 Answer


Solution

The transformation

$$(p, x) \to (p, (x-L)/(U-L))$$

converts these points to points on the CDF of a Beta$(\alpha,\beta)$ distribution. Assuming this has been done (so we don't need to change the notation), the problem is to find $\alpha$ and $\beta$ for which

$$F(\alpha, \beta; x_i) = p_i,\quad i = 1, 2,$$

where the Beta CDF equals

$$F(\alpha,\beta; x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^x t^{\alpha-1}(1-t)^{\beta-1} dt.$$
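
In R this CDF is available as `pbeta`. As a quick sanity check of the formula (a small sketch; the values of $\alpha$, $\beta$, and $x$ are hypothetical, chosen to match the example further below):

#
# Sanity check: the integral above agrees with R's built-in Beta CDF `pbeta`.
#
alpha <- 15; beta <- 22; x <- 0.4   # hypothetical values
manual <- gamma(alpha + beta) / (gamma(alpha) * gamma(beta)) *
  integrate(function(t) t^(alpha - 1) * (1 - t)^(beta - 1), 0, x)$value
c(manual = manual, pbeta = pbeta(x, alpha, beta))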

This is a nice function of $\alpha \gt 0$ and $\beta \gt 0$: it is differentiable in those arguments and not too expensive to evaluate. However, no closed-form general solution exists, so numerical methods must be used.
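
For instance, a naive numerical approach (a minimal sketch with made-up target values, not the refined method developed below) is to minimize the sum of squared errors in the raw probabilities with a general-purpose minimizer:

#
# Naive least squares on the raw probability scale. The targets p = (0.2, 0.8)
# at x = (1/3, 2/3) are made up; the penalty crudely keeps alpha, beta > 0.
#
naive <- optim(c(1, 1), function(ab) {
  if (any(ab <= 0)) return(1e10)
  sum((pbeta(c(1/3, 2/3), ab[1], ab[2]) - c(0.2, 0.8))^2)
})
naive$par   # approximate alpha and beta

This works in easy cases like this one, but it degrades badly in the tails, which motivates the tips below.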


Implementation Tips

Beware: when both $x_i$ lie in the same tail of the distribution, or when either is close to $0$ or $1$, finding a reliable solution can be problematic. Techniques that are helpful in this circumstance are:

  1. Reparameterize the distributions to avoid enforcing the "hard" constraints $\alpha \gt 0, \beta \gt 0$. One reparameterization that works is to use the logarithms of these parameters.

  2. Optimize a measure of the relative error. A good way to do this for a CDF is to use the squared difference of the logits of the values: that is, replace the $p_i$ by $\text{logit}(p_i) = \log(p_i/(1-p_i))$ and fit $\text{logit}(F(\alpha,\beta;x_i))$ to these values by minimizing the sum of squared differences (see the short illustration after this list).

  3. Either begin with a very good estimate of the parameters or, as experimentation shows, overestimate them. (This may be less necessary when following tip #2.)
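
To see why tip #2 helps, here is a small illustration (the probabilities are arbitrary): values that are nearly indistinguishable on the raw scale remain well separated as logits.

logit <- function(p) log(p / (1 - p))
p <- c(1e-6, 2e-6)   # deep in the lower tail
diff(p)              # 1e-06: negligible on the raw scale
diff(logit(p))       # about 0.69: clearly separated on the logit scale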

Examples

The following sample code, which uses the `nlm` minimizer in R, illustrates these methods. It produces perfect fits even in the first case, which is numerically challenging because the $x_i$ are small and lie far out in the same tail. The output includes plots of the true CDFs (in black) overlaid with the fits (in red). The points on the plots show the two $(x_i, p_i)$ pairs.

This solution can fail in more extreme circumstances; obtaining a close estimate of the correct parameters can help assure success. One possible heuristic for such an estimate is sketched below.
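
One way to obtain such an estimate (a rough heuristic, not part of the method above; the helper `start.guess` is a made-up name) is to pretend the standardized Beta is approximately normal, read an implied mean and standard deviation off the two quantile pairs, and apply Beta moment matching:

#
# Heuristic starting values via a normal approximation (hypothetical helper).
# For Beta(a, b) with nu = a + b: mean mu = a/nu, variance mu*(1-mu)/(nu+1).
#
start.guess <- function(x, p) {
  z <- qnorm(p)
  sigma <- (x[2] - x[1]) / (z[2] - z[1])        # implied standard deviation
  mu <- x[1] - sigma * z[1]                     # implied mean
  mu <- min(max(mu, 1e-3), 1 - 1e-3)            # keep mu inside (0, 1)
  nu <- max(mu * (1 - mu) / sigma^2 - 1, 1e-3)  # implied alpha + beta
  log(c(mu * nu, (1 - mu) * nu))                # log(alpha), log(beta)
}

In the code below, `start <- start.guess(x, plogis(x.p))` could then replace the fixed guess (`plogis` inverts the logit).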

[Figure: true Beta CDFs (black) overlaid with fitted CDFs (red); points mark the two $(x_i, p_i)$ pairs.]

#
# Logistic transformation of the Beta CDF.
#
f.beta <- function(alpha, beta, x, lower=0, upper=1) {
  p <- pbeta((x-lower)/(upper-lower), alpha, beta)
  log(p/(1-p))
}
#
# Sums of squares.
#
delta <- function(fit, actual) sum((fit-actual)^2)
#
# The objective function handles the transformed parameters `theta` and
# uses `f.beta` and `delta` to fit the values and measure their discrepancies.
#
objective <- function(theta, x, prob, ...) {
  ab <- exp(theta) # Parameters are the *logs* of alpha and beta
  fit <- f.beta(ab[1], ab[2], x, ...)
  return (delta(fit, prob))
}
#
# Solve two problems.
#
par(mfrow=c(1,2))
alpha <- 15; beta <- 22 # The true parameters
for (x in list(c(1e-3, 2e-3), c(1/3, 2/3))) {
  x.p <- f.beta(alpha, beta, x)        # The correct values of the p_i
  start <- log(c(1e1, 1e1))            # A good guess is useful here
  sol <- nlm(objective, start, x=x, prob=x.p, lower=0, upper=1,
             typsize=c(1,1), fscale=1e-12, gradtol=1e-12)
  parms <- exp(sol$estimate)           # Estimates of alpha and beta
  #
  # Display the actual and estimated values.
  #
  print(rbind(Actual=c(alpha=alpha, beta=beta), Fit=parms))
  #
  # Plot the true and estimated CDFs.
  #      
  curve(pbeta(x, alpha, beta), 0, 1, n=1001, lwd=2)
  curve(pbeta(x, parms[1], parms[2]), n=1001, add=TRUE, col="Red")
  points(x, pbeta(x, alpha, beta))
}
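
If the original problem was posed on bounds $L$ and $U$, the fit transfers back to that scale through the quantile function. A short sketch with hypothetical bounds:

#
# Map fitted quantiles back to the original scale [L, U] (hypothetical bounds).
#
L <- 10; U <- 30
L + (U - L) * qbeta(c(0.25, 0.5, 0.75), parms[1], parms[2])
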
whuber
  • @parvin You shouldn't be making any changes at all. As explained at the outset of this answer, the first thing my solution does is transform a scaled, shifted Beta variable into a standard Beta variable. In the code, the default parameters for `f.beta` (namely, `lower=0` and `upper=1`) are for a standard Beta. – whuber Apr 10 '17 at 13:48
  • @parvin Do it in three steps. **1: Initialize** the quantiles and the *logits* of the probabilities (as described in the text of this answer): `x – whuber Apr 10 '17 at 14:18
  • @parvin I just gave you *complete, working* code. It will execute almost instantaneously if you have first run the example code in my answer. If you have further questions about how to use `R`, might I suggest you inquire on [SO]? – whuber Apr 10 '17 at 15:07
  • Dear whuber, thank you very much; a very thorough answer, as always. I was wondering (and I can post a question if it would have a possible answer): given some empirical information about the *quantiles, mean, and variance*, can we find the $\alpha$ and $\beta$ of a reasonably comparable beta distribution that would have *similar quantiles, mean, and variance* to those provided by the empirical information? – rnorouzian Apr 16 '17 at 15:08
  • @parvin It depends on what precisely is meant by "empirical information." If you are referring to these properties of a *sample*, then in principle you could estimate $\alpha$ and $\beta$ using Maximum Likelihood. This requires being able to compute the joint probability density for the specific statistics you have. – whuber Apr 16 '17 at 19:42
  • So, suppose I collect a sizable amount of information about the variability in a naturally $[0, 1]$ parameter from a large pool of previously published research. I simply make a histogram of these obtained parameter values and compute quantiles, mean, and variance for this histogram. My goal is to use this empirical information to find an equivalent Beta distribution whose quantiles, mean, and variance match (are relatively close to) my initial histogram. So it's kind of like a prior-finding problem. – rnorouzian Apr 16 '17 at 19:57
  • @parvin You would be much better off estimating the parameters from all the data, rather than from a summary of them. – whuber Apr 16 '17 at 21:40
  • I see. Any specific method you might have in mind? Are you referring to "empirical Bayes"? – rnorouzian Apr 16 '17 at 21:49
  • @whuber I'm a little confused here. It seems like the objective function relies on knowing the true parameter values in order to fit, which seems circular: if you knew the original parameter values, why fit at all? It seems the objective function requires `prob`, which is provided as `x.p`, which is itself a transform of `pbeta` with arguments that are the original parameter values. – epp Jan 17 '18 at 22:28
  • @Stats It's no different than solving the equation $f(x)=y$ for $x$ given $y$: the solver systematically (and efficiently) produces a sequence of *trial* values $x_1, x_2, \ldots,$ that are plugged into $f$ until one is found for which the output is sufficiently close to $y$. There's nothing circular about that. I'm doing exactly the same thing where $x$ is a two-component vector and $y$ is a two-component vector. – whuber Jan 17 '18 at 22:50
  • @whuber, why do you choose to use logits? I see why we would use the logarithms to remove the problem of the $\gt 0$ constraint, but I'm not quite certain what motivates the use of the logit. Could you please elaborate? – Jan May 28 '19 at 12:18
  • @Jan The log is useless when $p_i$ is close to $1$. Moreover, the logit treats probabilities symmetrically in the sense that its derivative at $p$ is the same as its derivative at $1-p$, and the Beta family is similarly symmetric in the sense that whenever $F$ is a Beta distribution, so is the function $x\to 1-F(1-x)$. Although the logarithm alone might work, both these characteristics suggest the logit will be even better, and it is scarcely any more expensive to compute. – whuber May 28 '19 at 12:30