I'm getting a result I cannot explain when using the beta distribution.

I've got a result which came from a binomial distribution: 2 successes in 6 trials. I would think the maximum likelihood estimator for p would be 2/6 ≈ 0.33?

dbinom(0:6, 6, 0.33)
[1] 0.090458382 0.267324771 0.329168562 0.216170399 0.079853991 0.015732428 0.001291468

But when I use the beta distribution, the highest point I get is at 0.25:

beta_df <- data.frame(PROB = seq(0, 1, 0.01), HEIGHT = dbeta(seq(0, 1, 0.01), 2, 4))
beta_df[which.max(beta_df$HEIGHT),]
   PROB   HEIGHT
26 0.25 2.109375

I cannot get my head around this... am I misinterpreting the results, or calling either of these functions incorrectly? Thanks :)

orrymr

2 Answers

Why would you expect to see similar results? Those are different distributions, used for modelling completely different things: the first is discrete, the second is continuous. To answer your question, what you are looking for is the mode of the distribution. The mode of the beta distribution is $\frac{\alpha - 1}{\alpha + \beta - 2}$, which is exactly $0.25$ for the values you provided.
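
A quick numerical check of that formula in R (a sketch; the variable names are my own):

a <- 2; b <- 4
(a - 1) / (a + b - 2)   # closed-form mode: 0.25
# maximizing the density numerically finds the same point (up to tolerance):
optimize(function(p) dbeta(p, a, b), interval = c(0, 1), maximum = TRUE)$maximum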

Regarding the comment, in the binomial case you were maximizing the likelihood alone. When you maximize the Bayesian beta-binomial model, you also take the prior into account:

$$ \hat \theta = \operatorname{arg\,max}_\theta \; \underbrace{p(X \mid \theta)}_\text{likelihood}\,\underbrace{p(\theta)}_\text{prior} $$

so the choice of prior affects the result. Using $\alpha=\beta=0$ for the prior gives the improper Haldane prior, which places all of its probability mass on the extreme values $0$ and $1$.

Especially when the sample size is small, the prior impacts the result; in this case, it drags the probability mass towards the extremes. To get a result comparable to the MLE, you could choose the uniform prior with $\alpha=\beta=1$.
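
To make this concrete, the beta prior is conjugate to the binomial likelihood, so for $k$ successes in $n$ trials the posterior is again a beta distribution:

$$ p(\theta \mid X) \propto \theta^{k}(1-\theta)^{n-k} \, \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1}, $$

i.e. $\operatorname{Beta}(k+\alpha,\, n-k+\beta)$. With $k=2$, $n=6$ and the Haldane prior $\alpha=\beta=0$, this is exactly the $\operatorname{Beta}(2,4)$ density you plotted, with mode $\frac{2-1}{2+4-2} = 0.25$; with the uniform prior $\alpha=\beta=1$ it becomes $\operatorname{Beta}(3,5)$, with mode $\frac{3-1}{3+5-2} = \frac{2}{6}$, matching the MLE.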

Tim
  • I understood the beta distribution to be a distribution over possible values of p, where p is the probability of success in a binomial distribution. Since I am getting 2 successes out of 6 in the binomial, I'd think the MLE of p would be 2/6 ≈ 0.33? – orrymr Sep 14 '20 at 08:35
  • @orrymr yes, but you are not talking about MLE here, but about the Bayesian maximum a posteriori (MAP) estimate, where you assume the improper Haldane prior Beta(0, 0), so the result depends not only on the data but also on the prior choice. Under the uniform Beta(1, 1) prior, they would be the same. – Tim Sep 14 '20 at 08:43
  • @orrymr The error is in using the $beta(n,m)$ distribution instead of the $beta(n+1,m+1)$ distribution. That is, it depends on the prior you are using (see Tim's answer on Haldane's vs. the uniform prior). You probably wanted the uniform prior instead of Haldane's. – LiKao Sep 14 '20 at 08:52

You are using the wrong parameters for the beta distribution. If you have a binomial experiment with $n$ successes and $m$ failures, you should use the $beta(n+1,m+1)$ distribution. The reason is that you are implicitly using a $beta(1,1)$ (uniform) prior, which you have to add to the distribution (if you use a $beta(a,b)$ prior instead, you get $beta(n+a,m+b)$).

So if you try that

# posterior under the uniform prior: Beta(2+1, 4+1) = Beta(3, 5)
beta_df <- data.frame(PROB = seq(0, 1, 0.01), HEIGHT = dbeta(seq(0, 1, 0.01), 3, 5))
beta_df[which.max(beta_df$HEIGHT),]

you get the correct result, i.e. $0.33$.
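
Note that $0.33$ is simply the closest grid point to the exact mode $2/6 = 1/3$; refining the grid (my own tweak to the code above) moves the maximizer accordingly:

beta_df <- data.frame(PROB = seq(0, 1, 0.001), HEIGHT = dbeta(seq(0, 1, 0.001), 3, 5))
beta_df[which.max(beta_df$HEIGHT),]   # PROB = 0.333, approaching 1/3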

EDIT (more math):

So the likelihood for $n$ successes and $m$ failures is (up to a multiplicative constant):

$L(\theta \mid m,n) \propto \theta^n (1-\theta)^m$

But the beta density is given as

$f_{a,b}(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}$.

So if you match $a-1=n$ and $b-1=m$, you get $a=n+1$ and $b=m+1$, as you should.
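
As a quick sanity check (a short R sketch; the name theta is mine), the ratio of the binomial likelihood to the $beta(3,5)$ density is constant in $\theta$, namely ${6 \choose 2} B(3,5) = 15/105 = 1/7$:

theta <- seq(0.1, 0.9, 0.1)
dbinom(2, 6, theta) / dbeta(theta, 3, 5)   # 0.1428571 for every theta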

LiKao
  • I wouldn't say that the prior is "wrong", rather that the assumption that it'll give the same result was incorrect. Haldane's prior may be a sound choice in some cases. – Tim Sep 14 '20 at 09:00
  • @Tim I agree, as a prior can't really be "wrong" or "correct", since it just represents prior assumptions. Priors can be justified or not, but not "wrong" in the actual sense. I just meant to convey in simple words that this prior doesn't match the assumptions behind MLE. – LiKao Sep 14 '20 at 09:02