
If $p \sim$ Uniform$(0,1)$, and $X \sim$ Bin$(n, p)$, then the posterior mean of $p$ is given by $\frac{X+1}{n+2}$.

Is there a common name for this estimator? I've found it solves lots of people's problems and I'd like to be able to point people to a reference, but haven't been able to find the right name for it.

I vaguely recall this being called something like the "+1/+2 estimator" in a stats 101 book but that's not a very searchable term.

kjetil b halvorsen
Cliff AB

3 Answers


With prior $\mathsf{Unif}(0,1) \equiv \mathsf{Beta}(\alpha_0=1,\beta_0 =1)$ and likelihood $\mathsf{Binom}(n, \theta)$ showing $x$ successes in $n$ trials, the posterior distribution is $\mathsf{Beta}(\alpha_n=1 + x,\; \beta_n = 1 + n - x).$ (This is easily seen by multiplying the kernels of the prior and likelihood to get the kernel of the posterior.)

Then the posterior mean is $$\mu_n = \frac{\alpha_n}{\alpha_n+\beta_n} = \frac{x+1}{n+2}.$$
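The update above can be sketched numerically (a minimal illustration; `posterior_mean` is an ad hoc name, not a library function):

```python
# Posterior mean of p under a Beta(alpha0, beta0) prior after observing
# x successes in n Binomial trials; Uniform(0,1) is the alpha0 = beta0 = 1 case.
def posterior_mean(x: int, n: int, alpha0: float = 1.0, beta0: float = 1.0) -> float:
    alpha_n = alpha0 + x      # prior "successes" plus observed successes
    beta_n = beta0 + n - x    # prior "failures" plus observed failures
    return alpha_n / (alpha_n + beta_n)

# With the uniform prior this reduces to (x + 1) / (n + 2):
print(posterior_mean(7, 10))  # 8 / 12 = 0.666...
```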

In a Bayesian context, just using the term posterior mean may be best. (The median of the posterior distribution and the maximum of its PDF, i.e., the posterior mode or MAP estimate, have also been used to summarize posterior information.)

Notes: (1) Here you are using $\mathsf{Beta}(1,1)$ as a noninformative prior distribution. On sound theoretical grounds, some Bayesian statisticians prefer to use the Jeffreys prior $\mathsf{Beta}(\frac 1 2, \frac 1 2)$ as a noninformative prior. Then the posterior mean is $\mu_n = \frac{x+.5}{n+1}.$

(2) In constructing frequentist confidence intervals, Agresti and Coull have suggested "adding two successes and two failures" to the sample in order to get a confidence interval based on the estimator $\hat p = \frac{x+2}{n+4},$ which has more accurate coverage probabilities than the traditional Wald interval using $\hat p = \frac x n.$ David Moore has dubbed this a plus-four estimator in some of his widely used elementary statistics texts, and the terminology has been adopted by others. I would not be surprised to see your estimator called 'plus two' and the Jeffreys version called 'plus one'.
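For concreteness, the point estimators mentioned in Notes (1) and (2) can be compared side by side (a sketch; the function names are illustrative, not a standard API):

```python
# Four point estimators of a binomial success probability,
# for x successes in n trials.
def wald(x, n):      return x / n                # traditional MLE
def jeffreys(x, n):  return (x + 0.5) / (n + 1)  # Beta(1/2, 1/2) prior
def laplace(x, n):   return (x + 1) / (n + 2)    # Beta(1, 1) prior
def plus_four(x, n): return (x + 2) / (n + 4)    # Agresti-Coull "plus four"

# With zero observed successes, Wald gives 0 while the others pull toward 1/2:
x, n = 0, 10
for est in (wald, jeffreys, laplace, plus_four):
    print(f"{est.__name__:>9}: {est(x, n):.4f}")
```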

(3) All of these estimators have the effect of 'shrinking the estimate toward 1/2', and so they have been called 'shrinkage estimators' (a term that is much more widely used, particularly in James-Stein inference). See the answer (+1) by @Taylor.

BruceET
  • Related - https://stats.stackexchange.com/questions/185221. – Royi Feb 08 '19 at 18:07
  • 2
    yes, but how does that help with _terminology_? – BruceET Feb 08 '19 at 19:56
  • It helps because it complements the derivation, which you wrote is easy. I guess some people might encounter this question while actually looking for the derivation itself. – Royi Feb 08 '19 at 19:59
  • 3
    (2) is really what I was interested in. I didn't realize that estimator was presented for purely Frequentist justifications. In the cases I prescribe it as a solution, it's always something like how to compute a probability when a certain multinomial hasn't been seen before (i.e., clustering on letter counts and one cluster includes no "z"s), so nothing to do with coverage probabilities of CIs. Thank you! – Cliff AB Feb 08 '19 at 20:22
  • In a practical application, you can ignore neither the coverage probability nor the average length of the CI. Otherwise, you would be happy with the all-purpose 100% CI for a binomial success probability: the completely uninformative interval $(0,1).$ // Upvote for clearly stating in this Comment your reason for asking the question. – BruceET Feb 09 '19 at 20:58
  • What is the idea of "summarizing posterior information"? Is this just to bring the parameters together? – Good Luck Mar 22 '21 at 12:32

This is called Laplace smoothing, or Laplace's rule of succession, as Pierre-Simon Laplace used it for estimating the probability that the sun rises again tomorrow: "We thus find that an event having occurred a number of times, the probability that it will happen again the next time is equal to this number increased by the unit, divided by the same number increased by two units."

Essai philosophique sur les probabilités (A Philosophical Essay on Probabilities), by Pierre-Simon Laplace
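Laplace's phrase "this number increased by the unit, divided by the same number increased by two units" is just $(n+1)/(n+2)$ after $n$ successes in $n$ trials. A one-line sketch (the function name is illustrative, not a library call):

```python
# Rule of succession: probability of success on the next trial
# after x successes in n trials.
def rule_of_succession(x: int, n: int) -> float:
    return (x + 1) / (n + 2)

# Laplace's sunrise example: the sun has risen on every one of n recorded days.
n = 5000
print(rule_of_succession(n, n))  # (5000 + 1) / (5000 + 2), just under 1
```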

Xi'an

You could call it a shrinkage estimator. The estimator is closer to $0.5$ than the more ubiquitous sample proportion $x/n$.
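The shrinkage is explicit if the posterior mean is rewritten as a weighted average of the sample proportion and the prior mean (a small sketch with an ad hoc function name):

```python
# (x + 1) / (n + 2) = w * (x / n) + (1 - w) * 1/2, with w = n / (n + 2):
# the data get weight w, and the prior mean 1/2 gets the rest.
def shrinkage_form(x: int, n: int) -> float:
    w = n / (n + 2)
    return w * (x / n) + (1 - w) * 0.5

x, n = 9, 10
print(shrinkage_form(x, n), (x + 1) / (n + 2))  # both equal 10/12 (up to rounding)
```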

Taylor
  • 2
    (+1) That is true, it is a shrinkage estimator. I wanted a specific name for the binomial/multinomial case so I can point other researchers to material on that exact estimator so that they don't think I'm just saying "add 1 to things until you get the answer you want" but also not have to start from the beginning of explaining what Bayesian statistics is. – Cliff AB Feb 08 '19 at 20:26