Why use mean of posterior distribution instead of probability?

Question

I'm reading Think Bayes (pdf link) by Allen B. Downey, and I don't understand the purpose of the mean in section 3.2, "The locomotive problem".

On page 24 the author proposes an alternative to computing the posterior probability: computing the mean of the posterior distribution. This is not quite clear to me. Why is it better? What can it tell us that the posterior probability doesn't? In what cases is it better to use the mean instead of the posterior probability?

score 6 · Accepted Answer · edited Apr 13 '17 at 12:44

A railroad numbers its locomotives in order 1..N. One day you see a locomotive with the number 60. Estimate how many locomotives the railroad has.

The example concerns the so-called German tank problem. As far as I can see, Allen B. Downey does not suggest that taking the mean of the posterior distribution is a way of calculating the posterior probability. The problem is about guessing the number of locomotives given only the information that a locomotive numbered 60 exists. A Bayesian analysis of this problem combines this data with a uniform prior to obtain a posterior distribution. The "best guess" about the number of locomotives is then the mean of this distribution. In this case we are not interested in probabilities, or in the full distribution of the parameter of interest, but in a point estimate for it. The mean of the posterior distribution is one such point estimate.
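The analysis described above can be sketched in a few lines of Python. This is only an illustration, not the book's code: the function names are made up here, and the upper bound of 1000 locomotives for the uniform prior is an assumption that the posterior mean is sensitive to.

```python
def locomotive_posterior(observed, upper_bound):
    """Return {N: P(N | observed)} under a uniform prior on 1..upper_bound.

    upper_bound is an assumed limit on the fleet size, not given by the problem.
    """
    # If the railroad has N locomotives, each number 1..N is equally likely
    # to be seen, so the likelihood is 1/N when N >= observed and 0 otherwise.
    # The constant uniform prior cancels when we normalize.
    unnormalized = {
        n: (1.0 / n if n >= observed else 0.0)
        for n in range(1, upper_bound + 1)
    }
    total = sum(unnormalized.values())
    return {n: w / total for n, w in unnormalized.items()}


def posterior_mean(posterior):
    """Point estimate: the probability-weighted average of N."""
    return sum(n * p for n, p in posterior.items())


posterior = locomotive_posterior(observed=60, upper_bound=1000)
print(round(posterior_mean(posterior)))  # 333
```

The single most probable value under this posterior is N = 60, yet the mean is far larger, which shows that "most probable value" and "mean" answer different questions.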

As mentioned in the comments and in @peuhp's answer, in this case the mean minimizes the expected squared error (L2 loss), but we could choose different estimators as well, e.g. the median, which minimizes the expected absolute error (L1 loss), or the mode, which minimizes the expected 0-1 loss (sometimes loosely called the "L0 norm"). All of this depends on the loss function that you want to minimize, i.e. the criterion you use to decide what the "best guess" is.

You may also be interested in reading about maximum a posteriori estimation.
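To make the link between loss functions and point estimates concrete, here is a rough brute-force sketch. It assumes the same locomotive posterior (uniform prior on 1..1000, which is an assumption, and an observed number of 60), restricts guesses to integers in the support, and uses illustrative helper names.

```python
def posterior(observed=60, upper_bound=1000):
    """Posterior over N for the locomotive problem (assumed prior 1..upper_bound)."""
    weights = {n: 1.0 / n for n in range(observed, upper_bound + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def expected_loss(guess, post, loss):
    """Average loss of a guess, weighted by the posterior."""
    return sum(p * loss(guess, n) for n, p in post.items())


def best_guess(post, loss):
    """Integer guess in the support with the smallest expected loss."""
    return min(post, key=lambda g: expected_loss(g, post, loss))


def squared(g, n):
    return (g - n) ** 2


def absolute(g, n):
    return abs(g - n)


def zero_one(g, n):
    return 0.0 if g == n else 1.0


post = posterior()
guess_squared = best_guess(post, squared)    # integer nearest the posterior mean
guess_absolute = best_guess(post, absolute)  # posterior median
guess_zero_one = best_guess(post, zero_one)  # posterior mode (the MAP estimate)
print(guess_squared, guess_absolute, guess_zero_one)
```

The three guesses differ a lot here because the posterior is strongly skewed, so the choice of loss function matters in practice.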

Thanks @Tim! In fact I had wrongly assumed that the guess (point estimate) should be the value that has maximum probability, so I didn't see why the mean could replace the probability in this case. Now I see my misunderstanding. — very_young, Nov 25 '15 at 09:44
Taking the mode (maximum) of the posterior is indeed another option and so is taking the median. — Björn, Nov 25 '15 at 10:19
_'The "best guess" about number of locomotives is the mean'_ naturally raises the question in what terms is it the best guess. For example, mean is the best when we want to minimize the expected value of $(\text{guess}-\text{actual value})^2$ based on our posterior distribution. — JiK, Nov 25 '15 at 11:35

peuhp · Answer 2 · 2015-11-25T10:06:24.060

6

Indeed, the mean of the posterior says nothing that the posterior density itself does not contain. However, as it minimises the expected squared-error loss $$ \operatorname{mean}(p(\theta|x)) = \arg\min_{\theta^{*}} \int_{\theta} \|\theta^{*}-\theta\|^2 \, p(\theta|x) \, d\theta $$ it provides a single number (which is easier to interpret than a full distribution) that can be taken as a satisfactory best-guess estimate of the quantity of interest.

Moreover, sometimes the density itself is hard to obtain in closed form or to estimate algorithmically, whereas the mean can be derived or estimated more easily.
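For completeness, a short sketch of why the mean is the minimiser, assuming a scalar $\theta$ and that differentiation under the integral sign is allowed. Setting the derivative with respect to $\theta^{*}$ to zero and using $\int p(\theta|x) \, d\theta = 1$:

$$ \frac{d}{d\theta^{*}} \int (\theta^{*}-\theta)^2 \, p(\theta|x) \, d\theta = 2 \int (\theta^{*}-\theta) \, p(\theta|x) \, d\theta = 2\left(\theta^{*} - \int \theta \, p(\theta|x) \, d\theta\right) = 0 $$

so that

$$ \theta^{*} = \int \theta \, p(\theta|x) \, d\theta $$

which is the posterior mean. The second derivative is $2 > 0$, so this stationary point is a minimum.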
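As a rough illustration of that last point: if all we have is a set of draws from the posterior (for example from an MCMC sampler), the mean can be estimated by a plain average without ever writing down the normalised density. The sketch below fakes such draws for the locomotive problem by sampling from unnormalised weights; the prior bound of 1000, the sample size, the seed and the function name are all assumptions made for the example.

```python
import random


def sample_posterior(num_samples, observed=60, upper_bound=1000, seed=0):
    """Draw samples of N using only unnormalised posterior weights."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = list(range(observed, upper_bound + 1))
    weights = [1.0 / n for n in values]  # proportional to the posterior
    return rng.choices(values, weights=weights, k=num_samples)


samples = sample_posterior(100_000)

# The Monte Carlo estimate of the posterior mean is just the sample average.
mean_estimate = sum(samples) / len(samples)
print(mean_estimate)
```

With this many draws the estimate should land close to the exact posterior mean of about 333 for this setup, up to Monte Carlo error.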

edited Nov 25 '15 at 10:06

answered Nov 25 '15 at 09:45


Thank you @peuhp! So do I understand correctly that the guess (point estimate) should be the value that has maximum probability? And in this case the mean just lets us interpret the point estimate more easily? – very_young Nov 25 '15 at 10:10

yes this is the idea (but "maximum probability" in a certain sense). I think that you could focus on MAP (maximum a posteriori) which is more intuitive and corresponds to the value of the parameter for which the density is maximal. – peuhp Nov 25 '15 at 10:27
