You need to consider what the actual data is and what the parameters are, not just some abstract $A$ and $B$.
The process is tied to the sampling procedure, and because of that we already have a likelihood for the data.
If you ever wonder about the difference between probability and likelihood, think of likelihood as something attached to evidence we already have: the data happened, and we ask how likely that data is.
We always speak of the likelihood of the data; I am not aware that we speak of the likelihood of the parameters, since we usually still need to find the parameters somehow, say with the expectation-maximization algorithm.
So:
- $\mathbb P(\theta \mid \mathsf {data})$ is the posterior
- $\mathbb P(\mathsf {data} \mid \theta)$ is the likelihood
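As a quick illustration of "likelihood of the data" (a minimal sketch with made-up coin-flip numbers): the data stays fixed, and the likelihood is a number you can evaluate at any candidate value of $\theta$.

```python
# A minimal sketch of "likelihood of the data": the data is fixed,
# theta is the knob we can turn. The numbers here are made up.
from scipy.stats import binom

heads, flips = 7, 10           # observed data: 7 heads in 10 coin flips (hypothetical)

for theta in (0.3, 0.5, 0.7):  # candidate values of the parameter
    lik = binom.pmf(heads, flips, theta)  # P(data | theta)
    print(f"P(7 heads in 10 flips | theta={theta}) = {lik:.4f}")
```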
This is a chicken-and-egg problem, and if you have ever thought about it that way, you already know Bayesian statistics. The prior $\mathbb P(\theta)$ acts as a regularizer used to create the new posterior from the likelihood, but then the posterior can become the new prior if you iterate the procedure.
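A minimal sketch of that iteration, assuming a Beta prior on the probability of heads (the conjugate choice, so the update is closed-form) and invented batches of data:

```python
# A minimal sketch of "the posterior becomes the new prior", using the
# Beta-Bernoulli conjugate pair so each update is just two additions.
# The data batches below are invented for illustration.

a, b = 1.0, 1.0                      # Beta(1, 1) prior: flat over theta

batches = [(3, 2), (6, 4), (1, 9)]   # (heads, tails) observed in each batch

for heads, tails in batches:
    a, b = a + heads, b + tails      # posterior Beta(a, b) after this batch...
    print(f"posterior after batch: Beta({a:.0f}, {b:.0f}), "
          f"mean theta = {a / (a + b):.3f}")
    # ...and this posterior serves as the prior for the next batch of data.
```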
The normalization constant is there to make the posterior a true probability that adds to 1, and both the prior and the posterior should be a PDF or PMF; but if the normalization constant was already spent making the posterior a PDF or PMF, then the likelihood need not be a probability distribution.
You can always introduce a constant $C$ to make the likelihood add to one, but that happens outside the Bayes formula.
$$\mathbb P(\theta \mid X) =\frac{\mathbb P(X \mid \theta) \mathbb P(\theta)}{\mathbb P(X)}$$
* where $X$ is the data
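On a discrete grid of $\theta$ values the formula can be spelled out directly; here is a minimal sketch with the same made-up coin-flip data, where $\mathbb P(X)$ is just the sum in the denominator:

```python
# A minimal sketch of the Bayes formula on a discrete grid of theta,
# so the normalization constant P(X) is just a sum. Data is made up.
import numpy as np
from scipy.stats import binom

heads, flips = 7, 10                        # observed data X (hypothetical)
thetas = np.linspace(0.01, 0.99, 99)        # grid of parameter values
prior = np.ones_like(thetas) / len(thetas)  # uniform prior P(theta) on the grid

likelihood = binom.pmf(heads, flips, thetas)  # P(X | theta) on the grid
evidence = np.sum(likelihood * prior)         # P(X), the normalization constant
posterior = likelihood * prior / evidence     # P(theta | X)

print(posterior.sum())  # ~1.0 -- the posterior is a proper distribution
```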
Fisher introduced the likelihood as the probability of the data conditioned on some parameters $\theta$, and as a function of $\theta$ it may not add to one, except by accident.
The posterior, on the other hand, is also conditional, but it always adds to one.
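A minimal sketch of that distinction, in the same made-up coin-flip setting as above: fix $\theta$ and the likelihood is a genuine distribution over the possible data, but fix the data and sweep $\theta$, and the numbers do not add to one.

```python
# Fix theta and sum over all possible data -> 1; fix the data and
# sum over a grid of theta values -> generally not 1.
import numpy as np
from scipy.stats import binom

flips = 10
thetas = np.linspace(0.01, 0.99, 99)

# For a fixed theta, P(k heads | theta) is a distribution over the data.
print(binom.pmf(np.arange(flips + 1), flips, 0.7).sum())  # 1.0

# For fixed data (7 heads), the likelihood as a function of theta is not.
print(binom.pmf(7, flips, thetas).sum())                  # some other number
```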