
In various places it is said that the likelihood (e.g. in Bayes' formula) is "proportional to a probability".

For example, https://alexanderetz.com/2015/04/15/understanding-bayes-a-look-at-the-likelihood/ says "likelihood is proportional to a probability".

A similar statement appears in the book Think Bayes: "likelihood doesn't need to compute a probability, it only has to compute something proportional to probability".

Lastly, in a Cross Validated discussion, https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability, there is a more specific statement: "the likelihood function is proportional to the probability of the observed data." I am not sure if that is correct or not.

Notation: $P(A|B) = P(B|A) P(A) / P(B)$. The likelihood is $P(B|A)$.

The question: if I am understanding correctly, however, the likelihood is proportional to a different probability for each value of $A$. That is, there is no single constant $c$ such that the likelihood equals some version of the probability $P(B)$; i.e., this statement is false: $$ P(B) = c \cdot P(B|A) $$ Rather, it would have to be $$ P(B) = c(A) \cdot P(B|A), $$ meaning that the "constant" of proportionality $c$ varies with each choice of $A$.

Is this true? If so, it seems to me that saying the likelihood is proportional to a probability is vacuous: you can always make something "proportional to" something else if the proportionality factor is a function rather than a constant.

This question is somewhat related to these previous questions about likelihood, which I have read; however, I believe this is a different question (albeit maybe one that could be "derived from" the answers to those questions, by someone smarter than me!):

What does "likelihood is only defined up to a multiplicative constant of proportionality" mean in practice?

When is likelihood also a probability distribution?

EDIT: Maybe another way of asking the question is this: if the likelihood is proportional to a probability, which probability is it proportional to? Is it the probability $P(B|A)$ regarded as a function of $B$ with $A$ held fixed, or is it the marginal probability $P(B)$, or is it some other (generic) probability?

EDIT: Here is an attempt at an example:

A Normal distribution is parameterized by mean and variance. Let's pick something simpler: a PMF parameterized by a single parameter $M$. It assigns probabilities to integer values, and the parameter translates the PMF along the integer axis.
Here is the PMF for the parameter value $M=5$: $$ \begin{align*} & P(X=5|M=5) = .5 \\ & P(X=6|M=5) = .3 \\ & P(X=7|M=5) = .2 \\ \end{align*} $$ And the PMF for parameter value $M=6$: $$ \begin{align*} & P(X=5|M=6) = 0 \\ & P(X=6|M=6) = .5 \\ & P(X=7|M=6) = .3 \\ & P(X=8|M=6) = .2 \\ \end{align*} $$ More generically, the PMF is $P(X=M) = .5$, $P(X=M+1) = .3$, $P(X=M+2) = .2$, and zero otherwise.

Now consider the likelihood form of this, where the data is fixed at the value $X=5$ and the parameter $M$ is what varies: $$ \begin{align*} & P(X=5|M=3) = .2 \\ & P(X=5|M=4) = .3 \\ & P(X=5|M=5) = .5 \\ \end{align*} $$

So in the PMF case the shape is $(.5, .3, .2)$, whereas in the likelihood case the shape is $(.2, .3, .5)$. There is no single constant that can make these equal.
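For concreteness, the comparison can be run numerically; here is a minimal Python sketch of the translated PMF above (the helper `pmf` is my own naming, not standard):

```python
def pmf(x, m):
    # translated PMF: probability .5, .3, .2 at x = m, m+1, m+2, else 0
    return {0: 0.5, 1: 0.3, 2: 0.2}.get(x - m, 0.0)

pmf_shape = [pmf(x, 5) for x in (5, 6, 7)]   # fix M = 5, vary x: [0.5, 0.3, 0.2]
lik_shape = [pmf(5, m) for m in (3, 4, 5)]   # fix X = 5, vary M: [0.2, 0.3, 0.5]

# element-wise ratios are not constant: [0.4, 1.0, 2.5]
print([l / p for l, p in zip(lik_shape, pmf_shape)])
```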

What is my mistake?

basicidea
  • I don't follow your argument. Perhaps one might be justified in asserting that the phrase "proportional to a probability" may be vague (for several reasons, of which perhaps the most salient is that many likelihoods aren't formed from probabilities at all, but rather from probability *densities*), but what it means is clear: two likelihoods (*qua* functions of their parameters) are considered equivalent when a nonzero multiple of one equals the other. – whuber Jan 20 '19 at 20:50
  • In addition to @whuber's point, the use of Bayes' formula in this context is making things more confused. Given a family of pmf's $p_\theta(x)$ and an observation $x^o$ supposedly generated from one of these, $p_{\theta^o}(x)$, the likelihood function $\ell(\theta)$ is a function of $\theta$ proportional to $p_\theta(x^o)$, proportional in the sense that there exists a constant $\kappa$ such that $\ell(\theta)=\kappa p_\theta(x^o)$, $\kappa$ being _constant_ in $\theta$. – Xi'an Jan 20 '19 at 21:20
  • @whuber, Xi'an: I added a specific example to my question. I do not see how the constant $\kappa$ can really be constant in $\theta$. – basicidea Jan 21 '19 at 00:20
  • @whuber As an aside, must the likelihood always be formed from a density, or can it also be formed from a probability mass function in the case where the data and parameter are discrete, e.g. integer-valued? – basicidea Jan 21 '19 at 00:21
  • The likelihood is not $\Pr(B\mid A).$ The likelihood is $A\mapsto\Pr(B\mid A).$ It is that expression as a function of $A,$ not of $B.$ – Michael Hardy Jan 21 '19 at 01:24
  • @MichaelHardy Yes, and that underlies my question. $\Pr(B\mid A)$ as a function of $B$ is different from $\Pr(B\mid A)$ as a function of $A$. Considered as functions, they can have a different "shape"; I do not see how they can be related by a single constant of proportionality (see the example at the end of the question)... what am I missing? – basicidea Jan 21 '19 at 01:36
  • The likelihood depends on the model: it can be formed from probabilities, densities, *or both.* As an example of the latter see https://stats.stackexchange.com/questions/49443/how-to-model-this-odd-shaped-distribution-almost-a-reverse-j/49456#49456. For the standard example of probabilities, see [any elementary question about binomial likelihood](https://stats.stackexchange.com/search?q=binomial+likelihood+%5Bself-study%5D+-bayesian+-posterior), for instance. BTW, the comments suggest you ought to study the thread at https://stats.stackexchange.com/questions/2641. – whuber Jan 21 '19 at 16:48

2 Answers


The likelihood function is defined in the context of a parametric distribution. Given a family of pmf's $p_\theta(x)$, $x\in\mathfrak{X}$, meaning that, for each value of $\theta$, $p_\theta(\cdot)$ is a probability density over $\mathfrak{X}$ [with respect to a well-defined dominating measure], and an observation $x^o$, supposedly generated from one of these pmf's, $p_{\theta^o}(\cdot)$ say, the likelihood function $$\ell:\Theta \longrightarrow \mathbb{R}^+$$ is a function (of $\theta$) proportional to $p_\theta(x^o)$, $$\ell(\theta) \propto p_\theta(x^o),$$ proportional in the sense that there exists a constant $\kappa$ such that $$\ell(\theta)=\kappa\, p_\theta(x^o),$$ $\kappa$ being constant in $\theta$ (but possibly depending on $x^o$, a dependence that does not matter since $x^o$ is observed, hence fixed).

If one does not adopt a Bayesian or a fiducial approach, the likelihood cannot be defined more precisely; that is, there is no general principle for selecting a value of $\kappa$. Hence the statement that it is proportional to the pmf taken at $x^o$, which is a probability density in $x$ and not in $\theta$. The likelihood is not a probability distribution in $\theta$: the "likely" in "likelihood" refers to the probability of observing the observed $x^o$ given the inputted value of the parameter $\theta$.
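To see the role of $\kappa$ concretely, here is a minimal Python sketch assuming a binomial model (the model and the numbers are purely illustrative): the pmf at the observation $x^o = k$ successes in $n$ trials is $\binom{n}{k}\theta^k(1-\theta)^{n-k}$, and dropping the $\theta$-free binomial coefficient yields a perfectly good likelihood with $\kappa = 1/\binom{n}{k}$, constant in $\theta$.

```python
from math import comb

n, k = 10, 7  # illustrative observed data: x^o = 7 successes in n = 10 trials

def pmf_at_obs(theta):
    # binomial pmf evaluated at the fixed observation x^o = k
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def likelihood(theta):
    # the same expression with the theta-free factor comb(n, k) dropped
    return theta**k * (1 - theta)**(n - k)

# the ratio likelihood/pmf is the same for every theta: kappa = 1/comb(n, k)
for theta in (0.3, 0.5, 0.7, 0.9):
    print(theta, likelihood(theta) / pmf_at_obs(theta))  # always 1/120
```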

Xi'an
  • For me the key to your answer is the phrase "since $x^o$ is observed, hence fixed." To me this suggests that the likelihood is not proportional to the probability (density) $p(x|\theta)$ with $x$ varying. So then "proportional to a probability" is a very simple statement: for fixed $x$ and $\theta$, the likelihood is $c \cdot p(x|\theta)$, where the constant $c$ is not necessarily one. Nothing more complicated than that. – basicidea Jan 23 '19 at 07:12
  • The one concern I have remaining: I feel that when $\theta$ changes, certainly the probability $p(x|\theta)$ changes, but I feel that it is a probability from a _different_ probability distribution, since $p(x|\theta)$ viewed as a function of $\theta$ is not itself a valid PMF or density. Said differently, each $\theta$ produces a different density $p_{\theta}(x)$ (using a different notation), so as $\theta$ is varied, the likelihood is proportional to a range of probabilities selected from different densities. However, this is just my thinking, and I would be glad if someone corrects it. – basicidea Jan 23 '19 at 07:15
  • Both points are correct. This is the inversion inherent to the very notion of likelihood: what was fixed ($\theta$) is now variable, while what was (random) variable ($x$) is now fixed. The proportionality is thus not conveying much besides the notion that likelihood values are relative, hence the arbitrary constant $c$. – Xi'an Jan 23 '19 at 07:20

Suppose three biased coins have probabilities $0.6,\,\,0.7,\,\, 0.8$ of "heads".

Suppose you are equally likely to have any of these three coins in your hand. You toss it and it turns up "heads". Then you have $$ \begin{array}{cccc} \text{prior} & \text{likelihood} & \text{product} & \text{posterior} \\ \downarrow & \downarrow & \downarrow & \downarrow \\ 1/3 & 0.6 & 0.6/3 & 6/(6+7+8) = 6/21 \\ 1/3 & 0.7 & 0.7/3 & 7/(6+7+8) = 7/21 \\ 1/3 & 0.8 & 0.8/3 & 8/(6+7+8) = 8/21 \end{array} $$ Multiplication of the "product" column by the normalizing constant $30/(6+7+8) = 10/7$ yields the "posterior" column.

The "likelihood" column is not proportional to the "posterior" column in cases where the prior probabilities are not all equal to each other.

Michael Hardy
  • Thank you. Can you relate this to my question about how the likelihood is "proportional to a probability" (see quotes at the top of the question)? As can be seen in your answer, likelihood × prior is proportional to the posterior probability, but likelihood × prior is a different thing than the likelihood, and the quotes above say "likelihood" (without mentioning the prior). – basicidea Jan 21 '19 at 03:46
  • @basicidea : Look at the very last sentence in my answer above. – Michael Hardy Jan 22 '19 at 03:06