5

Example 1 of the Wikipedia article on the likelihood function suggests that given the likelihood function $\mathcal{L}(\theta | X)$, we can find the probability of $\theta$ given $X$, $p(\theta|X)$.

Is this just a confusion of notation? That is, some of the literature writes $\mathcal{L}(\theta|X) = p(X|\theta)$ even though the likelihood is *not* a conditional probability distribution, and is perhaps more aptly written as $p(X;\theta)$.

Ferdi
user393454
Example 1 correctly shows that likelihood is a probability of $X$ given a fixed value of the parameter $\theta$. For it to be a conditional probability, you'd need to assume $\theta$ to be a random variable with a distribution, which inevitably leads you to making Bayesian assumptions rather than frequentist ones. – Tim Oct 26 '17 at 14:00
Ah, so the last paragraph makes Bayesian assumptions? "This is not the same as saying that the probability that $p_\text{H} = 0.5$, given the observation $HH$, is $0.25$. For that, we could apply Bayes' theorem, which implies that the posterior probability (density) is proportional to the likelihood times the prior probability." – user393454 Oct 26 '17 at 14:04
    Likelihood is not a density on $\Theta$, i.e. $\int_{\Theta} L(\theta | X) d \theta \neq 1$. But it is a density on $X$. – Łukasz Grad Oct 26 '17 at 14:04
  • See https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability and https://stats.stackexchange.com/questions/224037/wikipedia-entry-on-likelihood-seems-ambiguous – Tim Oct 26 '17 at 14:08

3 Answers

4

Let $p_{\text{H}}$ be the probability that a certain coin lands heads up (H) when tossed. So, the probability of getting two heads in two tosses (HH) is $p_{\text{H}}^{2}$. If $p_{\text{H}}=0.5$, then the probability of seeing two heads is 0.25:

$$P(\text{HH}\mid p_{\text{H}}=0.5)=0.25.$$

With this, we can say that the likelihood of $p_{\text{H}}=0.5$, given the observation HH, is 0.25, that is

$$\mathcal{L}(p_{\text{H}}=0.5\mid \text{HH})=P(\text{HH}\mid p_{\text{H}}=0.5)=0.25.$$

This is not the same as saying that the probability that $p_{\text{H}}=0.5$, given the observation HH, is $0.25$. For that, we could apply Bayes' theorem, which implies that the posterior probability (density) is proportional to the likelihood times the prior probability. [Wikipedia Example 1 on Likelihood function]

This Wikipedia example states exactly what it should:

the likelihood function $\mathcal{L}(\theta|x)$, as a function of $\theta$ indexed by the realised observation $x$, takes at a particular value of the parameter (like $\theta = p_{\text{H}} = 0.5$) the value of the sampling distribution (pmf or pdf) $p(x|\theta)$ at the observed sample for that value of the parameter.

The final paragraph is a proper warning that a likelihood value or function is in general not a probability value or a density/mass function on the parameter. To turn the likelihood function into a density function on the parameter, the parameter space needs to be endowed with a probability structure, including a prior distribution/measure, which turns the sampling probability density into a conditional probability density.
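To make that warning concrete, here is a minimal numerical sketch (Python with numpy; the grid of 10001 points is an arbitrary choice of mine, not part of the example). For fixed $\theta$ the sampling pmf sums to one over $x$, but for the fixed observation HH the likelihood integrates to $1/3 \neq 1$ over $\theta$, echoing the comment above:

```python
import numpy as np
from math import comb

# Coin-toss model: x = number of heads in n = 2 tosses, parameter theta = p_H.
n = 2

def pmf(x, theta):
    """Binomial pmf p(x | theta) for n tosses."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# As a function of x with theta fixed: a genuine probability mass function.
theta0 = 0.5
print(sum(pmf(x, theta0) for x in range(n + 1)))   # 1.0: sums to one over x

# As a function of theta with the observation x = 2 (HH) fixed: the likelihood.
thetas = np.linspace(0.0, 1.0, 10001)
lik = np.array([pmf(2, t) for t in thetas])        # L(theta | HH) = theta^2
print(lik.sum() * (thetas[1] - thetas[0]))         # ~1/3: does NOT integrate to one
```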

The last sentence of the Wikipedia paragraph could always be turned into something clearer, like

For producing a probability statement on a value of the parameter, one needs to consider this parameter as a random variable, which requires a probability measure on the parameter space, called a prior distribution. With this preliminary, one applies Bayes' theorem, defining the posterior probability (density) on the parameter as proportional to the likelihood times the prior probability.
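As an illustrative sketch of that rewording (assuming, as one possible choice, a uniform prior on $[0,1]$, which is not prescribed by the answer), Bayes' theorem renormalises the likelihood $p_\text{H}^2$ into the posterior density $3p_\text{H}^2$, whose value at $p_\text{H}=0.5$ is $0.75$, not the likelihood value $0.25$:

```python
import numpy as np

# Likelihood of p_H given the observation HH: L(p | HH) = p^2.
p = np.linspace(0.0, 1.0, 10001)
likelihood = p**2

# Assumed prior (an illustrative choice): uniform density on [0, 1].
prior = np.ones_like(p)

# Bayes' theorem: posterior density proportional to likelihood times prior.
unnorm = likelihood * prior
dp = p[1] - p[0]
posterior = unnorm / (unnorm.sum() * dp)     # renormalise to integrate to one

# Posterior density at p_H = 0.5 is 3 * 0.5^2 = 0.75, not the likelihood 0.25.
print(posterior[np.searchsorted(p, 0.5)])    # ~0.75
```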

Xi'an
  • I believe your last paragraph addresses my concern: the Wikipedia article does not mention that the likelihood function would first need to be "endowed with a probability structure", as you put it, because $p(x|\theta)$ is not necessarily a probability distribution. Correct? – user393454 Oct 26 '17 at 14:29
  • Is there any reason the article should be left as is, not addressing the fact that one cannot simply apply Bayes' theorem? (Since $p(x|\theta)$ may not be a probability distribution.) – user393454 Oct 26 '17 at 14:37
  • Yes, I see what you're saying, but the sentence is worded such that the existence of a prior probability is implicitly assumed. I think the paragraph would be clearer with an addendum like: "Note that the application of Bayes' theorem assumes that there exists a prior probability over $\theta$, and thus that $\theta$ is a random variable." Please correct me if you find anything wrong with this sentence. – user393454 Oct 26 '17 at 14:49
  • Please see my suggestion for a rewording, as I find the "thus that θ is a random variable" open to confusion (and the customary objection that $\theta$ is a fixed number and hence cannot be a random variable, which is missing the point of the Bayesian resolution). – Xi'an Oct 26 '17 at 14:54
0

Yes, you can. The distribution you get when you do that is called the posterior distribution. You need to specify a marginal distribution for the parameter, though, which is called the prior distribution.
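For instance (a sketch assuming a conjugate Beta prior, one convenient choice among many, with scipy supplying the closed form), a Beta$(\alpha, \beta)$ prior on $p_\text{H}$ updated with the observation HH (2 heads, 0 tails) gives a Beta$(\alpha + 2, \beta)$ posterior:

```python
from scipy.stats import beta

# Assumed prior (illustrative): Beta(1, 1), i.e. uniform on [0, 1].
a, b = 1.0, 1.0

# Conjugate update after observing HH (2 heads, 0 tails): Beta(a + 2, b + 0),
# here Beta(3, 1), whose density is 3 * p^2.
posterior = beta(a + 2, b + 0)

print(posterior.pdf(0.5))    # 0.75: posterior density at p_H = 0.5
print(posterior.mean())      # 0.75: posterior mean of p_H
```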

-1

I think the Wikipedia article uses confusing notation and terminology. It uses "likelihood" to distinguish it from the posterior probability.

If we write down the full Bayes' rule, it reads: $$P(p_H=0.5\mid HH) = \frac{P(HH\mid p_H=0.5)\,P(p_H=0.5)}{P(HH)}.$$

In Bayesian inference, $P(HH\mid p_H=0.5)$ is always referred to as the likelihood, whereas $P(p_H=0.5)$ is referred to as the prior and $P(p_H=0.5\mid HH)$ as the posterior probability.

Given the formula above, the article obviously cannot claim $P(p_H=0.5\mid HH) = P(HH\mid p_H=0.5)$, so I guess that is why it uses "likelihood" here.
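Numerically (a sketch assuming a uniform prior on $p_H$, purely for illustration, with scipy doing the integration), the marginal is $P(HH)=\int_0^1 p^2\,dp = 1/3$, so the posterior density at $p_H = 0.5$ is $0.25/(1/3)=0.75$, which indeed differs from the likelihood value $0.25$:

```python
from scipy.integrate import quad

# Assumed prior (illustrative): uniform density on [0, 1].
def prior(p):
    return 1.0

def likelihood(p):
    return p**2            # P(HH | p_H = p)

# Marginal P(HH): integrate likelihood * prior over the parameter space.
marginal, _ = quad(lambda p: likelihood(p) * prior(p), 0.0, 1.0)
print(marginal)                                   # 1/3

# Bayes' rule: posterior density at p_H = 0.5.
print(likelihood(0.5) * prior(0.5) / marginal)    # 0.75
```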

Undecided