607

The Wikipedia page claims that likelihood and probability are distinct concepts.

In non-technical parlance, "likelihood" is usually a synonym for "probability," but in statistical usage there is a clear distinction in perspective: the number that is the probability of some observed outcomes given a set of parameter values is regarded as the likelihood of the set of parameter values given the observed outcomes.

Can someone give a more down-to-earth description of what this means? In addition, some examples of how "probability" and "likelihood" disagree would be nice.

hippietrail
  • 107
  • 5
Douglas S. Stones
  • 6,931
  • 4
  • 16
  • 18
  • 25
    Great question. I would add "odds" and "chance" in there too :) – Neil McGuigan Sep 14 '10 at 05:28
  • 6
    I think you should take a look at this question http://stats.stackexchange.com/questions/665/whats-the-difference-between-probability-and-statistics/675#675 because likelihood is a statistics concept while probability belongs to probability theory. – robin girard Sep 14 '10 at 06:04
  • 4
    Wow, these are some _really_ good answers. So a big thanks for that! Some point soon, I'll pick one I particularly like as the "accepted" answer (although there are several that I think are equally deserved). – Douglas S. Stones Sep 15 '10 at 01:13
  • 1
    Also note that the "likelihood ratio" is actually a "probability ratio" since it is a function of the observations. – JohnRos Nov 02 '11 at 10:29

12 Answers

437

The answer depends on whether you are dealing with discrete or continuous random variables. So, I will split my answer accordingly. I will assume that you want some technical details and not necessarily an explanation in plain English.

Discrete Random Variables

Suppose that you have a stochastic process that takes discrete values (e.g., outcomes of tossing a coin 10 times, the number of customers who arrive at a store in 10 minutes, etc.). In such cases, we can calculate the probability of observing a particular set of outcomes by making suitable assumptions about the underlying stochastic process (e.g., the probability that the coin lands heads is $p$ and the coin tosses are independent).

Denote the observed outcomes by $O$ and the set of parameters that describe the stochastic process as $\theta$. Thus, when we speak of probability we want to calculate $P(O|\theta)$. In other words, given specific values for $\theta$, $P(O|\theta)$ is the probability that we would observe the outcomes represented by $O$.

However, when we model a real life stochastic process, we often do not know $\theta$. We simply observe $O$ and the goal then is to arrive at an estimate for $\theta$ that would be a plausible choice given the observed outcomes $O$. We know that given a value of $\theta$ the probability of observing $O$ is $P(O|\theta)$. Thus, a 'natural' estimation process is to choose that value of $\theta$ that would maximize the probability that we would actually observe $O$. In other words, we find the parameter values $\theta$ that maximize the following function:

$L(\theta|O) = P(O|\theta)$

$L(\theta|O)$ is called the likelihood function. Notice that by definition the likelihood function is conditioned on the observed $O$ and that it is a function of the unknown parameters $\theta$.
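
For concreteness, here is a minimal sketch (with invented counts) of evaluating $L(\theta|O)$ for 10 coin tosses and maximizing it by a crude grid search:

```python
import numpy as np

# Hypothetical observed outcome O: 7 heads out of n = 10 independent tosses.
heads, n = 7, 10

def likelihood(theta):
    # L(theta | O) = P(O | theta) for one particular sequence of i.i.d. tosses;
    # the binomial coefficient is omitted since it does not depend on theta.
    return theta**heads * (1 - theta)**(n - heads)

# Crude grid search over candidate parameter values.
grid = np.linspace(0.001, 0.999, 999)
theta_hat = grid[np.argmax(likelihood(grid))]
print(theta_hat)  # ~0.7, i.e. heads/n, the maximum-likelihood estimate
```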

Continuous Random Variables

In the continuous case the situation is similar with one important difference. We can no longer talk about the probability that we observed $O$ given $\theta$ because in the continuous case $P(O|\theta) = 0$. Without getting into technicalities, the basic idea is as follows:

Denote the probability density function (pdf) associated with the outcomes $O$ as: $f(O|\theta)$. Thus, in the continuous case we estimate $\theta$ given observed outcomes $O$ by maximizing the following function:

$L(\theta|O) = f(O|\theta)$

In this situation, we cannot technically assert that we are finding the parameter value that maximizes the probability of observing $O$; rather, we are maximizing the PDF associated with the observed outcomes $O$.
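
A similar sketch for the continuous case, assuming i.i.d. normal observations with known variance (the numbers are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical observations O, assumed i.i.d. Normal(mu, sigma = 1) with mu unknown.
O = np.array([1.2, 0.8, 1.9, 1.4, 0.7])

def log_likelihood(mu):
    # log L(mu | O) = log f(O | mu), the log of the joint density of the observed O.
    return norm.logpdf(O, loc=mu, scale=1.0).sum()

grid = np.linspace(-2.0, 4.0, 601)
mu_hat = grid[np.argmax([log_likelihood(m) for m in grid])]
print(mu_hat, O.mean())  # both ~1.2: the sample mean is the MLE of a normal mean
```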

  • 50
    The distinction between discrete and continuous variables disappears from the point of view of measure theory. – whuber Sep 14 '10 at 15:48
  • 34
    @whuber yes but an answer using measure theory is not that accessible to everyone. –  Sep 14 '10 at 20:09
  • 21
    @Srikant: Agreed. The comment was for the benefit of the OP, who is a mathematician (but perhaps not a statistician) to avoid being misled into thinking there is something fundamental about the distinction. – whuber Sep 14 '10 at 20:36
  • 10
    You can interpret a continuous density the same as the discrete case if $O$ is replaced by $dO$, in the sense that if we ask for $Pr(O\in(O',O'+dO') |\theta)$ (i.e. the probability that the data $O$ is contained in an infinitesimal region about $O'$), then the answer is $f(O'|\theta)dO'$ (the $dO'$ makes it clear that we are calculating the area of an infinitesimally thin "bin" of a histogram). – probabilityislogic Jan 28 '11 at 13:40
  • 13
    I am over 5 years late to the party, but I think that a very crucial follow-up to this answer would be http://stats.stackexchange.com/questions/31238/what-is-the-reason-that-a-likelihood-function-is-not-a-pdf which stresses the fact that the likelihood function $L(\theta)$ is not a pdf with respect to $\theta$. $L(\theta)$ is indeed a pdf of the data given the parameter value, but since $L$ is a function of $\theta$ alone (with the data held constant), it is irrelevant that $L(\theta)$ is a pdf of the data given $\theta$. – Shobhit Jan 08 '16 at 16:04
  • 1
    @whuber Very interesting comment regarding measure theory. I'm sure I'm not alone in, 1) having no background in this area and, 2) being curious about how your statement is to be understood. Are there some useful texts you can recommend? – Mike Hunter Jun 29 '16 at 16:22
  • 6
    @DJohnson The gentlest and most intuitive, yet rigorous, introduction I have seen is that in Steven Shreve's *Stochastic Calculus for Finance,* Volume II. What you need is in the first dozen pages. If that looks mathematically too heavy, then study Volume I. If you have any interest in applications of probability to finance, then Volume I is well worth your time--and it's wonderfully brief. – whuber Jun 29 '16 at 16:32
  • You wrote $L(\theta|O) = P(O|\theta)$; does that mean $P(O|\theta)$ doesn't integrate to 1 because it is not a probability? Then why do we use $P$ here? We should keep it as $L$, otherwise it is confusing. – GENIVI-LEARNER Jan 23 '20 at 13:35
  • 1
    OP: *Can someone give a more down-to-earth description*. This answer : *here on Cloud 11 you can see..* – d-_-b Jul 18 '20 at 05:26
  • @GENIVI-LEARNER: It is a probability, but conditional probability. – MSIS Sep 12 '21 at 21:53
  • @MSIS conditional probability does integrate to 1 – GENIVI-LEARNER Sep 17 '21 at 21:36
  • @GENIVI-LEARNER: I understand/agree. I meant that only in Bayesian statistics is $\theta$ a random variable. I understand the definition of likelihood is the probability of obtaining certain sample data as a function of some parameters of the population. For us to define a standard conditional probability, we would need the parameter to be a RV. But this happens only in the Bayesian setting and not in the frequentist one. But I may be wrong here. – MSIS Sep 19 '21 at 22:01
178

This is the kind of question that just about everybody is going to answer and I would expect all the answers to be good. But you're a mathematician, Douglas, so let me offer a mathematical reply.

A statistical model has to connect two distinct conceptual entities: data, which are elements $x$ of some set (such as a vector space), and a possible quantitative model of the data behavior. Models are usually represented by points $\theta$ on a finite dimensional manifold, a manifold with boundary, or a function space (the latter is termed a "non-parametric" problem).

The data $x$ are connected to the possible models $\theta$ by means of a function $\Lambda(x, \theta)$. For any given $\theta$, $\Lambda(x, \theta)$ is intended to be the probability (or probability density) of $x$. For any given $x$, on the other hand, $\Lambda(x, \theta)$ can be viewed as a function of $\theta$ and is usually assumed to have certain nice properties, such as being twice continuously differentiable. The intention to view $\Lambda$ in this way and to invoke these assumptions is announced by calling $\Lambda$ the "likelihood."

It's quite like the distinction between variables and parameters in a differential equation: sometimes we want to study the solution (i.e., we focus on the variables as the argument) and sometimes we want to study how the solution varies with the parameters. The main distinction is that in statistics we rarely need to study the simultaneous variation of both sets of arguments; there is no statistical object that naturally corresponds to changing both the data $x$ and the model parameters $\theta$. That's why you hear more about this dichotomy than you would in analogous mathematical settings.
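
A small sketch of this change of viewpoint, using a Poisson model as a stand-in for $\Lambda$: slicing the same function one way yields a probability distribution over the data, slicing it the other way yields a likelihood over the parameter.

```python
from scipy.stats import poisson

# An example choice for Lambda(x, theta): the Poisson probability of a count x under rate theta.
def Lam(x, theta):
    return poisson.pmf(x, theta)

# Fix theta and vary x: a probability distribution over the data (sums to 1).
print(sum(Lam(x, 3.0) for x in range(100)))            # ~1.0

# Fix x and vary theta: a likelihood, which need not sum or integrate to 1.
print([round(Lam(4, th), 3) for th in (1.0, 2.0, 4.0, 8.0)])
```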

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • 8
    +1, what a cool answer. The analogy with differential equations seems very appropriate. – mpiktas Mar 05 '12 at 20:15
  • 3
    As an economist, although this answer does not relate as closely as the previous to the concepts I've learnt, it was the most informative one in an intuitive sense. Many thanks. – Robson Jan 20 '16 at 14:25
  • 1
    Actually, this statement is not really true: "there is no statistical object that naturally corresponds to changing both the data x and the model parameters θ." There is, it's called "smoothing, filtering, and prediction"; in linear models it's the Kalman filter, in nonlinear models they have the full nonlinear filters, https://en.wikipedia.org/wiki/Kushner_equation etc. – crow Nov 17 '17 at 18:26
  • 2
    Yes, great answer! As lame as this sounds, by choosing $\Lambda\left(x, \theta\right)$ instead of the standard notation of $P\left(x, \theta\right)$, it made it easier for me to see that we're starting off with a joint probability that can be defined as either a likelihood or a conditional probability. Plus, the "certain nice properties" comment helped. Thanks! – Mike Williamson Aug 18 '18 at 19:20
  • @Mike You're welcome. But please note that $\Lambda$ is not usually a "joint probability" except in Bayesian models. I hope my account wasn't confusing about that. – whuber Aug 18 '18 at 21:36
  • 3
    @whuber Yes, I know $\Lambda$ isn't the usual notation. That's exactly why it helped! I stopped thinking that it must have a particular meaning and instead just followed the logic. ;-p – Mike Williamson Aug 19 '18 at 15:14
  • Is $\Lambda$ here considered a conditional probability distribution (conditioned on the parameters $\theta$)? – 24n8 Jun 26 '20 at 23:38
  • 1
    @Iamanon Not necessarily: it parameterizes a family of probability distributions. You may think of it as a function (continuous, at least) from a parameter space $\Theta$ into a space of probability distributions, taking $\theta\in\Theta$ to the distribution with density $x\to\Lambda(x,\theta).$ This requires a common measure with respect to which all the distributions actually have a density. – whuber Jun 27 '20 at 12:15
  • @whuber: Still, isn't the likelihood the probability of obtaining a sample $x_1,x_2,..,x_n$ in terms of some population parameters? – MSIS Sep 19 '21 at 21:47
  • 1
    @MSIS That captures the idea. But note that the likelihood is often a *density,* not a probability. Another distinction is that "in terms of some population parameters" is not the same thing as "conditional distribution." The principal difference is that "conditional distribution" implies there is a joint probability distribution for the parameters and the data. In many applications no such thing is used in the model. – whuber Sep 20 '21 at 12:47
147

I'll try and minimise the mathematics in my explanation as there are some good mathematical explanations already.

As Robin Girard comments, the difference between probability and likelihood is closely related to the difference between probability and statistics. In a sense probability and statistics concern themselves with problems that are opposite or inverse to one another.

Consider a coin toss. (My answer will be similar to Example 1 on Wikipedia.) If we know the coin is fair ($p=0.5$) a typical probability question is: What is the probability of getting two heads in a row. The answer is $P(HH) = P(H)\times P(H) = 0.5\times0.5 = 0.25$.

A typical statistical question is: Is the coin fair? To answer this we need to ask: To what extent does our sample support our hypothesis that $P(H) = P(T) = 0.5$?

The first point to note is that the direction of the question has reversed. In probability we start with an assumed parameter ($P(head)$) and calculate the probability of a given sample (two heads in a row). In statistics we start with the observation (two heads in a row) and make INFERENCES about our parameter ($p = P(H) = 1 - P(T) = 1 - q$).

Example 1 on Wikipedia shows us that the maximum likelihood estimate of $P(H)$ after 2 heads in a row is $p_{MLE} = 1$. But the data in no way rule out the true parameter value $p(H) = 0.5$ (let's not concern ourselves with the details at the moment). Indeed, only very small values of $p(H)$, and particularly $p(H)=0$, can be reasonably eliminated after $n = 2$ (two throws of the coin). After the third throw comes up tails we can now eliminate the possibility that $P(H) = 1.0$ (i.e. it is not a two-headed coin), but most values in between can be reasonably supported by the data. (An exact binomial 95% confidence interval for $p(H)$ is 0.094 to 0.992.)

After 100 coin tosses and (say) 70 heads, we now have a reasonable basis for the suspicion that the coin is not in fact fair. An exact 95% CI on $p(H)$ is now 0.600 to 0.787 and the probability of observing a result as extreme as 70 or more heads (or tails) from 100 tosses given $p(H) = 0.5$ is 0.0000785.
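
The intervals and p-value above can be reproduced along these lines (a sketch assuming SciPy's exact binomial test, `scipy.stats.binomtest`, available in recent SciPy versions):

```python
from scipy.stats import binomtest  # requires a reasonably recent SciPy

# 2 heads and 1 tail after three tosses: exact (Clopper-Pearson) 95% CI for p(H).
print(binomtest(2, 3).proportion_ci(confidence_level=0.95))   # roughly (0.094, 0.992)

# 70 heads out of 100 tosses, tested against p(H) = 0.5.
res = binomtest(70, 100, p=0.5)
print(res.proportion_ci(confidence_level=0.95))                # roughly (0.600, 0.787)
print(res.pvalue)                                              # ~7.85e-05, two-sided
```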

Although I have not explicitly used likelihood calculations this example captures the concept of likelihood: Likelihood is a measure of the extent to which a sample provides support for particular values of a parameter in a parametric model.

Joffer
  • 103
  • 3
Thylacoleo
  • 4,829
  • 5
  • 24
  • 32
  • 3
    Great answer! Especially the last three paragraphs are very useful. How would you extend this to describe the continuous case? – Demetris Sep 02 '14 at 11:53
  • 9
    For me, best answer. I don't mind math at all, but *for me* math is a *tool* ruled by what I want (I don't enjoy math for its own sake, but for what it helps me do). Only with this answer do I know the latter. – Mörre Apr 20 '15 at 13:28
82

Given all the fine technical answers above, let me take it back to language: Probability quantifies anticipation (of outcome), likelihood quantifies trust (in model).

Suppose somebody challenges us to a 'profitable gambling game'. Then, probabilities will serve us to compute things like the expected profile of our gains and losses (mean, mode, median, variance, information ratio, value at risk, gambler's ruin, and so on). In contrast, likelihood will serve us to quantify whether we trust those probabilities in the first place; or whether we 'smell a rat'.


Incidentally -- since somebody above mentioned the religions of statistics -- I believe likelihood ratio to be an integral part of the Bayesian world as well as of the frequentist one: In the Bayesian world, Bayes formula just combines prior with likelihood to produce posterior.

Gypsy
  • 821
  • 6
  • 2
  • 3
    This answer sums it up for me. I had to think through what it meant when I read that likelihood is not probability, but the following case occurred to me. What is the likelihood that a coin is fair, given that we see four heads in a row? We can't really say anything about probability here, but the word "trust" seems apt. Do we feel we can trust the coin? – dnuttle Jul 23 '18 at 12:21
  • 2
    Initially this might have been the historically intended purpose of likelihoods, but nowadays likelihoods are part of every Bayesian calculation, and it's known that probabilities can amalgamate beliefs and plausibility, which is why the Dempster-Shafer theory was created, to disambiguate both interpretations. – gaborous Sep 03 '19 at 08:39
  • 1
    Great answer!! Thank you so much! – coderina Jun 11 '21 at 04:35
77

I will give you the perspective from the view of Likelihood Theory which originated with Fisher -- and is the basis for the statistical definition in the cited Wikipedia article.

Suppose you have random variates $X$ which arise from a parameterized distribution $F(X; \theta)$, where $\theta$ is the parameter characterizing $F$. Then the probability of $X = x$ would be: $P(X = x) = F(x; \theta)$, with known $\theta$.

More often, you have data $X$ and $\theta$ is unknown. Given the assumed model $F$, the likelihood is defined as the probability of observed data as a function of $\theta$: $L(\theta) = P(\theta; X = x)$. Note that $X$ is known, but $\theta$ is unknown; in fact the motivation for defining the likelihood is to determine the parameter of the distribution.

Although it seems like we have simply re-written the probability function, a key consequence of this is that the likelihood function does not obey the laws of probability (for example, it's not bound to the [0, 1] interval). However, the likelihood function is proportional to the probability of the observed data.
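
A quick numerical illustration of that point (a sketch, not a formal argument): for continuous data the likelihood is a density value, so it can exceed 1, and it need not integrate to 1 over the parameter.

```python
from scipy.stats import norm
from scipy.integrate import quad

x = 0.0  # a single observed data point

# Likelihood of the scale parameter sigma under a Normal(0, sigma) model for x.
L = lambda sigma: norm.pdf(x, loc=0.0, scale=sigma)

print(L(0.1))                   # ~3.99: a likelihood value need not lie in [0, 1]
print(quad(L, 0.01, 10.0)[0])   # ~2.76, not 1: L is not a density in sigma
```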

This concept of likelihood actually leads to a different school of thought, "likelihoodists" (distinct from frequentists and Bayesians), and you can google to search for all the various historical debates. The cornerstone is the Likelihood Principle, which essentially says that we can perform inference directly from the likelihood function (neither Bayesians nor frequentists accept this since it is not probability-based inference). These days a lot of what is taught as "frequentist" in schools is actually an amalgam of frequentist and likelihood thinking.

For deeper insight, a nice start and historical reference is Edwards' Likelihood. For a modern take, I'd recommend Richard Royall's wonderful monograph, Statistical Evidence: A Likelihood Paradigm.

ars
  • 12,160
  • 1
  • 36
  • 54
  • 3
    Interesting answer, I actually thought that the "likelihood school" was basically the "frequentists who don't design samples school", while the "design school" was the rest of the frequentists. I actually find it hard myself to say which "school" I am, as I have a bit of knowledge from every school. The "Probability as extended logic" school is my favourite (duh), but I don't have enough practical experience in applying it to real problems to be dogmatic about it. – probabilityislogic Jan 28 '11 at 13:53
  • 5
    +1 for "the likelihood function does not obey the laws of probability (for example, it's not bound to the [0, 1] interval). However, the likelihood function is proportional to the probabiilty of the observed data." – Walrus the Cat Jun 13 '14 at 22:27
  • 11
    "the likelihood function does not obey the laws of probability" could use some further clarification, especialy since is was written as θ: L(θ)=P(θ;X=x), i.e. equated with a probability! – redcalx Apr 03 '15 at 19:53
  • Thanks for your answer. Could you please address the comment that @locster made? – Vivek Subramanian Jul 01 '15 at 11:48
  • 4
    To me as a not mathematician, this reads like religious mathematics, with different beliefs resulting in different values for chances of events to occur. Can you formulate it, so that it is easier to understand what the different beliefs are and why they all make sense, instead of one being simply incorrect and the other school / belief being correct? (assumption that there is *one correct way* of calculating chances for events to occur) – Zelphir Kaltstahl Jul 26 '16 at 13:11
66

If I have a fair coin (parameter value) then the probability that it will come up heads is 0.5. If I flip a coin 100 times and it comes up heads 52 times then it has a high likelihood of being fair (the numeric value of likelihood potentially taking a number of forms).

John
  • 21,167
  • 9
  • 48
  • 84
62

Suppose you have a coin with probability $p$ to land heads and $(1-p)$ to land tails. Let $x=1$ indicate heads and $x=0$ indicate tails. Define $f$ as follows

$$f(x,p)=p^x (1-p)^{1-x}$$

$f(x,2/3)$ is the probability of $x$ given $p=2/3$, and $f(1,p)$ is the likelihood of $p$ given $x=1$. Basically, likelihood vs. probability tells you which argument of the density is treated as the variable.
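
Evaluated numerically, a tiny sketch of the two slices of the same $f$:

```python
def f(x, p):
    return p**x * (1 - p)**(1 - x)

# Fix p = 2/3 and vary x: probabilities of the two outcomes (they sum to 1).
print(f(1, 2/3), f(0, 2/3))                 # 0.667, 0.333

# Fix x = 1 and vary p: likelihoods of candidate p values (no need to sum to 1).
print([f(1, p) for p in (0.2, 0.5, 0.9)])   # [0.2, 0.5, 0.9]
```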

Glorfindel
  • 700
  • 1
  • 9
  • 18
Yaroslav Bulatov
  • 5,167
  • 2
  • 24
  • 38
  • 1
    Nice complement to the theoretical definitions used above! – Frank Meulenaar Sep 17 '11 at 10:47
  • I see that $C^n_kp^n(1-p)^{k-n}$ gives the probability of having $n$ heads in $k$ trials. Your $p^x(1-p)^{1-x}$ looks like $k$-th root of that: $x=n/k$. What does it mean? – Little Alien Sep 01 '16 at 13:29
  • @LittleAlien what is $C_k^n$ in your equation? – GENIVI-LEARNER Jan 25 '20 at 01:02
  • 1
    @GENIVI-LEARNER $C^n_k$ is the binomial coefficient (see https://en.wikipedia.org/wiki/Binomial_coefficient). It allows you to calculate the probability of seeing different combinations of heads and tails (for example: $HTT$, $THT$, $TTH$ for $n=3$, $k=1$), instead of all heads or all tails, using the simpler $f(x,p)=p^k(1-p)^{n-k}$ formula. – RobertF Apr 22 '20 at 17:31
36

$P(x|\theta)$ can be seen from two points of view:

  • As a function of $x$, treating $\theta$ as known/observed. If $\theta$ is not a random variable, then $P(x|\theta)$ is called the (parameterized) probability of $x$ given the model parameters $\theta$, which is sometimes also written as $P(x;\theta)$ or $P_{\theta}(x)$. If $\theta$ is a random variable, as in Bayesian statistics, then $P(x|\theta)$ is a conditional probability, defined as ${P(x\cap\theta)}/{P(\theta)}$.
  • As a function of $\theta$, treating $x$ as observed. For example, when you try to find a certain assignment $\hat\theta$ for $\theta$ that maximizes $P(x|\theta)$, then $P(x|\hat\theta)$ is called the maximum likelihood of $\theta$ given the data $x$, sometimes written as $\mathcal L(\hat\theta|x)$. So, the term likelihood is just shorthand to refer to the probability $P(x|\theta)$ for some data $x$ that results from assigning different values to $\theta$ (e.g. as one traverses the search space of $\theta$ for a good solution). So, it is often used as an objective function, but also as a performance measure to compare two models as in Bayesian model comparison.

Often, this expression is still a function of both its arguments, so it is rather a matter of emphasis.

Lenar Hoyt
  • 883
  • 1
  • 8
  • 15
  • For the second case, I thought people usually write P(theta|x). – yuqian Jan 04 '16 at 20:40
  • Originally intuitively I already thought they're both words for the same with a difference in perspective or natural language formulation, so I feel like "What? I was right all along?!" But if this is the case, why is distinguishing them so important? English not being my mother tongue, I grew up with only one word for seemingly both of the terms (or have I simply never gotten a problem where I needed to distinguish the terms?) and never knew there was any difference. It's only now, that I know two English terms, that I begin to doubt my understanding of these things. – Zelphir Kaltstahl Jul 26 '16 at 13:32
  • 3
    Your answer seems to be very comprehensive and is easy to understand. I wonder why it got so few upvotes. – Funkwecker Feb 16 '17 at 07:18
  • 4
    Note that P(x|$\theta$) is a **conditional** probability only if $\theta$ is a random variable, if $\theta$ is a parameter then it's simply the probability of x parameterized by $\theta$. – Mircea Mironenco May 09 '17 at 19:49
  • 1
    i think this is the best answer amongst all – aerin Oct 11 '17 at 07:08
8

Do you know the pilot of the TV series "Numb3rs", in which the FBI tries to locate the home base of a serial criminal who seems to choose his victims at random?

The FBI's mathematical advisor and brother of the agent in charge solves the problem with a maximum likelihood approach. First, he assumes some "gugelhupf-shaped" probability $p(x|\theta)$ that the crimes take place at locations $x$ if the criminal lives at location $\theta$. (The gugelhupf assumption is that the criminal will neither commit a crime in his immediate neighbourhood nor travel extremely far to choose his next random victim.) This model describes the probabilities for different $x$ given a fixed $\theta$. In other words, $p_{\theta}(x)=p(x|\theta)$ is a function of $x$ with a fixed parameter $\theta$.

Of course, the FBI doesn't know the criminal's domicile, nor does it want to predict the next crime scene. (They hope to find the criminal first!) It's the other way round: the FBI already knows the crime scenes $x$ and wants to locate the criminal's domicile $\theta$.

So the FBI agent's brilliant brother has to try and find the most likely $\theta$ among all possible values, i.e. the $\theta$ which maximises $p(x|\theta)$ for the actually observed $x$. Therefore, he now considers $l_x(\theta)=p(x|\theta)$ as a function of $\theta$ with a fixed parameter $x$. Figuratively speaking, he shoves his gugelhupf around on the map until it optimally "fits" the known crime scenes $x$. The FBI then goes knocking on the door at the center $\hat{\theta}$ of the gugelhupf.

To stress this change of perspective, $l_x(\theta)$ is called the likelihood (function) of $\theta$, whereas $p_{\theta}(x)$ was the probability (function) of $x$. Both are actually the same function $p(x|\theta)$, but seen from different perspectives and with $x$ and $\theta$ switching their roles as variable and parameter, respectively.
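
A toy sketch of the same idea (the coordinates and the ring-shaped density are invented for illustration): slide a candidate home base $\theta$ over a grid of map locations and keep the one that maximizes $l_x(\theta)$ for the fixed crime scenes $x$. The normalising constant of the density is the same for every $\theta$, so it can be ignored for the maximisation.

```python
import numpy as np

# Hypothetical crime-scene coordinates x (invented for the example).
scenes = np.array([[2.0, 1.0], [3.0, 4.0], [0.5, 3.5], [2.5, 2.8]])

def log_density(scene, theta, mode=2.0, scale=1.0):
    # A ring-shaped (un-normalised) log density in the distance from theta:
    # small right next to theta, largest at distance `mode`, small far away.
    d = np.linalg.norm(scene - theta)
    return -((d - mode) ** 2) / (2.0 * scale**2)

def log_likelihood(theta):
    # l_x(theta): the same formula, now read as a function of theta for the fixed scenes x.
    return sum(log_density(s, theta) for s in scenes)

# Slide theta over a grid of map locations and keep the best-fitting centre.
grid = [(a, b) for a in np.linspace(-2, 6, 81) for b in np.linspace(-2, 6, 81)]
theta_hat = max(grid, key=lambda t: log_likelihood(np.array(t)))
print(theta_hat)  # the centre of the best-fitting "gugelhupf"
```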

schotti
  • 161
  • 1
  • 9
5

As far as I'm concerned, the most important distinction is that likelihood is not a probability (of $\theta$).

In an estimation problem, $X$ is given and the likelihood $P(X|\theta)$ describes a distribution of $X$ rather than $\theta$. That is, $\int P(X|\theta) d\theta$ is meaningless, since the likelihood is not a pdf of $\theta$, though it does characterize $\theta$ to some extent.
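
A quick numerical check of that claim (a sketch with a Bernoulli model): integrating the likelihood over $\theta$ does not give 1, so it is not a pdf of $\theta$.

```python
from scipy.integrate import quad

# Likelihood of theta after observing x = 1 from a Bernoulli(theta) model.
L = lambda theta: theta**1 * (1 - theta)**0

print(quad(L, 0.0, 1.0)[0])   # 0.5, not 1: L does not integrate to 1 over theta
```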

Response777
  • 147
  • 2
  • 4
  • 1
    As the answer from @Lenar Hoyt points out, if theta is a random variable (which it can be), then likelihood is a probability. So the real answer seems to be that the likelihood can be a probability, but is sometimes not. – Mike Wise Dec 05 '17 at 17:47
  • @MikeWise, I think theta could always be viewed as a "random" variable, while chances are that it is just not so "random"... – Response777 Dec 06 '17 at 15:18
1

If we put the conditional probability interpretation aside, you can think it in this way:

  • In probability you usually want to find the probability of a possible event based on a model/parameter/probability distribution, etc.

  • In likelihood you have observed some outcome, so you want to find/create/estimate the most likely source/model/parameter/probability distribution from which this outcome has arisen.

Ahmad
  • 469
  • 3
  • 14
  • 1
    This seems to me to miss the point completely. Probability and likelihood are not to be distinguished in this way. (My edits are only linguistic.) – Nick Cox Nov 24 '19 at 12:18
  • @NickCox What's the problem? it's just an intuition, not a formal answer, other gave the formal answers. – Ahmad Nov 24 '19 at 12:20
  • @NickCox I modified it a bit, please check it again. – Ahmad Nov 24 '19 at 12:35
  • 1
    Sorry, but formal or informal style isn’t the issue. The distinction isn’t in terms of past and future. This only adds confusion to the thread, and I have downvoted it as wrong. – Nick Cox Nov 24 '19 at 12:40
  • 1
    @NickCox I'm not a statistician, but isn't probability about events whose result we don't know **beforehand**, and likelihood about observations? And an observation is an event that has occurred! I really myself don't want to be very pedantic, just an intuition that works in most situations. – Ahmad Nov 24 '19 at 12:44
  • 1
    The thread already has several excellent, much upvoted answers. That is not a situation in which anyone not confident of their expertise need or should add another. Any interest in the future is not the issue as in practice both probability and likelihood are calculated from data already to hand. – Nick Cox Nov 24 '19 at 13:45
  • @NickCox I read them, but it was the way I found it more intuitive for myself, and I posted my answer for those who may view the situation from my viewpoint. Yes, we have both data; these are just for calculating probability, they aren't for the usage of the pdf. The way you think about probability is different from the way you think about likelihood. That's my point. Simply, as I wrote, probability is the probability of an event: how probable that event is to occur; the degree of our uncertainty about an outcome (something we can't predict deterministically). – Ahmad Nov 24 '19 at 15:45
  • @NickCox But for likelihood we aren't concerned with an event occurring and how probable it is; the event has already happened and left some observations for us to speculate about the underlying probabilistic process. – Ahmad Nov 24 '19 at 15:46
  • 2
    -1 Intuitive answers are good--when they are correct. This one just is misleading and wrong. – whuber Nov 24 '19 at 22:25
  • @whuber another person also had your idea, but I tried to convince him, and he didn't add anything. Please read our discussion, and if you have anything to add I would be glad to hear it. Just saying something is wrong doesn't make it wrong. However, my purpose isn't to give a formal or full answer, just a quick hint, leaving the justification to the reader. So, you also are free to get the point or try to be punctual. – Ahmad Nov 25 '19 at 06:03
  • @whuber by the way, I changed some words according to more technical ones for those who may be mislead and don't get the relation – Ahmad Nov 25 '19 at 06:09
  • @whuber I tried to improve my answer but it totally lost all the intuitive points to be another tasteless definition, so I am going to remove it. I just can say you're website is very discouraging and prim. – Ahmad Nov 25 '19 at 06:42
  • 1
    I am not happy that you have a negative impression of our site but discouraging wrong or irrelevant answers is part of how the site works, unfortunately for you in this case. The record of my comments can stand for any other readers as trying to explain concisely how your answer failed to help. – Nick Cox Nov 25 '19 at 07:17
  • @NickCox thank you for understanding! No problem. I know you also do your job with good intention. I learned some points, and I had to put more time on my answer, however my focus was just to give a new even not precise perspective rather than repeat obvious or common interpretations but it turned out to be cumbersome. Anyway, thank you – Ahmad Nov 25 '19 at 07:37
  • 1
    You have the directionality wrong, Ahmad: a wrong answer justifies stating it is wrong. To understand why your post is wrong--since @Nick's responses haven't sufficed--all you have to do is refer to an authority for definitions or descriptions of likelihood and probability. (You will have a hard time finding one that makes your temporal distinction, though, because neither probability nor likelihood make any distinction among past, present, or future.) Reading the other answers in this thread would be a good start. – whuber Nov 25 '19 at 14:15
  • @whuber "answer" is a vague object for the term wrong. Anyways, based on your logic, I just can say you are wrong! you didn't get the point and it's not about past, future, etc.. If you want to know why you can read the discussion between me and Nick Cox. I explained that enough! However I'm going to delete my answer. – Ahmad Nov 25 '19 at 18:17
  • I understand why you still feel very sore about this exchange, but that is not an excuse for being impolite to @whuber. He clearly has read my comments, as he refers to them, and it's absurd to imply that he is too stupid or ignorant to see your point. Even revised drastically, your answer raises more problems than it solves. At the outset, the characterisation that "In probability you usually speculate the probability of a possible event" at most refers to prior probability and does not serve as helpful generally. I stop there. – Nick Cox Nov 26 '19 at 08:58
  • 1
    @whuber, NickCox, sorry, I think I was impulsive in my previous comment and I didn't notice some hints you provided. First my impulse was due to the first sentence, Yes, "a wrong answer justifies stating it is wrong, but stating something is wrong doesn't imply it's wrong or right". Anyway, I don't like to argue, I prefer to learn and I thought you didn't offer your reasoning. However, now I see the word "temporal distinction" was a clue. It's something that can be discussed. – Ahmad Nov 26 '19 at 12:13
  • 1
    And my distinction was more as a rule of thumb/hint/(I don't know the phrase) to ease distinguishing them for a non-expert.However, I agree it needs a more precise terminology. I may revise or remove my answer later, however at this point I should leave it there. Thanks. – Ahmad Nov 26 '19 at 12:18
  • It's your choice, unless further votes for deletion decide the matter. At present you have two downvotes (@whuber and I declared ourselves) and one upvote from somebody quite different. – Nick Cox Nov 26 '19 at 13:01
  • @NickCox, does the current modification to this answer make it "somewhat correct"? In my opinion I think it does, because the likelihood is to determine the array of hypotheses for an observed outcome, and maximum likelihood is set to determine a single hypothesis that best explains the outcome. In this context he wrote "most likely" to define likelihood and not maximum likelihood, which I think is the "only" misleading concept here. Right? – GENIVI-LEARNER Jan 25 '20 at 01:28
  • @GENIVI-LEARNER I don't think it is yet a helpful answer. Neither quantity is well defined by the attitude you supposedly have when you use it. For example, probabilities can be estimated descriptively without any formal model in mind. – Nick Cox Jan 25 '20 at 08:38
  • @NickCox, Well looks that likelihood is little more intricate then. You have good insight on it, please see if you could contribute to [this as well](https://stats.stackexchange.com/questions/445928/probability-and-likelihood-from-another-angle). I re-framed it with a concrete scenario. – GENIVI-LEARNER Jan 25 '20 at 12:58
  • 1
    @NickCox, Also when you said **probabilities can be estimated descriptively without any formal model in mind** are you saying that for likelihood there must always be a hypothesis or some analytical or numerical model at hand? If so then I think this model-based aspect of likelihood can uniquely define how likelihood is different from probability. Right? – GENIVI-LEARNER Jan 25 '20 at 13:24
  • My stance remains that there are several extraordinarily good answers here and I have nothing to say that is different or would be better phrased. There’s empirical likelihood too, not that I know enough about it to post. – Nick Cox Jan 25 '20 at 15:05
  • 1
    I'm not sure what the edit history is at this point, but do people still think that this answer is wrong? I'm also not a statistician, but it looks like a pretty straightforward and correct answer to me. – SuperCodeBrah Jan 13 '21 at 03:44
-2

Likelihood is bound to the statistical model that you have chosen. Let's take a discrete example, and assume you have a single observation. Hypothetically, you could always choose a statistical model that always produces one outcome, the observation that you have, with probability $1$; hence, the likelihood will also be $1$. This would be a bad model, but it would fit your data perfectly. So likelihood, in essence, is a subjective value because it depends on how you want to model your data.

PS: The above is the case when you have a single observation. A similar example can be provided for the case when you have multiple observations, i.e. data. Again, hypothetically, you can restrict your model in such a way that it only produces outcomes from among the observations that you have. For example, say your observations are the following coin flips, TTHH, TTHT, TTTH, TTTT; then you can restrict the model so that it always produces TT as the first two flips. This will be your assumption, hence it is subjective, and the likelihood you get will be higher than if you had not imposed that restriction.

  • But given observed data which takes on values other than that single outcome, if your model places all probability mass on one observation, then its likelihood is zero. In what sense is likelihood subjective? – Arya McCarthy May 02 '21 at 22:34
  • Right, I had to clarify initially that I was dealing with the case of single observation, which is a very rare case. So, I provided example on the case of multiple observations. – Sobir Bobiev May 02 '21 at 22:57