
I have a sample $X$ and two normal distributions $\mathcal{N}_1$ and $\mathcal{N}_2$. I would like to determine from which of these distributions $X$ was more likely sampled.

However, $p(x | \mathcal{N}_1)$ and $p(x | \mathcal{N}_2)$ are both $0$, as the normal distribution is continuous. A sneaky (but maybe wrong?) workaround would be to define a small $\epsilon$, integrate from $x-\epsilon$ to $x+\epsilon$ under both distributions, and use those integrals as the respective probabilities of generating the sample $X$.

Is this a correct approach or should I be doing something else?

Luke Taylor
  • If the population distribution is continuous, then indeed, the probability of drawing a particular sample $x$ is 0. The usual way around this is to define likelihood as a probability density rather than a probability. So in your case, using the maximum-likelihood principle, you would choose the model for which the density at $x$ is the largest. – StijnDeVuyst Sep 25 '17 at 09:56
  • Note that using a small $\pm \epsilon$ region around $x$ (which is what you were thinking about) is conceptually exactly equivalent to using the probability density value directly, as per Tim's answer and the above comment. So your thinking was exactly right; it's just that you don't explicitly need to bother with epsilons. – amoeba Sep 25 '17 at 12:07
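
To see that equivalence concretely, here is a minimal Python sketch (assuming SciPy; the parameters for $\mathcal{N}_1$ and $\mathcal{N}_2$ and the sample value are made up for illustration). For small $\epsilon$, the interval probability is approximately $2\epsilon\,f(x)$, so the ratio of the $\epsilon$-probabilities converges to the ratio of the densities:

```python
# Minimal sketch: the epsilon-interval probabilities from the question
# converge to the density comparison, since
# P(x - eps <= X <= x + eps) ~ 2 * eps * pdf(x) for small eps.
from scipy.stats import norm

x = 1.3                        # the observed sample (made up)
n1 = norm(loc=0.0, scale=1.0)  # hypothetical parameters for N_1
n2 = norm(loc=2.0, scale=0.5)  # hypothetical parameters for N_2

for eps in (0.1, 0.01, 0.001):
    p1 = n1.cdf(x + eps) - n1.cdf(x - eps)  # interval probability under N_1
    p2 = n2.cdf(x + eps) - n2.cdf(x - eps)  # interval probability under N_2
    print(eps, p1 / p2)

# The ratio stabilizes at the ratio of the density values:
print(n1.pdf(x) / n2.pdf(x))
```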

2 Answers


Your approach is not correct. For a moment, let's forget about the distributions and ask a simpler question: given $X$, what is the probability that it comes from class $C_i$, i.e. $p(C_i | X)$? What you propose instead is looking at the probability that $X=x$ given that it comes from $C_i$. Those are two different things.

To calculate the probability that you are interested in, you would need to use Bayes' theorem:

$$ p(C_i | X) = \frac{p(X | C_i) \,p(C_i)}{\sum_j p(X | C_j) \,p(C_j)} $$

so you would need to assume some prior $p(C_i)$, i.e. the probability of observing samples from class $C_i$.

By looking only at the likelihood $p(X | C_i)$ you cannot obtain the probability you are interested in; you can only say that there is a greater likelihood of observing one option as compared to another. For this comparison there is no problem in dealing with probability densities, since you look only at their relative sizes. If you are not interested in the probabilities, but only in deciding from which class your sample might have come, you may use a likelihood-ratio test.
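
As an illustration, here is a minimal sketch of the Bayes computation above (the parameters and the uniform prior are hypothetical assumptions, not part of the question). Note that the densities $p(x | C_i)$ enter only through their relative sizes:

```python
# Minimal sketch of Bayes' theorem with density values in place of
# probabilities; parameters and priors are made up for illustration.
from scipy.stats import norm

x = 1.3                                     # the observed sample
models = [norm(0.0, 1.0), norm(2.0, 0.5)]   # N_1 and N_2 (hypothetical)
priors = [0.5, 0.5]                         # assumed prior p(C_i)

likelihoods = [m.pdf(x) for m in models]    # density values, not probabilities
evidence = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

print(posteriors)  # p(C_1 | x) and p(C_2 | x); these sum to 1
```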

Tim
  • Thanks Tim. The quantity $p(C_i|X)$ is what I am actually trying to compute (using Bayes' formula). However, in my case $p(X|C_i)=\mathcal{N}(x|\mu,\sigma)$ (from this tutorial: https://brilliant.org/wiki/gaussian-mixture-model/), which I assume to be the probability of sampling $X$ given the normal distribution with parameters $\mu,\sigma$ (hence my question). How do I compute $\mathcal{N}(x|\mu,\sigma)$? – Luke Taylor Sep 25 '17 at 08:43
  • @LukeTaylor $\mathcal{N}(x|\mu,\sigma)$ is just a *probability density* under normal distribution, [there is no problem](https://stats.stackexchange.com/questions/275198/question-about-bayesian-theory-with-mixed-discrete-and-continuous-variables) with applying Bayes theorem to mixed discrete-continuous variables. – Tim Sep 25 '17 at 08:45
  • I see, so $\mathcal{N}(x|\mu,\,\sigma)$ is not a probability but rather the value that the density function of the distribution takes on at the point $x$. Thanks a lot! – Luke Taylor Sep 25 '17 at 09:14
  • @LukeTaylor yes, it is a [probability density function](https://stats.stackexchange.com/questions/4220/can-a-probability-distribution-value-exceeding-1-be-ok). – Tim Sep 25 '17 at 09:17
  • I feel like this answer does not really address the OP's confusion; it's only the last several comments above that do. – amoeba Sep 25 '17 at 12:08
  • OP wasn't exactly asking the right question in the first place. – White Shirt Sep 25 '17 at 18:10
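
Regarding Tim's comment above that $\mathcal{N}(x|\mu,\sigma)$ is a probability density: here is a quick check (a minimal sketch, assuming SciPy) that such a density value can legitimately exceed $1$, which a probability cannot:

```python
# Density values are not bounded by 1, unlike probabilities.
from scipy.stats import norm

print(norm(loc=0.0, scale=1.0).pdf(0.0))  # ~0.399, a density value
print(norm(loc=0.0, scale=0.1).pdf(0.0))  # ~3.989, perfectly valid for a pdf
```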

The purpose of this answer is simply to expand on the answer by @Tim.

Suppose the likelihood of the parameters given the sample can be expressed as \begin{equation} p(X|\theta) = \prod_{i=1}^n \textsf{N}(x_i|\mu,\sigma^2) , \end{equation} where $X = (x_1, \ldots, x_n)$ is the sample and $\theta = (\mu,\sigma^2)$ are the parameters. Then in general the likelihood of model $j$ (i.e., class $j$) can be expressed as \begin{equation} p(X|C_j) = \int p(X|\theta)\,p(\theta|C_j)\,d\theta , \end{equation} where $p(\theta|C_j)$ is the distribution of $\theta$ given model $j$.

This general approach can be specialized to the current case as follows. Let \begin{equation} p(\theta|C_j) = \delta(\mu-\mu_j)\,\delta(\sigma^2 - \sigma_j^2) , \end{equation} where $\delta(x)$ is the Dirac delta "function." In effect, this distribution puts a point mass at $\theta_j = (\mu_j,\sigma_j^2)$. The two salient properties of the Dirac delta function are $\int \delta(x-x_0)\,dx = 1$ and $\int f(x)\,\delta(x-x_0)\,dx = f(x_0)$.

With this point-mass distribution, we can compute the desired expression: \begin{equation} \begin{split} p(X|C_j) &= \iint p(X|\mu,\sigma^2)\,\delta(\mu-\mu_j)\,\delta(\sigma^2-\sigma_j^2)\,d\mu\,d\sigma^2 \\ &= p(X|\theta_j) \\ &= \prod_{i=1}^n \textsf{N}(x_i|\mu_j,\sigma_j^2) . \end{split} \end{equation}
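
To make the final expression concrete, here is a minimal sketch (the sample and the two parameter settings are made up for illustration) that evaluates $p(X|C_j) = \prod_{i=1}^n \textsf{N}(x_i|\mu_j,\sigma_j^2)$ for two point-mass models, working in log space for numerical stability:

```python
# Minimal sketch: compute p(X | C_j) as a product of normal densities,
# evaluated as a sum of log densities; data and parameters are made up.
import numpy as np
from scipy.stats import norm

X = np.array([1.1, 0.4, 2.3, 1.7])  # hypothetical sample x_1, ..., x_n

# Point-mass parameters theta_j = (mu_j, sigma_j) for the two classes
params = {"C_1": (0.0, 1.0), "C_2": (2.0, 0.5)}

for name, (mu, sigma) in params.items():
    log_lik = norm.logpdf(X, loc=mu, scale=sigma).sum()  # sum of log densities
    print(name, log_lik)
# The class with the larger log-likelihood is the maximum-likelihood choice.
```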

mef