5

I am doing the CS229: Machine Learning course from Stanford Engineering Everywhere. All through the first chapter he uses

$$L(\theta) = P(Y | X; \theta)$$

i.e. the likelihood of the parameter $\theta$ is given by the conditional probability of $Y$ given $X$.

Now in the second chapter, when talking about Gaussian Discriminant Analysis, suddenly and without any explanation the likelihood looks like this:

$$L(\theta) = P(Y \cap X; \theta)$$

What happened here? Which likelihood function is used when? I find the first likelihood a much more natural choice.

I am talking about page 10 of this script

fubal
  • **Conventions:** The notation $Y\cap X$ is quite unusual since $X$ and $Y$ are random variables. A more natural notation would be $P(Y,X|\theta)$. Note also that capital letters are usually reserved for random variables, whose realisations are denoted with lower-case letters. – Xi'an Oct 01 '15 at 16:49

3 Answers

5

The two likelihoods are related by the following equation: $$P(Y \cap X\,|\,\Theta) = P(Y\,|\,X,\Theta)P(X\,|\,\Theta)$$ So, the joint probability of $Y$ and $X$ has to account for two things:

  1. The probability of generating $Y$ given $X$ and $\Theta$
  2. The probability of generating $X$ given $\Theta$

$P(Y\,|\,X,\Theta)$ only accounts for (1), and would be preferred when you only care about predicting $Y$ when $X$ is known. The joint likelihood looks at the probability of generating both $X$ and $Y$ given the model parameter $\Theta$. This could be valuable if you want your model to predict $X$ as well as $Y$ given $X$. Put another way, $P(X\,|\,\Theta)$ is a way of measuring to what extent your model knows what kinds of $X$ are likely to occur in your dataset.
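To make the factorisation concrete, here is a minimal numeric sketch (not from the CS229 notes; the parameters are made up) using a toy one-dimensional GDA-style model with $y \sim \mathrm{Bernoulli}(\phi)$ and $x \mid y \sim \mathcal{N}(\mu_y, 1)$. The joint density is built generatively as $P(x\mid y)P(y)$, and then checked against the discriminative factorisation $P(y\mid x)P(x)$:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters for a toy 1-D GDA-style model (made up for illustration):
# y ~ Bernoulli(phi), x | y ~ N(mu_y, 1)
phi, mu0, mu1 = 0.3, -1.0, 2.0

def joint(x, y):
    # P(x, y | theta) via the generative factorisation P(x | y, theta) * P(y | theta)
    mu, prior = (mu1, phi) if y == 1 else (mu0, 1 - phi)
    return norm.pdf(x, loc=mu) * prior

def marginal_x(x):
    # P(x | theta): mixture of the two class-conditional densities
    return joint(x, 0) + joint(x, 1)

def conditional_y(y, x):
    # P(y | x, theta): class posterior by Bayes' rule
    return joint(x, y) / marginal_x(x)

x, y = 0.5, 1
# Same joint, written as P(y | x, theta) * P(x | theta):
print(np.isclose(joint(x, y), conditional_y(y, x) * marginal_x(x)))  # True
```

GDA maximises the product of both pieces over the training set, whereas a purely discriminative model such as logistic regression maximises only the conditional piece.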

Paul
4

Start with the general definition of likelihood. With a likelihood you are not really interested in probabilities, but in the likelihood of $\theta$ given your data. It is calculated from the probability of the data under a model with parameters $\theta$, i.e.

$$L(\theta|X) = \prod_i f_\theta(x_i)$$

Now, in your examples two different likelihoods are described. In the first case, you have a regression model of $Y$ conditional on $X$; in the second case, you have a joint likelihood of $X$ and $Y$ in a bivariate model. This is the same distinction as between conditional and joint probabilities: both are probabilities, so they share the same properties, but they describe different situations.
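As a small, hypothetical illustration of the product-of-densities definition above (data and grid are made up), here is a sketch that evaluates $\log L(\theta\mid X)$ for a Gaussian model with unknown mean and unit variance. In your setting, $f_\theta$ would be either the conditional density of $y_i$ given $x_i$ or the joint density of $(x_i, y_i)$, which is exactly the difference between the two likelihoods:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=1.0, size=100)  # observed data (simulated for the example)

def log_likelihood(theta, data):
    # log L(theta | X) = sum_i log f_theta(x_i), with f_theta the N(theta, 1) density
    return norm.logpdf(data, loc=theta, scale=1.0).sum()

thetas = np.linspace(0.0, 3.0, 301)
lls = [log_likelihood(t, x) for t in thetas]
print(thetas[np.argmax(lls)], x.mean())  # the maximiser is close to the sample mean, the MLE
```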

Tim
0

It just seems that in the second case, $X$ and $Y$ are both modelled jointly in a generative model, and you can write the joint likelihood as $P(X, Y | \theta)$.

For example, if you now assume $X$ and $Y$ are independent, the joint log-likelihood can be written as:

$$ \log L(\theta) = \log P(X |\theta) + \log P(Y|\theta) $$
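A minimal sketch of that special case (all numbers made up): if $X$ and $Y$ are modelled as independent samples sharing a mean parameter $\theta$, the joint log-likelihood is just the sum of the two marginal log-likelihoods. (In GDA they are of course not independent; this only illustrates the decomposition above.)

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta_true = 2.0
x = rng.normal(theta_true, 1.0, size=50)  # X sample
y = rng.normal(theta_true, 1.0, size=50)  # Y sample, independent of X in this toy model

def log_lik(theta):
    # log L(theta) = log P(X | theta) + log P(Y | theta) under independence
    return norm.logpdf(x, loc=theta).sum() + norm.logpdf(y, loc=theta).sum()

grid = np.linspace(0.0, 4.0, 401)
print(grid[np.argmax([log_lik(t) for t in grid])])  # close to theta_true
```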

Luca