I am working through CS229: Machine Learning from Stanford Engineering Everywhere. Throughout the first chapter he uses
$$L(\theta) = P(Y | X; \theta)$$
i.e. the likelihood of the parameter $\theta$ is given by the conditional probability of $Y$ given $X$.
Now in the second chapter, when talking about Gaussian Discriminant Analysis, suddenly and without any explanation the likelihood looks like this:
$$L(\theta) = P(Y \cap X; \theta)$$
What happened here? Which likelihood function is used when? I find the first likelihood a much more natural choice.
I am referring to page 10 of these lecture notes.
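To make the contrast concrete, here is a small sketch (my own toy setup, not from the notes) that computes both quantities for a one-dimensional two-class Gaussian model. It also shows the identity $P(Y \cap X; \theta) = P(Y \mid X; \theta)\,P(X; \theta)$, which is why the two likelihoods differ: the joint (generative) likelihood additionally accounts for how the $x$'s themselves are distributed, while the conditional (discriminative) likelihood ignores that.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: P(y=1) = phi, and x | y=k ~ N(mu_k, 1).
phi, mu0, mu1 = 0.5, -1.0, 1.0
y = (rng.random(200) < phi).astype(int)
x = rng.normal(np.where(y == 1, mu1, mu0), 1.0)

def gaussian_pdf(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Generative (GDA-style) log-likelihood: sum_i log p(x_i, y_i)
# where p(x_i, y_i) = p(x_i | y_i) p(y_i).
p_x_given_y = gaussian_pdf(x, np.where(y == 1, mu1, mu0))
p_y = np.where(y == 1, phi, 1 - phi)
joint_ll = np.sum(np.log(p_x_given_y * p_y))

# Discriminative (logistic-regression-style) log-likelihood: sum_i log p(y_i | x_i),
# obtained here via Bayes' rule under the same model.
num = gaussian_pdf(x, mu1) * phi
den = num + gaussian_pdf(x, mu0) * (1 - phi)   # den = marginal p(x_i)
p_y1_given_x = num / den
cond_ll = np.sum(np.log(np.where(y == 1, p_y1_given_x, 1 - p_y1_given_x)))

# The joint likelihood decomposes as conditional + marginal:
marginal_ll = np.sum(np.log(den))
print(joint_ll, cond_ll, marginal_ll)
# joint_ll equals cond_ll + marginal_ll, since p(x, y) = p(y | x) p(x)
```

So maximizing $P(Y \mid X; \theta)$ versus $P(Y \cap X; \theta)$ generally gives different estimates of $\theta$, because the joint version also rewards parameters that explain the marginal distribution of $X$.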