I am having trouble developing the intuition behind the difference between a standard generative Markov random field (MRF) and its discriminative counterpart, the conditional random field (CRF).
Here is what I think I have understood so far:
An MRF aims to model the full joint distribution. So, given the observations $X$ and the labels $Y$ we want to predict, we aim to model:
$$ P(X, Y) = P(X|Y) P(Y) $$
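Writing this out over the individual sites (my own notation: $N$ sites, with observation $x_i$ and label $y_i$ at site $i$), the likelihood term is a joint distribution over all the observations at once:
$$ P(X, Y) = P(x_1, \dots, x_N \mid y_1, \dots, y_N)\, P(y_1, \dots, y_N) $$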
My first confusion is that most online references talk about the difficulty of modelling the complex interactions within the input $X$. Is this related to the term $P(X|Y)$, i.e. the data association term? Taking a concrete example, let us say I observe an image and every pixel in the image is an MRF site. I am interested in labelling every pixel as foreground or background. Now, is the problem with the generative model related to the difficulty of modelling $P(X|Y)$ in this case? So, in this example, would this involve modelling the correlations between the different pixels of the observed image? I am at a loss as to why modelling this joint distribution is so difficult.
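For reference, the only way I have seen this made tractable is to assume the observations are conditionally independent given the labels and to put a simple pairwise (Potts-style) smoothness prior on the labels; I may be misrepresenting the standard setup, so please correct me if so:
$$ P(X \mid Y) = \prod_{i=1}^{N} P(x_i \mid y_i), \qquad P(Y) = \frac{1}{Z} \exp\Big(-\beta \sum_{(i,j) \in \mathcal{E}} \mathbf{1}[y_i \neq y_j]\Big), $$
where $\mathcal{E}$ is the set of neighbouring pixel pairs.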
Now, moving on to CRFs: they aim to model the conditional probability distribution directly, i.e. $P(Y|X=x)$. Again, I have no intuition as to why this should be an easier problem than modelling $P(X, Y)$. I can come up with some explanations, e.g. that we can somehow use the observed $X=x$ to our advantage and simplify the modelling, but I have not been able to convince myself.
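For concreteness, here is the CRF form as I understand it (again my own notation, with unary potentials $\phi_i$ and pairwise potentials $\psi_{ij}$); the key point seems to be that the partition function is conditioned on the observed $x$:
$$ P(Y \mid X = x) = \frac{1}{Z(x)} \exp\Big( \sum_{i} \phi_i(y_i, x) + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j, x) \Big), \qquad Z(x) = \sum_{Y'} \exp\Big( \sum_{i} \phi_i(y'_i, x) + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y'_i, y'_j, x) \Big), $$
so the potentials can depend on arbitrary features of the whole image $x$ without ever assigning a probability to $x$ itself. Is this the right way to see why the conditional model is easier?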
If someone can give an intuitive explanation and an example, I would be really grateful.