
Evaluating a discriminative model is relatively easy: compare the predictions with ground truth, using cross-validation.

Unfortunately this strategy can't be used for generative models. Surely this problem has been tackled already?

static_rtti

1 Answer


Discriminative algorithms model P(Class|variables), whereas generative algorithms model the joint distribution P(Class, variables) = P(Class|variables) * P(variables). Hence, by modelling the joint distribution of the class and the variables, generative algorithms model the underlying process that 'created' your data.

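My point in starting with this first paragraph is to note that generative algorithms have discriminative properties: once you have the joint distribution, you can recover P(Class|variables) by dividing by P(variables). Therefore, the same method of evaluating the predictive performance: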

"compare the predictions with ground truth, using cross-validation."

applies to generative models, as well as discriminative ones.
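To make this concrete, here is a minimal sketch (not a definitive implementation) of a generative classifier scored by k-fold cross-validation. It assumes a single real-valued variable and a Gaussian P(variable|Class) per class; the helper names `fit_joint`, `log_joint`, `predict` and `cv_accuracy`, and the synthetic data, are made up for illustration.

```python
import math
import random

def fit_joint(xs, ys):
    """Fit P(class) and a 1-D Gaussian P(x | class) for each class."""
    model = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(pts) / len(pts)
        # Maximum-likelihood variance, with a tiny floor to avoid division by zero.
        var = sum((p - mean) ** 2 for p in pts) / len(pts) + 1e-9
        model[c] = (len(pts) / len(ys), mean, var)
    return model

def log_joint(model, x, c):
    """log P(class=c, x) = log P(c) + log P(x | c)."""
    prior, mean, var = model[c]
    return (math.log(prior)
            - 0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var))

def predict(model, x):
    """argmax over c of P(c | x); P(x) is constant in c, so the joint suffices."""
    return max(model, key=lambda c: log_joint(model, x, c))

def cv_accuracy(xs, ys, k=5, seed=0):
    """Plain k-fold cross-validation: fit on k-1 folds, score on the held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_joint([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(model, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)

# Synthetic two-class data (an assumption for the example, not real data).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(4.0, 1.0) for _ in range(100)]
ys = [0] * 100 + [1] * 100
accuracy = cv_accuracy(xs, ys)
print(f"5-fold CV accuracy: {accuracy:.3f}")
```

The model is fitted as a joint distribution, but the evaluation is exactly the one you would use for a discriminative classifier.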

However, as you imply, we can additionally assess how well a generative algorithm models the underlying process that generates the data. A commonly used group of metrics for this is "information theoretic scores", which derive from the idea of likelihood (log-likelihood). Below are some well-known information theoretic scores:

1- log-likelihood (LL) score

2- minimum description length (MDL) score

3- minimum message length (MML) score

4- Akaike Information Criterion (AIC) score

5- Bayesian Information Criterion (BIC) score

Note that scores 2, 3, 4, and 5 apply a complexity penalty on top of the LL score. This is good practice to combat over-fitting.
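As a rough illustration of how the LL score and two of the penalised scores relate, the sketch below compares two Gaussian models of the same sample using the standard definitions AIC = 2k - 2·LL and BIC = k·ln(n) - 2·LL, where k is the number of fitted parameters and n the sample size (lower is better for both). The data and the two candidate models are assumptions chosen for the example; MDL and MML are not shown.

```python
import math
import random

def gaussian_log_likelihood(data, mean, var):
    """Log-likelihood of the data under a 1-D Gaussian N(mean, var)."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mean) ** 2 for x in data) / (2 * var))

def aic(ll, k):
    """Akaike Information Criterion: 2k - 2*LL."""
    return 2 * k - 2 * ll

def bic(ll, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*LL."""
    return k * math.log(n) - 2 * ll

# Synthetic sample (an assumption for the example).
rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(500)]
n = len(data)

# Model A: mean and variance both fitted by maximum likelihood (k = 2).
mean_a = sum(data) / n
var_a = sum((x - mean_a) ** 2 for x in data) / n
ll_a = gaussian_log_likelihood(data, mean_a, var_a)

# Model B: mean fixed at 0, only the variance fitted (k = 1).
var_b = sum(x ** 2 for x in data) / n
ll_b = gaussian_log_likelihood(data, 0.0, var_b)

for name, ll, k in [("A", ll_a, 2), ("B", ll_b, 1)]:
    print(f"model {name}: LL={ll:.1f}  AIC={aic(ll, k):.1f}  BIC={bic(ll, k, n):.1f}")
```

Here the scores are computed on the training sample, which is why the penalty term matters: the LL score alone can only improve as parameters are added.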

Zhubarb