
Let's say I have, through some unspecified means, created a model that gives me a continuous distribution with a p.d.f. $f_X$, for some variable $X$ I am modelling. And let us say that I have a set of test observations of $X$, $\tau=\{x_1, x_2,\dots, x_n\}$.

I wish to evaluate how good this model is by checking how likely it considers this test set $\tau$. If it says that $\tau$ is a very unlikely set of samples from the distribution, then it is a bad model. If it says that $\tau$ is a likely set of samples, then it is a good model. This seems intuitively like a good measure of the model's quality.

If it were a discrete distribution then I would be using perplexity: I would report the average perplexity of the test set. But I can't do that for a continuous distribution; what should I use instead?
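(For concreteness, by perplexity I mean the usual definition for a discrete model $p$ over the test set: $$\mathrm{PP}(\tau)=\left(\prod_{i=1}^{n} p(x_i)\right)^{-1/n}=\exp\left[-\tfrac{1}{n}\sum_{i=1}^{n}\log p(x_i)\right].)$$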

One option I was considering was to discretize it, and report a perplexity at a particular number of bins.

At a small number of bins it is very easy to get a good perplexity; as the number of bins increases, the result becomes worse. By varying the number of bins I could create a curve of bins vs. perplexity, which seems relatable to a precision-recall curve. A rough sketch of what I mean is below.
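Here is a minimal sketch of that binning idea. Since my actual model is unspecified, I'm assuming a fitted normal standing in for $f_X$ and a synthetic test set `tau`; those are placeholders, not part of the real setup.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins: a normal for the modelled density f_X, synthetic test data for tau.
rng = np.random.default_rng(0)
model = stats.norm(loc=0.0, scale=1.0)
tau = rng.normal(0.1, 1.1, size=500)

def binned_perplexity(model, tau, n_bins, lo=-10.0, hi=10.0):
    """Discretize the model onto n_bins equal-width bins and return the
    perplexity of the test set under the resulting discrete distribution."""
    edges = np.linspace(lo, hi, n_bins + 1)
    # Probability mass the model assigns to each bin (differences of the CDF).
    bin_probs = np.diff(model.cdf(edges))
    # Bin index of each test point; clip so out-of-range points fall in the end bins.
    idx = np.clip(np.digitize(tau, edges) - 1, 0, n_bins - 1)
    # Perplexity = exp of the average negative log probability of the observed bins.
    return np.exp(-np.mean(np.log(bin_probs[idx])))

# The curve of bins vs. perplexity described above.
for n_bins in (5, 20, 100, 500):
    print(n_bins, binned_perplexity(model, tau, n_bins))
```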

I feel there is likely a more standard way to do this though.

Lyndon White

1 Answer


Perplexity is essentially a geometric average of inverse probabilities. So for your case, a natural analogue is to average the negative log density over the test points, i.e. $$\exp\left[-\tfrac{1}{N}\sum_{i=1}^N\log f\left(x_i\right)\right]$$ So the log perplexity would be the average negative log likelihood over the data points.

(How useful this is may be a matter of debate.)
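A minimal sketch of this in Python, assuming (as a stand-in, since the actual model is unspecified) a `scipy.stats` normal for the fitted density and a synthetic test set:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins: a fitted normal for f_X and a synthetic test set tau.
rng = np.random.default_rng(0)
model = stats.norm(loc=0.0, scale=1.0)
tau = rng.normal(0.1, 1.1, size=500)

# Continuous analogue of perplexity:
# exp( -(1/N) * sum_i log f(x_i) ), i.e. exp of the average negative log density.
perplexity = np.exp(-np.mean(model.logpdf(tau)))
print(perplexity)
```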

GeoMatt22