
Let's say I have, through some unspecified means, created a model that gives me a continuous distribution with a p.d.f. $f_X$, for some variable $X$ I am modelling. And let us say that I have a set of test observations of $X$, $\tau=\{x_1, x_2,\dots, x_n\}$.

I wish to evaluate how good this model is by checking how likely it considers this test set $\tau$. If it says that $\tau$ is a very unlikely set of samples from the distribution, then it is a bad model. If it says that $\tau$ is a likely set of samples, then it is a good model. This seems intuitively like a good measure of the model's quality.

If it were a discrete distribution then I would be using perplexity: I would report the average perplexity of the test set. But I can't do that for a continuous distribution; what should I use instead?
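(For concreteness, by perplexity I mean the usual definition for a discrete model $p$ over the test set: $$\mathrm{PP}(\tau)=\left(\prod_{i=1}^{n} p(x_i)\right)^{-1/n}=\exp\left[-\tfrac{1}{n}\sum_{i=1}^{n}\log p(x_i)\right].)$$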

One option I was considering was to discretize it, and report a perplexity at a particular number of bins.

At a small number of bins it is very easy to get a good perplexity; as the number of bins increases, the result becomes worse. By varying the number of bins I could create a curve of bins vs. perplexity, which seems relatable to a precision-recall curve. A rough sketch of what I mean is below.
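Here is a minimal sketch of that binning idea. Since my actual model is unspecified, I'm assuming a fitted normal standing in for $f_X$ and a synthetic test set `tau`; those are placeholders, not part of the real setup.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins: a normal for the modelled density f_X, synthetic test data for tau.
rng = np.random.default_rng(0)
model = stats.norm(loc=0.0, scale=1.0)
tau = rng.normal(0.1, 1.1, size=500)

def binned_perplexity(model, tau, n_bins, lo=-10.0, hi=10.0):
    """Discretize the model onto n_bins equal-width bins and return the
    perplexity of the test set under the resulting discrete distribution."""
    edges = np.linspace(lo, hi, n_bins + 1)
    # Probability mass the model assigns to each bin (differences of the CDF).
    bin_probs = np.diff(model.cdf(edges))
    # Bin index of each test point; clip so out-of-range points fall in the end bins.
    idx = np.clip(np.digitize(tau, edges) - 1, 0, n_bins - 1)
    # Perplexity = exp of the average negative log probability of the observed bins.
    return np.exp(-np.mean(np.log(bin_probs[idx])))

# The curve of bins vs. perplexity described above.
for n_bins in (5, 20, 100, 500):
    print(n_bins, binned_perplexity(model, tau, n_bins))
```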

I feel there is likely a more standard way to do this though.

Lyndon White

1 Answer


Perplexity is essentially a geometric average of inverse probabilities. So for your case, a natural analogue is to average the negative log density over the test points, i.e. $$\exp\left[-\tfrac{1}{N}\sum_{i=1}^N\log f\left(x_i\right)\right]$$ So the log perplexity would be the average negative log likelihood over the data points.

(How useful this is may be a matter of debate.)
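A minimal sketch of this in Python, assuming (as a stand-in, since the actual model is unspecified) a `scipy.stats` normal for the fitted density and a synthetic test set:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins: a fitted normal for f_X and a synthetic test set tau.
rng = np.random.default_rng(0)
model = stats.norm(loc=0.0, scale=1.0)
tau = rng.normal(0.1, 1.1, size=500)

# Continuous analogue of perplexity:
# exp( -(1/N) * sum_i log f(x_i) ), i.e. exp of the average negative log density.
perplexity = np.exp(-np.mean(model.logpdf(tau)))
print(perplexity)
```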

GeoMatt22