When increasing the number of latent topics k for LDA (latent Dirichlet allocation), how should perplexity behave:
- On the training set?
- On the testing set?
The original LDA paper (Blei, Ng & Jordan, 2003) gives some insight into this:
> In particular, we computed the perplexity of a held-out test set to evaluate the models. The perplexity, used by convention in language modeling, is monotonically decreasing in the likelihood of the test data, and is algebraically equivalent to the inverse of the geometric mean per-word likelihood. A lower perplexity score indicates better generalization performance.
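The paper makes this concrete with the following definition: for a test corpus of $M$ documents, where $\mathbf{w}_d$ denotes the words of document $d$ and $N_d$ its length,

$$\mathrm{perplexity}(D_{\mathrm{test}}) = \exp\left\{-\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d}\right\}$$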
This describes the expected behavior on the test data: held-out perplexity should decrease as k increases toward a good fit, and may begin to rise again once the model starts to overfit. On the training set, by contrast, perplexity should decrease monotonically with k, since adding topics can only improve the fit to the training data. Below is a sketch of how to measure both curves, followed by a result from the paper:
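Here is a minimal sketch of computing both curves, assuming gensim; the toy corpus and the train/test split are illustrative placeholders, and `log_perplexity` returns a per-word variational lower bound rather than the exact log likelihood, so the resulting perplexity is an approximation:

```python
# Minimal sketch: training vs. held-out perplexity as k grows, using gensim.
# The toy corpus and the train/test split are illustrative placeholders.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["topic", "model", "latent", "dirichlet", "allocation"],
    ["perplexity", "held", "out", "test", "set"],
    ["language", "model", "per", "word", "likelihood"],
    ["geometric", "mean", "per", "word", "likelihood"],
    ["training", "set", "likelihood", "topic", "model"],
    ["test", "set", "generalization", "performance"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
train, test = corpus[:4], corpus[4:]

for k in (2, 5, 10):
    lda = LdaModel(train, num_topics=k, id2word=dictionary,
                   passes=10, random_state=0)
    # log_perplexity returns a per-word variational lower bound on the log
    # likelihood; gensim itself reports perplexity as 2 ** (-bound).
    train_ppl = 2 ** (-lda.log_perplexity(train))
    test_ppl = 2 ** (-lda.log_perplexity(test))
    print(f"k={k:2d}  train perplexity={train_ppl:8.1f}  "
          f"test perplexity={test_ppl:8.1f}")
```

On a corpus this small the numbers are dominated by noise; on realistic data you would expect the training curve to fall monotonically while the held-out curve flattens or turns upward as k grows.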