Is sparsity of topics a necessary condition for latent Dirichlet allocation (LDA) to work?

Question

I have been playing with the hyper-parameters of the latent Dirichlet allocation (LDA) model and am wondering how the sparsity of the topic priors affects inference.

I have run these experiments on simulated data only, not on real data. I started with a fixed vocabulary of fifteen words $(W = 15)$ and generated three dummy topics $(K = 3)$. Then, using the generative process of the LDA model, I generated words $(N = 100)$ and documents $(D = 10)$.
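A minimal sketch of this simulation setup, using only the Python standard library. Several details are assumptions, since the question does not state them: $N = 100$ is read as words per document, $\alpha = 0.5$ is a placeholder value, the sparse "true" topics are drawn from a symmetric Dirichlet with parameter 0.1, and the helper names `sample_dirichlet` and `generate_corpus` are hypothetical.

```python
import random


def sample_dirichlet(params, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(topics, alpha, n_docs, n_words, rng):
    """Generate documents with the LDA generative process.

    topics : K lists of W word probabilities (the "true" topics)
    alpha  : symmetric Dirichlet parameter for the document-topic proportions
    Returns a list of documents, each a list of word indices.
    """
    n_topics = len(topics)
    vocab = range(len(topics[0]))
    docs = []
    for _ in range(n_docs):
        # Per-document topic proportions.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        doc = []
        for _ in range(n_words):
            z = rng.choices(range(n_topics), weights=theta)[0]  # topic of this token
            w = rng.choices(vocab, weights=topics[z])[0]        # word drawn from that topic
            doc.append(w)
        docs.append(doc)
    return docs


rng = random.Random(0)
W, K, D, N = 15, 3, 10, 100

# Assumption: sparse true topics drawn from Dirichlet(0.1, ..., 0.1).
true_topics = [sample_dirichlet([0.1] * W, rng) for _ in range(K)]
# Assumption: alpha = 0.5 and N words per document.
docs = generate_corpus(true_topics, alpha=0.5, n_docs=D, n_words=N, rng=rng)
```

Replacing the 0.1 used for `true_topics` with a value above 1 would give the non-sparse case.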

For inference, I am using the collapsed Gibbs sampler of Griffiths and Steyvers (2004). I kept the hyper-parameter $\alpha$ fixed, but used a range of values for $\eta$, which is the Dirichlet hyper-parameter for the prior on the topics. Here are some results:

I know the figures are busy, so let me explain them. The first column (blue) in both figures shows the "true" topics that were used to generate the documents. In Figure 1 the true topics are sparse, whereas in Figure 2 they are not. Each red column after the blue one shows the topics inferred for one of the following values of $\eta$: $[0.1, 0.3, 0.5, 0.7, 0.9, 1, 3, 5, 7, 9]$.
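For reference, the inference step and the sweep over $\eta$ can be sketched as follows. This is a minimal collapsed Gibbs sampler in the style of Griffiths and Steyvers, not the exact code behind the figures. Each token's topic is resampled with probability proportional to $(n_{dk} + \alpha)\,(n_{kw} + \eta) / (n_k + W\eta)$, where the counts exclude the current token. The toy corpus, $\alpha = 0.5$, the iteration count, and the function name `collapsed_gibbs` are assumptions made for illustration.

```python
import random


def collapsed_gibbs(docs, n_topics, vocab_size, alpha, eta, n_iters, rng):
    """Collapsed Gibbs sampling for LDA; returns K x W topic-word estimates."""
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + eta) / (n_k[j] + vocab_size * eta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the new assignment and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed topic-word estimates from the final assignment.
    return [
        [(n_kw[k][w] + eta) / (n_k[k] + vocab_size * eta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]


rng = random.Random(0)
# Toy corpus (assumption): two word groups, {0, 1, 2} and {3, 4, 5}.
toy_docs = [[0, 1, 2] * 10, [3, 4, 5] * 10, [0, 1, 2] * 5 + [3, 4, 5] * 5]
etas = [0.1, 0.3, 0.5, 0.7, 0.9, 1, 3, 5, 7, 9]
inferred = {
    eta: collapsed_gibbs(toy_docs, n_topics=2, vocab_size=6,
                         alpha=0.5, eta=eta, n_iters=200, rng=rng)
    for eta in etas
}
for eta, topics in inferred.items():
    print(eta, [[round(p, 2) for p in topic] for topic in topics])
```

Because $\eta$ is added to every topic-word count, larger values pull each estimated topic toward the uniform distribution, which is one reason the prior matters most when the corpus is small.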

Here are my observations from the figures:

When the underlying topics are sparse, the LDA model does a pretty good job of inferring them as long as the prior is also chosen to be sparse, i.e. $\eta < 1$.
When the underlying topics are not sparse, the model with $\eta < 1$ recovers the high-probability words pretty well, but the low-probability words are not well represented in the inferred topics. This remains true even with non-sparse priors, i.e. $\eta > 1$.
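To make "sparse prior" concrete, the sketch below draws topics from a symmetric Dirichlet over $W = 15$ words for each value of $\eta$ used above and reports how much probability mass the three most likely words hold on average. The choice of "top 3" and of 500 samples per value is an arbitrary illustration, and the function names are hypothetical.

```python
import random


def sample_symmetric_dirichlet(eta, size, rng):
    """Draw one sample from Dirichlet(eta, ..., eta) via normalised Gamma draws."""
    draws = [rng.gammavariate(eta, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_top_mass(eta, vocab_size, top_n, n_samples, rng):
    """Average probability mass held by the top_n most likely words."""
    total = 0.0
    for _ in range(n_samples):
        topic = sorted(sample_symmetric_dirichlet(eta, vocab_size, rng), reverse=True)
        total += sum(topic[:top_n])
    return total / n_samples


rng = random.Random(0)
etas = [0.1, 0.3, 0.5, 0.7, 0.9, 1, 3, 5, 7, 9]
top3_mass = {
    eta: mean_top_mass(eta, vocab_size=15, top_n=3, n_samples=500, rng=rng)
    for eta in etas
}
for eta, mass in top3_mass.items():
    print(f"eta = {eta}: top-3 words hold {mass:.2f} of the mass on average")
```

For $\eta < 1$ most of the mass sits on a few words, while for $\eta > 1$ the draws move toward the uniform value of $3/15$.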

I understand that a sparsity assumption on topics is a fair one, especially for large text corpora with large vocabularies. But if the underlying topics are not so sparse, will the LDA model be unable to infer the topics correctly?

EDIT 1:

I changed the figures so that the inferred topics are now arranged by their best-matching true topics, using the KL divergence between the true topics and the inferred topics. Any edit suggestions to help improve my question are welcome.
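One way to do this kind of matching is sketched below. The question does not say how the matching was done, so the direction of the divergence (true relative to inferred), the exhaustive search over permutations (cheap for $K = 3$), the small `eps` smoothing constant, and the function names are all assumptions.

```python
import math
from itertools import permutations


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def align_topics(true_topics, inferred_topics):
    """Reorder inferred topics to minimise the total KL divergence to the true topics.

    Returns the reordered topics and, for each true topic, the index of the
    inferred topic matched to it.
    """
    k = len(true_topics)
    cost = [[kl_divergence(t, q) for q in inferred_topics] for t in true_topics]
    best = min(
        permutations(range(k)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(k)),
    )
    return [inferred_topics[j] for j in best], list(best)


# Toy example (made-up numbers): inferred topics are perturbed and shuffled.
true_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.6, 0.2]]
inferred_topics = [[0.15, 0.65, 0.2], [0.65, 0.25, 0.1], [0.1, 0.15, 0.75]]
aligned, order = align_topics(true_topics, inferred_topics)
print(order)
```

Exhaustive search grows factorially in $K$, so for many topics an assignment solver would be the more practical choice.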

Nice work. I think that for the bottom half, $\eta = 0.9$ or $\eta = 1.0$ gives not a bad fit. Of course you can consider topics that are more evenly distributed, but I imagine they are of little use. How would you interpret a topic that is uniformly distributed? — Łukasz Grad, Mar 07 '17 at 23:26
@ŁukaszGrad agreed that uniform topics can't be interpreted. — kedarps, Mar 08 '17 at 03:44
Also realized that usually in the LDA model, the quality of resulting topics is evaluated by the top 10 or 20 words. In this case, if we look at the top 5 words, the model does a pretty good job in both cases. I can't imagine an application where words having a less probability are important for topic evaluation. — kedarps, Mar 08 '17 at 03:46
@AlexR. The plots are basically bar plots rotated by 90 degrees, where the x-axis is word index and y-axis are the counts, so basically it's a rotated histogram. — kedarps, May 22 '17 at 22:06
It would be really nice if you could order the inferred topics so that they align with the most similar ground-truth topic. — eric_kernfeld, May 26 '17 at 21:07
@eric_kernfeld sorry for the delayed response. I updated the figures so that the inferred topics are now arranged according to the true topics. — kedarps, Jul 21 '17 at 16:05
