
I am new to topic modeling and have read about LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization). I understand how the training process works. Let's say I have 100 documents and I want to train an LDA model on these documents with 10 topics. However, I don't really understand how this model assigns a topic distribution to an unseen document.

I used Gensim. After training, I have a trained LDA model and a dictionary of the most frequent words. Let's say I have an unseen new document with the following text:

This is just a test text about topic modeling and LDA. 

Can someone explain, step by step, how a topic distribution is assigned to this new document in terms of algorithmic steps? The same goes for the NMF method.
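For concreteness, here is roughly what my setup looks like (a minimal sketch; variable names and parameter values are just illustrative, not my exact code):

    from gensim import corpora, models

    # tokenized_docs: my 100 training documents, each a list of tokens
    dictionary = corpora.Dictionary(tokenized_docs)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]

    # train an LDA model with 10 topics
    lda = models.LdaModel(corpus, num_topics=10, id2word=dictionary, passes=10)

    # the unseen document, tokenized and mapped through the trained dictionary
    new_bow = dictionary.doc2bow(
        "this is just a test text about topic modeling and lda".split())

    # this returns a topic distribution such as [(2, 0.71), (5, 0.18), ...],
    # and it is this step that I don't understand algorithmically
    print(lda.get_document_topics(new_bow))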

nickg
  • From the context, I understand that LDA refers to Latent Dirichlet Allocation, but please clarify this in the question. Also include the full name for Non-negative Matrix Factorization. – Daniel López Jan 29 '18 at 14:37
  • The Bayes decision rule of assigning topics to new documents depends on the loss function. – Łukasz Grad Jan 29 '18 at 14:49
  • LDA does not assign topics to documents, it assigns topics to words and topic-distributions to documents. – guy Jan 29 '18 at 15:23
  • @guy I should have explicitly specified that. I meant topic distribution. – nickg Jan 29 '18 at 15:25
  • The topic distribution is represented as a point on the $n_{topic}$-dimensional simplex, and is inferred by looking at the posterior under a Dirichlet prior. If we were to use, say, a Gibbs sampler, the topic distribution would be updated across iterations by sampling from the associated full conditional, which by conjugacy is another Dirichlet. – guy Jan 29 '18 at 17:23
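To make the conjugate update in the last comment concrete (the notation here is mine, not from the thread): suppose the model has $K$ topics, $\theta_d$ is the topic distribution of document $d$, $\alpha$ is a symmetric Dirichlet prior parameter, and $n_{d,k}$ counts the words of $d$ currently assigned to topic $k$ by the sampler. The full conditional the Gibbs sampler draws from is then

$$\theta_d \mid z_d \sim \mathrm{Dirichlet}(\alpha + n_{d,1}, \ldots, \alpha + n_{d,K}),$$

i.e. the prior pseudo-counts plus the document's current per-topic word counts, so each sweep only needs to update those counts before resampling $\theta_d$.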

1 Answer


What you should actually do is run inference (training) again on the full set of documents (the old ones and the new ones together). A shortcut that approximates this well is to apply Gibbs sampling only to the new documents while keeping the statistics learned during training fixed, as described by @SheldonCooper in Topic prediction using latent Dirichlet allocation.
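To make the shortcut concrete, here is a toy sketch of that folding-in step (my own illustration under the stated assumptions, not the code from the linked answer): the trained topic-word distributions are held fixed, and Gibbs sampling is run only over the word-topic assignments of the single new document.

    import numpy as np

    def fold_in_gibbs(word_ids, phi, alpha=0.1, n_iter=200, seed=0):
        """Estimate a topic distribution for one unseen document by Gibbs
        sampling its word-topic assignments while the trained topic-word
        matrix `phi` (shape: n_topics x vocab_size) stays fixed."""
        rng = np.random.default_rng(seed)
        n_topics = phi.shape[0]
        # random initial topic assignment for every word token
        z = rng.integers(n_topics, size=len(word_ids))
        counts = np.bincount(z, minlength=n_topics).astype(float)

        for _ in range(n_iter):
            for i, w in enumerate(word_ids):
                counts[z[i]] -= 1                 # remove token i from the counts
                p = phi[:, w] * (alpha + counts)  # unnormalized full conditional
                z[i] = rng.choice(n_topics, p=p / p.sum())
                counts[z[i]] += 1                 # add token i back

        # posterior mean of the document's topic distribution
        return (alpha + counts) / (alpha * n_topics + counts.sum())

With a trained Gensim model, phi could be taken from lda.get_topics() and word_ids built by looking the new document's tokens up in the trained dictionary (out-of-vocabulary tokens are simply dropped). Note that Gensim's own LdaModel actually folds new documents in with variational inference rather than Gibbs sampling, but the principle is the same: the trained topics stay fixed and only the new document's topic proportions are inferred.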

emem