
I'm reading the Wikipedia page about how Latent Dirichlet Allocation assigns a topic distribution to a document after the model has been learnt (see this link). I'm very confused by this part of it:

Let $n_{j,r}^i$ be the number of word tokens in the $j^{th}$ document with the same word symbol (the $r^{th}$ word in the vocabulary) assigned to the $i^{th}$ topic. So, $n_{j,r}^i$ is three-dimensional. If any of the three dimensions is not limited to a specific value, we use a parenthesized point $(\cdot)$ to denote it. For example, $n_{j,(\cdot)}^i$ denotes the number of word tokens in the $j^{th}$ document assigned to the $i^{th}$ topic.

Could anyone explain this in simpler terms?

Thank you!

  • The abbreviation LDA has several different meanings. Please avoid using it without explaining what you mean first (in particular in the title). – amoeba Feb 28 '14 at 00:34

1 Answer


Consider a term-document matrix $A$: on one axis we have terms, on the other we have documents. If we have $j$ documents and $r$ terms, the dimensions of the matrix are $j \times r$. Put another way, $A \in \mathbb{R}^{j \times r}$. Now, if we take topics into consideration, we have a third dimension, so instead of just a term-document matrix we have a term-document-topic cube: if we have $i$ topics, $A \in \mathbb{R}^{j \times r \times i}$. $n_{j,r}^i$ is just a cell in this cube. It's a somewhat funny notation, but it suggests that we are considering the topic separately from the documents and words: it's almost as if we first select the $i^{th}$ term-document matrix (the term-document matrix corresponding to a particular topic) and then select a term-document element from this matrix. We could just as easily denote it $a_{j,r,i}$, an element of $A$.
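If it helps to see the cube concretely, here is a minimal NumPy sketch. This is not actual LDA inference code; the toy documents and per-token topic assignments below are made up purely to illustrate what $n_{j,r}^i$ counts:

```python
import numpy as np

# Hypothetical toy corpus: docs[j] lists the vocabulary indices r of the tokens in
# document j, and topics[j] lists the topic index i assigned to each of those tokens.
docs   = [[0, 1, 1, 4], [2, 2, 3], [0, 4, 4, 4]]
topics = [[0, 1, 1, 0], [1, 1, 0], [0, 0, 1, 1]]
num_docs, vocab_size, num_topics = 3, 5, 2

# n[j, r, i] = number of tokens of word r in document j assigned to topic i,
# i.e. the n_{j,r}^i of the Wikipedia passage.
n = np.zeros((num_docs, vocab_size, num_topics), dtype=int)
for j, (doc, assignment) in enumerate(zip(docs, topics)):
    for r, i in zip(doc, assignment):
        n[j, r, i] += 1

print(n[0, 1, 1])  # how many tokens of word 1 in document 0 were assigned to topic 1 -> 2
```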

The dot notation allows us to fix certain dimensions but not others. In the example they give, $n_{j,(\cdot)}^i$, the indices $j$ and $i$ are specific values, so we have fixed a particular document and a particular topic. We are not limiting ourselves to a particular term (that is what the dot is doing), so we are looking at document $j$'s full row of counts within the topic-$i$ term-document matrix; and since $n$ counts tokens, the quantity Wikipedia describes is the sum of that row, $n_{j,(\cdot)}^i = \sum_r n_{j,r}^i$. Here's another example to consider: $n_{(\cdot),(\cdot)}^i$. Here we aren't fixing a specific word or document, just a specific topic, so this gives us the full term-document count matrix corresponding to the $i^{th}$ topic.
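And here is a companion sketch of the dot notation in code, again with made-up counts: a fixed index selects along that axis, a dotted index is left free, and summing over the free axes gives the counts the Wikipedia passage describes.

```python
import numpy as np

# Toy count cube with the same axis convention as above: (document j, word r, topic i).
# The numbers are arbitrary; only the shapes and the axis sums matter here.
rng = np.random.default_rng(0)
n = rng.integers(0, 4, size=(3, 5, 2))

j, i = 0, 1
n_j_dot_i   = n[j, :, i].sum()  # n_{j,(.)}^i : tokens in document j assigned to topic i
n_dot_dot_i = n[:, :, i]        # the full term-document count matrix for topic i
n_dot_r_i   = n[:, 2, i].sum()  # n_{(.),r}^i : tokens of word r=2 assigned to topic i, in any document

print(n_j_dot_i, n_dot_r_i)
print(n_dot_dot_i)
```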

Math articles on Wikipedia often have wonky and even inconsistent notation. You might have an easier time reading the original LDA article:

Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (January 2003). "Latent Dirichlet allocation". Journal of Machine Learning Research 3 (4–5): 993–1022. doi:10.1162/jmlr.2003.3.4-5.993.

  • Another good read is Griffiths and Steyvers "Finding Scientific Topics" (http://psiexp.ss.uci.edu/research/papers/sciencetopics.pdf). This paper introduced the Gibbs sampler for the LDA model. – jlund3 Apr 22 '14 at 18:11