I have read in papers that Latent Dirichlet Allocation (LDA) works by identifying word co-occurrences in documents. What confuses me is that LDA uses a bag-of-words representation of documents, so word co-occurrence information would seem to be lost. So what does it mean to say that LDA works by identifying word co-occurrences to form topics?

Franck Dernoncourt

samsamara
-
I think you may find some answers here (not sure if it is enough for this to count as a duplicate) http://stats.stackexchange.com/questions/32310/topic-models-and-word-co-occurrence-methods/32350#32350 – Momo Jan 19 '15 at 11:29
-
It uses co-occurrence across documents. Given words A, B, C, if one document is AB and another is AC, it might link B and C because they both occur with A. That it is a bag-of-words approach just means that document AB = BA. – sjw Jun 07 '17 at 23:23
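To make the AB / AC example in the comment above concrete, here is a minimal sketch in plain Python. It is not LDA inference; it only shows the kind of signal being described. The `docs` list and the `neighbors` mapping are illustrative names, not part of any library.

```python
from itertools import combinations

# Two toy documents from the comment: "AB" and "AC", stored as unordered sets.
docs = [{"A", "B"}, {"A", "C"}]

# For every word, collect the words that share at least one document with it.
neighbors = {}
for doc in docs:
    for w1, w2 in combinations(sorted(doc), 2):
        neighbors.setdefault(w1, set()).add(w2)
        neighbors.setdefault(w2, set()).add(w1)

# B and C never appear in the same document...
direct = "C" in neighbors["B"]
# ...but they share a co-occurring word, which is what can link them.
shared = neighbors["B"] & neighbors["C"]

print(direct, shared)
```

B and C have no direct co-occurrence, yet both co-occur with A, and that shared context is the indirect link the comment refers to.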
1 Answer
Bag-of-words means that the order of words within a document is not important. Co-occurrence, meaning which words appear together in the same document, does not depend on order either, so there is no contradiction.
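As a rough illustration of this point, the sketch below builds bag-of-words counts for a few toy documents and then counts document-level co-occurrences from those bags alone. The helper names `bag_of_words` and `cooccurrence_counts` and the toy sentences are assumptions made for the example; this shows only that the co-occurrence information survives the bag-of-words step, not how LDA itself is fitted.

```python
from collections import Counter
from itertools import combinations

def bag_of_words(text):
    """Word counts for one document; word order is discarded."""
    return Counter(text.lower().split())

def cooccurrence_counts(docs):
    """Count, for each word pair, how many documents contain both words."""
    counts = Counter()
    for doc in docs:
        vocab = sorted(bag_of_words(doc))  # distinct words, order-free
        counts.update(combinations(vocab, 2))
    return counts

docs = ["topic model words", "words model topic", "topic words"]

# Reordering the words does not change the bag...
same_bag = bag_of_words(docs[0]) == bag_of_words(docs[1])

# ...and co-occurrence counts can still be computed from the bags.
pairs = cooccurrence_counts(docs)

print(same_bag)
print(pairs[("topic", "words")], pairs[("model", "topic")])
```

The first two documents differ only in word order and yield identical bags, while the pair counts are still recoverable, which is why the two ideas are compatible.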

Piotr Migdal