2

I have read in papers that Latent Dirichlet Allocation (LDA) works by identifying word co-occurrences in documents. What confuses me is that, since LDA uses a bag-of-words representation for documents, word co-occurrence information seems to be lost. So what does it mean to say that LDA forms topics by identifying word co-occurrences?

Franck Dernoncourt
samsamara
  • I think you may find some answers here (not sure if it is enough for this to count as a duplicate) http://stats.stackexchange.com/questions/32310/topic-models-and-word-co-occurrence-methods/32350#32350 – Momo Jan 19 '15 at 11:29
  • It uses co-occurrence between documents. So given words A, B, C, if one document is AB and another is AC, it might link B and C because they both occur with A. That it is a bag-of-words approach just means that document AB = BA. – sjw Jun 07 '17 at 23:23

1 Answer

1

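Bag-of-words means that the order of words within a document is not important. Co-occurrence does not take order into account either: two words co-occur when they appear in the same document, wherever they sit in it. So there is no contradiction.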
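
To make this concrete, here is a minimal sketch in Python. It is not LDA itself, only the document-level co-occurrence counting that a bag-of-words representation still supports. The toy documents and the helper names `bag_of_words` and `cooccurrence_counts` are assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(doc):
    """Count each word in a document, discarding word order."""
    return Counter(doc.split())


def cooccurrence_counts(docs):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc.split()))  # unique words, in a canonical order
        for pair in combinations(vocab, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus: the same two documents with the words reordered.
docs_original = ["A B", "A C"]
docs_shuffled = ["B A", "C A"]

# Shuffling words inside a document changes neither the bags nor the pair counts.
same_bags = [bag_of_words(d) for d in docs_original] == [bag_of_words(d) for d in docs_shuffled]
same_pairs = cooccurrence_counts(docs_original) == cooccurrence_counts(docs_shuffled)

pairs = cooccurrence_counts(docs_original)
print(same_bags, same_pairs, dict(pairs))
```

In this toy corpus, B and C never appear in the same document, yet each co-occurs with A. That shared context is the kind of indirect evidence a topic model can use to relate them.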

Piotr Migdal