
I want to calculate the Kullback–Leibler divergence between two documents. This requires a probability distribution for each document.

I do not know how to calculate the probabilities for each document. A simple answer with a layman's example would be much appreciated.

Let's say we have the following two documents:

1 - cross validated answers are good 
2 - simply validated answers are nice

(The wording of the documents is arbitrary; it is only there to give you an example.)

How do we calculate probabilities for these documents?

Let's say we add one more document:

3 - simply cross is not good answer

If we add another document, how would it affect the probability distribution?

Thanks

Steffen Moritz
user3900
  • maybe related: http://stats.stackexchange.com/questions/2476/measuring-document-similarity – mlwida Jul 06 '11 at 11:33
  • No, it is not a duplicate. They never explained it with a layman's example, and I don't want to use LDA or any other approach. I want to know how to calculate it without any such approach. – user3900 Jul 06 '11 at 13:31
  • I did not say it is a duplicate :). _"I want to know how to calculate without any approach."_ What do you mean? As far as I understand your question, a layman's explanation of LDA would be fine. Or not? – mlwida Jul 06 '11 at 13:36
  • Thank you for your reply :). LDA is full of math. I just want to know: if you have documents like the ones above, how can you calculate the probabilities on paper? :-) – user3900 Jul 06 '11 at 15:10
  • What is the probability distribution as it relates to the document? This is something you have to define. What are you trying to do exactly? – Emre Jul 07 '11 at 17:05
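
As the last comment notes, the distribution has to be defined before anything can be computed. One common choice, given here only as an assumption and not as something stated in the question, is to treat each document as a bag of words and use smoothed relative word frequencies over the vocabulary shared by all documents. The function names and the smoothing constant `alpha` below are illustrative. A minimal sketch in Python:

```python
from collections import Counter
import math


def build_vocabulary(documents):
    """Sorted list of every distinct word across all documents."""
    return sorted({word for text in documents for word in text.split()})


def word_distribution(text, vocabulary, alpha=1.0):
    """Relative word frequencies over the vocabulary.

    Add-alpha smoothing (an assumption of this sketch) gives every
    vocabulary word a non-zero probability, so the KL divergence
    never divides by zero.
    """
    counts = Counter(text.split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) = sum over words of p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


docs = [
    "cross validated answers are good",
    "simply validated answers are nice",
]

# Two documents: the vocabulary has 7 distinct words.
vocab2 = build_vocabulary(docs)
p = word_distribution(docs[0], vocab2)
q = word_distribution(docs[1], vocab2)
kl_two_docs = kl_divergence(p, q)

# Adding the third document enlarges the vocabulary to 10 words,
# which changes the smoothed probabilities of documents 1 and 2 as well.
docs3 = docs + ["simply cross is not good answer"]
vocab3 = build_vocabulary(docs3)
p3 = word_distribution(docs3[0], vocab3)
q3 = word_distribution(docs3[1], vocab3)
kl_three_docs = kl_divergence(p3, q3)

print(kl_two_docs, kl_three_docs)
```

Under this definition, document 1 has 5 words and the two-document vocabulary has 7, so with `alpha = 1` the word "cross" gets probability (1 + 1) / (5 + 7) = 2/12, while "simply" (absent from document 1) gets 1/12. Once the third document is added the vocabulary grows to 10 words and the same "cross" becomes 2/15. Note that KL divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.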

0 Answers