I want to calculate the Kullback–Leibler divergence between two documents. It requires a probability distribution for each document.
I do not know how to calculate the probabilities for each document. A simple answer with a layman's example would be much appreciated.
Let's say we have the following two documents:
1 - cross validated answers are good
2 - simply validated answers are nice
(The wording of the documents is arbitrary; it is only there to give you an example.)
How do we calculate probabilities for these documents?
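To make the question concrete, here is a minimal sketch of what I assume is meant: treat each document as a bag of words, turn word counts into relative frequencies over a shared vocabulary, and compare the two distributions. The function names `word_distribution` and `kl_divergence`, the whitespace tokenization, and the add-one smoothing (`alpha=1.0`) are my own assumptions, not something I know to be the standard approach. The smoothing is there because a word missing from one document would otherwise get probability zero, which makes the divergence undefined.

```python
from collections import Counter
import math

def word_distribution(text, vocabulary, alpha=1.0):
    """Smoothed relative word frequencies of `text` over `vocabulary`.

    alpha is an additive smoothing constant (an assumption): every word
    in the vocabulary gets alpha pseudo-counts, so no probability is zero.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must be defined over the same words."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

doc1 = "cross validated answers are good"
doc2 = "simply validated answers are nice"

# Both distributions must share one vocabulary, otherwise they are
# not comparable word by word.
vocabulary = sorted(set(doc1.split()) | set(doc2.split()))

p = word_distribution(doc1, vocabulary)
q = word_distribution(doc2, vocabulary)

print(p)
print(q)
print(kl_divergence(p, q))
```

Note that KL divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` can differ in general.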
Let's say we add one more document:
3 - simply cross is not good answer
If we add another document, how would it affect the probability distributions?
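My guess, sketched below under the same bag-of-words assumption, is that a third document matters only through the shared vocabulary: it introduces new words ("is", "not", "answer"), and if the distributions are smoothed, every document's probabilities are renormalized over the larger vocabulary. The helper names `word_distribution` and `shared_vocabulary` and the add-one smoothing are hypothetical choices for illustration.

```python
from collections import Counter

def word_distribution(text, vocabulary, alpha=1.0):
    """Smoothed relative word frequencies of `text` over `vocabulary`
    (alpha is an assumed additive smoothing constant)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def shared_vocabulary(docs):
    """All distinct words that occur in any of the documents."""
    return sorted({w for d in docs for w in d.lower().split()})

docs = [
    "cross validated answers are good",
    "simply validated answers are nice",
    "simply cross is not good answer",
]

vocab_two = shared_vocabulary(docs[:2])   # vocabulary before adding document 3
vocab_three = shared_vocabulary(docs)     # vocabulary after adding document 3

before = word_distribution(docs[0], vocab_two)
after = word_distribution(docs[0], vocab_three)

# The vocabulary grows, so the smoothed probability of each word
# in document 1 shrinks slightly.
print(len(vocab_two), len(vocab_three))
print(before["cross"], after["cross"])
```

With `alpha=0` (no smoothing) the probabilities of the words already in document 1 would stay the same, and the new words would simply get probability zero.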
Thanks