
I am trying to implement Naive Bayes, but I am running into a problem. I have 5000 word features, so every sample is a binary vector of length 5000, and the true labels are 1 or 0. The values of P(feature=1 | label=1) and P(feature=1 | label=0) are very small (~0.03) because the feature vectors are very sparse. When I calculate the numerator, i.e.

P(features | label=1) * P(label=1)

since the probability values are very small and the conditional independence assumption of Naive Bayes has me multiplying about 2000 such terms, the product underflows to 0 and hence I get a wrong result. What should be done?
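
For concreteness, a minimal sketch of the underflow (the 0.03 value and the 2000 terms are just the figures mentioned above):

```python
import math

# Illustrative only: multiplying 2000 probabilities of ~0.03 underflows to
# 0.0 in double precision, while the equivalent sum of logs is fine.
p, n = 0.03, 2000

product = 1.0
for _ in range(n):
    product *= p
print(product)          # 0.0  (true value ~1e-3046, far below double's ~1e-308 limit)

print(n * math.log(p))  # about -7013.2, perfectly representable
```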

Hellboy

1 Answer


The two most commonly used techniques to prevent underflows with a naive Bayes classifier are:

  1. Working in the log space
  2. Using the log-sum-exp trick (see the sketch below)

More details: [Example of how the log-sum-exp trick works in Naive Bayes](http://stats.stackexchange.com/a/253319/12359)
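
A minimal sketch of both techniques, assuming the model's log parameters have already been estimated (the names `log_prior`, `log_lik_1`, and `log_lik_0` are hypothetical, not from the question):

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def predict_log_posterior(x, log_prior, log_lik_1, log_lik_0):
    """Return log P(label=c | x) for c in {0, 1}.

    x               : binary feature vector (list of 0/1)
    log_prior[c]    : log P(label=c)
    log_lik_1[c][j] : log P(feature_j=1 | label=c)
    log_lik_0[c][j] : log P(feature_j=0 | label=c)
    """
    # Technique 1: work in log space -- sum log probabilities instead of
    # multiplying raw probabilities, so nothing underflows to 0.
    log_joint = {
        c: log_prior[c] + sum(
            log_lik_1[c][j] if xj else log_lik_0[c][j]
            for j, xj in enumerate(x)
        )
        for c in (0, 1)
    }
    # Technique 2: the log-sum-exp trick normalizes the log joints into
    # log posteriors without ever exponentiating the huge negative values.
    log_evidence = logsumexp(list(log_joint.values()))
    return {c: log_joint[c] - log_evidence for c in log_joint}
```

The predicted label is the arg max of the returned dictionary; exponentiating the values gives posteriors that sum to 1.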



Franck Dernoncourt
  • I did the summation of log probabilities, but the sum comes out to be about -100000. The answer is still wrong: log P = -100000 means P would be 0. What should I do now? – Hellboy Feb 11 '17 at 19:42
  • @Hellboy What do you want to do with P? – Franck Dernoncourt Feb 11 '17 at 19:47
  • Since it is a binary classifier, I'll compare P with 0.5: if it is greater than 0.5, I'll assign the positive label to the test sample. – Hellboy Feb 11 '17 at 19:51
  • @Hellboy How about comparing log P with log 0.5? – Franck Dernoncourt Feb 11 '17 at 19:53
  • Yes, but every sample is being classified as negative, since every sum is less than log 0.5. – Hellboy Feb 11 '17 at 19:59
  • @Hellboy how about using the maximum a posteriori (MAP) decision rule? ([Example of how the log-sum-exp trick works in Naive Bayes](http://stats.stackexchange.com/a/253319/12359) 1st bullet point) – Franck Dernoncourt Feb 11 '17 at 20:02
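
To make that last suggestion concrete, here is a sketch of the MAP rule in log space; the `log_joint_*` numbers are made up for illustration:

```python
import math

# Hypothetical log joints log P(x, label=c) from the log-space computation;
# the huge negative magnitudes are fine because only their difference matters.
log_joint_1 = -100000.0
log_joint_0 = -100003.2

# MAP decision: pick the label with the larger log joint. There is no need
# to recover P itself, so nothing underflows.
label = 1 if log_joint_1 > log_joint_0 else 0

# If the posterior P(label=1 | x) is wanted, normalize with log-sum-exp:
m = max(log_joint_1, log_joint_0)
log_evidence = m + math.log(math.exp(log_joint_1 - m) + math.exp(log_joint_0 - m))
posterior_1 = math.exp(log_joint_1 - log_evidence)  # ~0.96 here, not 0
```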