6

I'm trying to predict the probability that a user will visit a particular website based on several factors (day of the week, time since the last visit, etc.). My question is: what should I do if one of the numerator terms is zero?

For instance, suppose I visit www.google.com often, but I've never visited it on a Monday. Then $p(\text{Monday} \mid \text{google})$ is zero. Do I simply remove this term from the equation altogether?
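To make the problem concrete: under the usual Naive Bayes (conditional independence) assumption, the numerator is a product of per-feature terms, so a single zero factor forces the whole estimate to zero. Sketching it with just a day-of-week feature $D$ and a time-since-last-visit feature $T$ (the symbols here are only for illustration):

$$p(\text{google} \mid D = \text{Monday},\, T = t) \;\propto\; p(D = \text{Monday} \mid \text{google})\; p(T = t \mid \text{google})\; p(\text{google}),$$

which is zero whenever $p(D = \text{Monday} \mid \text{google}) = 0$, no matter how strong the other factors are.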

usεr11852
Jeff

2 Answers

9

One way to deal with this is to increment all counts by 1. This is known as Laplace smoothing. If you Google "Laplace smoothing" and "Naive Bayes" you will find many references.
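As a minimal sketch of how add-one smoothing changes the estimates (the visit log, the day names, and the helper `smoothed_prob` below are made up for illustration, not taken from the question):

```python
from collections import Counter

def smoothed_prob(count, class_total, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(feature value | class).

    count       -- times this feature value was seen within the class
    class_total -- total observations in the class
    n_values    -- number of distinct values the feature can take
    alpha       -- smoothing constant (alpha=1 is classic add-one smoothing)
    """
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical visit log: day of week for each visit to google.
visits = ["Tue", "Wed", "Wed", "Fri", "Sat", "Sun", "Tue"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_counts = Counter(visits)

for day in days:
    p = smoothed_prob(day_counts[day], len(visits), len(days))
    print(f"p({day} | google) = {p:.3f}")

# "Mon" was never observed, but its smoothed estimate is
# (0 + 1) / (7 + 7) = 0.071 instead of exactly zero.
```

With `alpha` pushed toward 0 the estimates approach the raw relative frequencies; larger `alpha` pulls them toward the uniform distribution over the feature's values.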

Glen
  • This is also called a Dirichlet prior in some circles; different idea, same method. –  Dec 15 '11 at 20:59
  • Jeffreys would probably say to start with 0.5 rather than 1, although if it makes much of a difference, you don't have enough data. – Neil G Mar 26 '13 at 00:23
-2

I start all counts with 1, in pseudo-code: Count=max(1,Count).

Jon Arts
  • Why does this work? What merits does it have? Doesn't it introduce a bias by pretending every zero count is really one, without changing any of the other data? – whuber Dec 15 '11 at 18:30
  • It is equivalent to a Bayesian prior on the estimate of the conditional probability, so that in the absence of any data the estimate is 1/2. This suggests "I don't know", which is reasonable if you don't have any data. If you do have some data then it moderates the estimate towards 1/2 and prevents estimates of 0 and 1, which are probably unreasonable in any practical application where there will always be uncertainty. Another approach is to opt for a properly Bayesian Naive Bayes, where the conditional probabilities are marginalised completely. – Dikran Marsupial Dec 15 '11 at 20:37
  • In short, it does introduce a bias, but it reduces the variance, so the justification is much like the justification for ridge regression. – Dikran Marsupial Dec 15 '11 at 20:38
  • I'm not sure you're reading this reply correctly, @Dikran. The proposed formula turns $0$ into $1$ but otherwise leaves all other counts unchanged. That is not equivalent to any Bayes prior as far as I can tell. – whuber Dec 16 '11 at 15:01
  • I now see your point: the first part of Jon's suggestion is correct and is equivalent to Laplace smoothing; but the suggested implementation (while it would probably work) is merely a "heuristic" rather than a proper solution, and isn't equivalent to "start[ing] all counts with 1". – Dikran Marsupial Dec 16 '11 at 15:47
  • "I start all counts with 1, in pseudo-code: Count=max(1,Count)." This isn't quite what you want to do. As you add to the count variable, it will go: 1,1,2,3,4,5,6,7 You want to just initialize each count to 1. –  Mar 25 '13 at 23:31