I'm thinking through the logic of Naive Bayes and encountered a brain teaser. I know that adding smoothing (alpha) to Naive Bayes can help to increase the accuracy of the model, which implies that it must change the predictions the model makes. However, I'm having trouble coming up with a toy example where it would change the prediction. Can anyone help me come up with one?

More specifically (and with code), I'm looking for a set of (x, y, x_1) where...

from sklearn.naive_bayes import CategoricalNB
m = CategoricalNB(alpha=0)
m.fit(x,y)
m.predict(x_1)

and

from sklearn.naive_bayes import CategoricalNB
m = CategoricalNB(alpha=1)
m.fit(x,y)
m.predict(x_1)

produce different predictions.

mythander889

2 Answers


The point of using Laplace smoothing is not to increase accuracy; if you set $\alpha$ to a huge value, it can even make the accuracy worse. With $\alpha=1$, Laplace smoothing adds one to each of the counts, which prevents problems with zeros in the calculations. This smooths the probabilities and has a regularizing effect. It is a form of prior in a Bayesian computation, and the impact of a prior diminishes as the sample size grows. This is easy to see: when calculating the empirical probability $\hat p_i = n_i / N$, if the denominator $N$ is already large, then incrementing $n_i$ by a small value $\alpha$ will not have much impact on the result.

If you want to see a striking effect of Laplace smoothing, use a large $\alpha$, a small sample, or data where some of the counts are zero.
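
To make this concrete, here is a rough numeric sketch (it assumes the smoothed per-feature estimate takes the form $(n_i + \alpha)/(N + \alpha K)$, with $K$ the number of categories, which is what CategoricalNB does up to the class prior): with a large $N$ the smoothed and raw estimates barely differ, while a zero count in a tiny sample changes drastically.

def smoothed(n_i, N, alpha, K):
    # (count + alpha) / (total + alpha * number_of_categories)
    return (n_i + alpha) / (N + alpha * K)

# Large sample: smoothing barely moves the estimate.
print(smoothed(400, 1000, 0, 2), smoothed(400, 1000, 1, 2))  # 0.4 vs ~0.4002

# Tiny sample with a zero count: smoothing turns 0 into 0.25.
print(smoothed(0, 2, 0, 2), smoothed(0, 2, 1, 2))  # 0.0 vs 0.25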

Tim

Here is an example:

import numpy as np
np.random.seed(100)

# 100 training rows with 30 categorical features taking values 1..6
x = np.random.choice([1, 2, 3, 4, 5, 6], 3000, replace=True).reshape(100, 30)
# binary class labels
y = np.random.choice([0, 1], 100, replace=True)
# 10 test rows restricted to the values 1 and 2
x_1 = np.random.choice([1, 2], 300, replace=True).reshape(10, 30)

from sklearn.naive_bayes import CategoricalNB
m = CategoricalNB(alpha=0)
m.fit(x,y)
m.predict(x_1)

# array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

m = CategoricalNB(alpha=1)
m.fit(x,y)
m.predict(x_1)

# array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

Edit: here's the smallest example I could find:

x = np.array([
 [0, 1, 0],
 [1, 0, 1],
 [1, 0, 1],
 [1, 0, 1]])
y = np.array([0, 0, 1, 1])

x1 = np.array([
 [1, 1, 1]])

What happens here is that the value 1 in the middle feature has zero probability under class 1 in the $\alpha = 0$ case, but in the $\alpha = 1$ case this probability is nonzero (I believe it's replaced by $1/4$), and therefore the evidence ends up being in favour of class 1.
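
You can check this directly by fitting the small example with both values of $\alpha$ and looking at predict_proba (a quick sketch; depending on the scikit-learn version, $\alpha = 0$ may raise a warning or be clipped to a tiny positive value, which does not change the conclusion):

import numpy as np
from sklearn.naive_bayes import CategoricalNB

x = np.array([
 [0, 1, 0],
 [1, 0, 1],
 [1, 0, 1],
 [1, 0, 1]])
y = np.array([0, 0, 1, 1])
x1 = np.array([[1, 1, 1]])

for alpha in (0, 1):
    m = CategoricalNB(alpha=alpha).fit(x, y)
    # With alpha=0 the middle feature's value 1 has zero probability under class 1,
    # so that class's posterior collapses; with alpha=1 it is 1/4 and class 1 wins.
    print(alpha, m.predict(x1), m.predict_proba(x1))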

[Note for those who come after: CategoricalNB doesn't actually count the number of categories in each feature. It assumes that the categories of each feature are labelled $0, 1, \ldots, n-1$. This confused me a lot when I was trying to understand its inner workings!]
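
For instance, a feature that only ever takes the values 1 and 2 is still treated as having the three categories 0, 1, 2, because the number of categories is inferred from the largest value observed (a small sketch; the category_count_ attribute is assumed to be available, as in recent scikit-learn versions):

m = CategoricalNB(alpha=1).fit(np.array([[1], [2], [2], [1]]), np.array([0, 0, 1, 1]))
# One row per class, one column per assumed category 0, 1, 2; category 0 never occurs.
print(m.category_count_[0])
# [[0. 1. 1.]
#  [0. 1. 1.]]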

Flounderer
  • Thanks! Any intuition on what's going on in the cases where it does change prediction? My intuition is that, as it's working through the conditional probabilities, it's very likely to be a certain class, until it hits a case with zero training examples and the entire probability goes to zero. – mythander889 May 21 '21 at 13:43
  • @mythander889 Yes, I believe that's exactly right! Please see my edit. – Flounderer May 24 '21 at 06:32