In the Naive Bayes classifier, why do we have to normalize the probabilities after calculating the probabilities of each hypothesis?

1 Answer

You do not have to normalize the probabilities if you only care about knowing which class ($\hat{y}$) your input ($\mathbf{x}=x_1, \dots, x_n$) most likely belongs to, since the maximum a posteriori (MAP) decision rule is as follows:

$\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}}p(C_k \vert x_1, \dots, x_n) = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}} \ p(C_k) \displaystyle\prod_{i=1}^n p(x_i \vert C_k)$

Since, using the naive conditional-independence assumption,

$$\begin{align} p(C_k \vert x_1, \dots, x_n) & \varpropto p(C_k, x_1, \dots, x_n) \\ & \varpropto p(C_k) \ p(x_1 \vert C_k) \ p(x_2\vert C_k) \ p(x_3\vert C_k) \ \cdots \\ & \varpropto p(C_k) \prod_{i=1}^n p(x_i \vert C_k)\,, \end{align}$$

the omitted normalizing constant (the evidence $p(x_1, \dots, x_n)$) is the same for every class, so dropping it does not change which class attains the maximum.
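To make this concrete, here is a minimal Python sketch (my addition, not part of the original answer) with made-up priors and per-feature likelihoods for three classes and four binary features; it checks that the argmax of the unnormalized products $p(C_k) \prod_i p(x_i \vert C_k)$ picks the same class as the normalized posteriors:

```python
import numpy as np

# Hypothetical numbers: 3 classes, 4 binary features, observed input x = (1, 0, 1, 1).
priors = np.array([0.5, 0.3, 0.2])                 # p(C_k)
likelihood_1 = np.array([[0.8, 0.1, 0.6, 0.7],     # p(x_i = 1 | C_k), one row per class
                         [0.3, 0.4, 0.9, 0.2],
                         [0.5, 0.5, 0.5, 0.5]])
x = np.array([1, 0, 1, 1])

# p(x_i | C_k) for the observed feature values
per_feature = np.where(x == 1, likelihood_1, 1.0 - likelihood_1)

# Unnormalized scores: p(C_k) * prod_i p(x_i | C_k)
scores = priors * per_feature.prod(axis=1)

# Normalized posteriors: divide by the evidence, the same constant for every class
posteriors = scores / scores.sum()

print(scores.argmax(), posteriors.argmax())        # same winning class either way
```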

If you do want the class probabilities, then you indeed need to normalize:

$p(C_k \vert x_1, \dots, x_n) = \frac{p(C_k) \prod_{i=1}^n p(x_i \vert C_k)}{\sum_{j=1}^{K} p(C_j) \prod_{i=1}^n p(x_i \vert C_j)}$
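As a side note (my addition): with many features the products above underflow, so implementations typically keep per-class log-scores $\log p(C_k) + \sum_i \log p(x_i \vert C_k)$ and normalize with the log-sum-exp trick. A small sketch with hypothetical log-scores:

```python
import numpy as np

def normalize_log_scores(log_scores):
    """Convert unnormalized per-class log-scores into posteriors without underflow."""
    shifted = log_scores - log_scores.max()    # subtract the max for numerical stability
    unnormalized = np.exp(shifted)
    return unnormalized / unnormalized.sum()

# Hypothetical log-scores for 3 classes (e.g. summed over thousands of features)
log_scores = np.array([-1040.2, -1037.9, -1055.6])
print(normalize_log_scores(log_scores))        # posteriors, sum to 1
```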

But keep in mind that, as noted in {1}:

> the winning class in NB classification usually has a much larger probability than the other classes and the estimates diverge very significantly from the true probabilities. NB classifiers estimate badly, but often classify well.


References:

{1} Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. *Introduction to Information Retrieval*. Cambridge University Press, 2008.
