In the Naive Bayes classifier, why do we have to normalize the probabilities after calculating the probabilities of each hypothesis?

1 Answer

You do not have to normalize the probabilities if you only care about knowing which class ($\hat{y}$) your input ($\mathbf{x}=x_1, \dots, x_n$) most likely belongs to, since the maximum a posteriori (MAP) decision rule is as follows:

$\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}}p(C_k \vert x_1, \dots, x_n) = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}} \ p(C_k) \displaystyle\prod_{i=1}^n p(x_i \vert C_k)$

Since, using the naive conditional-independence assumption,

$$\begin{align} p(C_k \vert x_1, \dots, x_n) & \varpropto p(C_k, x_1, \dots, x_n) \\ & \varpropto p(C_k) \ p(x_1 \vert C_k) \ p(x_2\vert C_k) \ p(x_3\vert C_k) \ \cdots \\ & \varpropto p(C_k) \prod_{i=1}^n p(x_i \vert C_k)\,, \end{align}$$

the omitted normalizing constant (the evidence $p(x_1, \dots, x_n)$) is the same for every class, so dropping it does not change which class attains the maximum.
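To make this concrete, here is a minimal Python sketch (my addition, not part of the original answer) with made-up priors and per-feature likelihoods for three classes and four binary features; it checks that the argmax of the unnormalized products $p(C_k) \prod_i p(x_i \vert C_k)$ picks the same class as the normalized posteriors:

```python
import numpy as np

# Hypothetical numbers: 3 classes, 4 binary features, observed input x = (1, 0, 1, 1).
priors = np.array([0.5, 0.3, 0.2])                 # p(C_k)
likelihood_1 = np.array([[0.8, 0.1, 0.6, 0.7],     # p(x_i = 1 | C_k), one row per class
                         [0.3, 0.4, 0.9, 0.2],
                         [0.5, 0.5, 0.5, 0.5]])
x = np.array([1, 0, 1, 1])

# p(x_i | C_k) for the observed feature values
per_feature = np.where(x == 1, likelihood_1, 1.0 - likelihood_1)

# Unnormalized scores: p(C_k) * prod_i p(x_i | C_k)
scores = priors * per_feature.prod(axis=1)

# Normalized posteriors: divide by the evidence, the same constant for every class
posteriors = scores / scores.sum()

print(scores.argmax(), posteriors.argmax())        # same winning class either way
```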

If you do want the class probabilities, then you indeed need to normalize:

$p(C_k \vert x_1, \dots, x_n) = \frac{p(C_k) \prod_{i=1}^n p(x_i \vert C_k)}{\sum_{j=1}^{K} p(C_j) \prod_{i=1}^n p(x_i \vert C_j)}$
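As a side note (my addition): with many features the products above underflow, so implementations typically keep per-class log-scores $\log p(C_k) + \sum_i \log p(x_i \vert C_k)$ and normalize with the log-sum-exp trick. A small sketch with hypothetical log-scores:

```python
import numpy as np

def normalize_log_scores(log_scores):
    """Convert unnormalized per-class log-scores into posteriors without underflow."""
    shifted = log_scores - log_scores.max()    # subtract the max for numerical stability
    unnormalized = np.exp(shifted)
    return unnormalized / unnormalized.sum()

# Hypothetical log-scores for 3 classes (e.g. summed over thousands of features)
log_scores = np.array([-1040.2, -1037.9, -1055.6])
print(normalize_log_scores(log_scores))        # posteriors, sum to 1
```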

But keep in mind that, as noted in {1}:

> the winning class in NB classification usually has a much larger probability than the other classes and the estimates diverge very significantly from the true probabilities. NB classifiers estimate badly, but often classify well.


References:

{1} Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. *Introduction to Information Retrieval*. Cambridge University Press, 2008.
