I'm mainly confused by some of the wording in this CV post: Multinomial Naive Bayes.

Mainly, this line:

"In summary, Naive Bayes classifier is a general term which refers to conditional independence of each of the features in the model, while Multinomial Naive Bayes classifier is a specific instance of a Naive Bayes classifier which uses a multinomial distribution for each of the features." by jlund3.

Why is each feature $x_i$ a multinomial distribution, and not the product $\prod{P(x_i|c)}$ a multinomial distribution?

$$P(c|X) \propto \prod{P(x_i|c)} \cdot P(c)$$

If the product of probabilities is distributed as a multinomial, that makes sense to me since

$$\prod{P(x_i|c)} \propto \prod{{p_i}^{x_i}} $$

I don't really understand how each feature itself could be distributed as a multinomial, however. Wouldn't each $x_i$ end up being a multinomial with two labels ($x_i$ = count of word $i$, $x_{\text{not } i}$ = count of every other word)?

Any help would be much appreciated!

Brian

1 Answer

Take throwing a die as an example: the result $X$ is a random number in $\{1,2,\dots,6\}$. For Bayesian parameter estimation, let $p_1$ be the probability of throwing a $1$, $p_2$ the probability of throwing a $2$, and so on, so that we want to estimate $(p_1,p_2,\dots,p_6)$. If we throw the die $N$ times, we get $1$ a total of $c_1$ times, $2$ a total of $c_2$ times, and so on. We denote the result as a vector $\textbf{c}=(c_1,c_2,\dots,c_6)$, where $c_1+c_2+\dots+c_6=N$. The likelihood is then proportional to $\prod_{j=1}^6 p_{j}^{c_j}$, which is exactly the form of a multinomial distribution (the multinomial coefficient does not depend on the $p_j$). If $N=1$, exactly one entry of $\textbf{c}$ is 1 and the rest are 0. As you can see, this feature $X$ follows a multinomial distribution.
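To make the die example concrete, here is a minimal Python sketch of the multinomial probability of a count vector. The function name `multinomial_pmf`, the fair-die probabilities, and the counts are all made up for illustration; they are not taken from any particular library or dataset.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` in N = sum(counts) independent throws."""
    n = sum(counts)
    # Multinomial coefficient N! / (c_1! * ... * c_k!); it does not depend on probs.
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    # Kernel prod_j p_j^{c_j}, the part that matters for estimating the p_j.
    kernel = prod(p ** c for p, c in zip(probs, counts))
    return coeff * kernel


fair = [1 / 6] * 6  # assumed fair die, for illustration only

# N = 1: a single throw that shows a 3 (one entry is 1, the rest are 0).
single_throw = [0, 0, 1, 0, 0, 0]
p_single = multinomial_pmf(single_throw, fair)

# N = 10: hypothetical counts of each face.
counts = [2, 1, 0, 3, 1, 3]
p_counts = multinomial_pmf(counts, fair)

# Maximum-likelihood estimate of (p_1, ..., p_6) from the counts: c_j / N.
mle = [c / sum(counts) for c in counts]
```

With $N=1$ the coefficient is 1 and the probability reduces to the single $p_j$ of the face that came up, which is the sense in which one throw (or one word position in a document) is a multinomial draw.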

The YouTube videos "Bayesian Naive Bayes" provide a detailed explanation of multinomial Naive Bayes, which I found very useful.

Naomi