I'm currently implementing a Gaussian Naive Bayes classifier. Of course if I'm doing classification by
$$ \text{argmax}_{C_i} P(C_i)P(D|C_i), $$
then the probabilities can get very small, so I want to use log-probabilities instead. I see three possibilities:
$$ \text{argmax}_{C_i} P(C_i)\log P(D|C_i), $$
$$ \text{argmax}_{C_i} \log P(C_i) \log P(D|C_i), $$
$$ \text{argmax}_{C_i} \log P(C_i) + \log P(D|C_i), $$
Which of these is the correct way to go? From a computational point of view the second one seems right, because with the others I get negative values; but from a mathematical point of view the third one is correct, due to the following:
$$ P(C_i|D) = \frac{P(C_i)P(D|C_i)}{P(D)} \propto P(C_i)P(D|C_i) $$
$$ \log P(C_i|D) = \log[P(C_i)P(D|C_i)] = \log P(C_i) + \log P(D|C_i) $$
P(D) can be dropped because it does not depend on the class. In any case, for all variants I get values outside [0, 1], but I think this is fine because I'm evaluating probability densities (from the Gaussian distribution) rather than probabilities, and a density can exceed 1.
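For what it's worth, here is a minimal sketch of what I mean by variant 3. All class parameters and the observation are made-up numbers, purely for illustration:

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log-density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_score(x, prior, means, variances):
    # Variant 3: log P(C_i) + log P(D | C_i). Under the naive independence
    # assumption, log P(D | C_i) is a sum of per-feature log-densities.
    return math.log(prior) + sum(
        gaussian_logpdf(xj, m, v) for xj, m, v in zip(x, means, variances)
    )

# Made-up class parameters and one two-feature observation.
x = [1.0, 2.0]
classes = {
    "A": {"prior": 0.6, "means": [1.0, 1.5], "vars": [1.0, 1.0]},
    "B": {"prior": 0.4, "means": [3.0, 0.0], "vars": [1.0, 2.0]},
}

log_scores = {c: log_score(x, p["prior"], p["means"], p["vars"])
              for c, p in classes.items()}

# The plain product form for comparison: it has the same argmax,
# because log is strictly increasing.
prod_scores = {
    c: p["prior"] * math.prod(
        math.exp(gaussian_logpdf(xj, m, v))
        for xj, m, v in zip(x, p["means"], p["vars"]))
    for c, p in classes.items()
}

prediction = max(log_scores, key=log_scores.get)
```

The log-scores come out negative, but that doesn't matter for classification: the argmax only compares them, and since log is monotone the ranking agrees with the product form exactly.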
I have a second question: I'm also interested in the importance of each feature for each pair of classes. How could this be calculated from a Gaussian Naive Bayes model? I need this because I want to visualize the 10 most important features for each pair of classes.