I have a dataset consisting of 4 classes. I have implemented a Gaussian Naive Bayes classifier (in Matlab). In the training phase I calculate the mean and variance of each feature for each class, as well as the priors (class probabilities).
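To make this concrete, here is a minimal sketch of what my training phase does (the variable names `X`, `y`, `mu`, `sigma2`, `prior` are placeholders for this post, not necessarily what my actual code uses):

```
% X: n-by-d matrix of samples, y: n-by-1 vector of class labels in 1..K
K = 4;                              % number of classes
[n, d] = size(X);
mu     = zeros(K, d);               % per-class, per-feature means
sigma2 = zeros(K, d);               % per-class, per-feature variances
prior  = zeros(K, 1);               % class priors P(C_i)
for k = 1:K
    Xk = X(y == k, :);              % samples belonging to class k
    mu(k, :)     = mean(Xk, 1);
    sigma2(k, :) = var(Xk, 0, 1);   % unbiased variance along samples
    prior(k)     = size(Xk, 1) / n;
end
```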
In the classifying phase I do the classification according to
$$ \text{argmax}_{C_i} \, \log P(C_i) + \log P(D|C_i). $$
For $P(D|C_i)$ I'm using a normal distribution per feature, with the mean and variance calculated in the training phase, and multiplying the per-feature densities together under the naive independence assumption.
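In code, the scoring step looks roughly like this (again a sketch with my placeholder names; `normpdf` is from the Statistics and Machine Learning Toolbox and could be replaced by the Gaussian density written out by hand):

```
% Classify a single sample x (1-by-d) using the quantities from training
logpost = zeros(K, 1);
for k = 1:K
    loglik     = sum(log(normpdf(x, mu(k, :), sqrt(sigma2(k, :)))));
    logpost(k) = log(prior(k)) + loglik;   % log P(C_k) + log P(D | C_k)
end
[~, predicted] = max(logpost);             % argmax over classes
```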
Now I want to get a feature ranking, i.e. I want to visualize the 10 most important features.
In this post they propose using the Kullback-Leibler divergence in the following way:
$$D_{KL}(P(\textrm{feature}_i | \textrm{class = red}) || P(\textrm{feature}_i | \textrm{class = green}))$$
I really don't know how to calculate $P(\textrm{feature}_i | \textrm{class = red})$ and $P(\textrm{feature}_i | \textrm{class = green})$, because my feature values are continuous.
How should I calculate these probabilities?
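For context, here is what I imagine the ranking could look like *if* the linked post simply means plugging my per-class Gaussian estimates into the closed-form KL divergence between two univariate Gaussians; I'm not sure this is the intended reading, and with my 4 classes I would presumably have to aggregate this over class pairs somehow (class indices 1 and 2 stand in for "red" and "green" here):

```
% Closed-form KL divergence between two univariate Gaussians, per feature:
% D_KL( N(mu1, s1^2) || N(mu2, s2^2) )
%   = log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2*s2^2) - 1/2
r = 1; g = 2;                       % hypothetical indices for "red" / "green"
kl = log(sqrt(sigma2(g,:)) ./ sqrt(sigma2(r,:))) ...
   + (sigma2(r,:) + (mu(r,:) - mu(g,:)).^2) ./ (2 * sigma2(g,:)) - 0.5;
[~, order] = sort(kl, 'descend');   % features with largest divergence first
top10 = order(1:10);                % indices of the 10 "most important" features
```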