I have a dataset consisting of 4 classes. I have implemented a Gaussian Naive Bayes classifier (in Matlab). In the training phase I calculate the mean and variance of each feature for each class, as well as the priors (class probabilities).
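To make this concrete, here is a minimal sketch of what my training phase does (the variable names `X`, `y`, `mu`, `sigma2`, `prior` are placeholders for this post, not necessarily what my actual code uses):

```
% X: n-by-d matrix of samples, y: n-by-1 vector of class labels in 1..K
K = 4;                              % number of classes
[n, d] = size(X);
mu     = zeros(K, d);               % per-class, per-feature means
sigma2 = zeros(K, d);               % per-class, per-feature variances
prior  = zeros(K, 1);               % class priors P(C_i)
for k = 1:K
    Xk = X(y == k, :);              % samples belonging to class k
    mu(k, :)     = mean(Xk, 1);
    sigma2(k, :) = var(Xk, 0, 1);   % unbiased variance along samples
    prior(k)     = size(Xk, 1) / n;
end
```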
In the classifying phase I do the classification according to
$$ \text{argmax}_{C_i} \, \log P(C_i) + \log P(D|C_i). $$
For $P(D|C_i)$ I'm using a normal distribution per feature, with the mean and variance calculated in the training phase, and multiplying the per-feature densities together under the naive independence assumption.
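In code, the scoring step looks roughly like this (again a sketch with my placeholder names; `normpdf` is from the Statistics and Machine Learning Toolbox and could be replaced by the Gaussian density written out by hand):

```
% Classify a single sample x (1-by-d) using the quantities from training
logpost = zeros(K, 1);
for k = 1:K
    loglik     = sum(log(normpdf(x, mu(k, :), sqrt(sigma2(k, :)))));
    logpost(k) = log(prior(k)) + loglik;   % log P(C_k) + log P(D | C_k)
end
[~, predicted] = max(logpost);             % argmax over classes
```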
Now I want to get a feature ranking, i.e. I want to visualize the 10 most important features.
In this post they propose using the Kullback-Leibler divergence in the following way:
$$D_{KL}(P(\textrm{feature}_i | \textrm{class = red}) || P(\textrm{feature}_i | \textrm{class = green}))$$
I really don't know how to calculate $P(\textrm{feature}_i | \textrm{class = red})$ and $P(\textrm{feature}_i | \textrm{class = green})$, because my feature values are continuous.
How should I calculate these probabilities?
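For context, here is what I imagine the ranking could look like *if* the linked post simply means plugging my per-class Gaussian estimates into the closed-form KL divergence between two univariate Gaussians; I'm not sure this is the intended reading, and with my 4 classes I would presumably have to aggregate this over class pairs somehow (class indices 1 and 2 stand in for "red" and "green" here):

```
% Closed-form KL divergence between two univariate Gaussians, per feature:
% D_KL( N(mu1, s1^2) || N(mu2, s2^2) )
%   = log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2*s2^2) - 1/2
r = 1; g = 2;                       % hypothetical indices for "red" / "green"
kl = log(sqrt(sigma2(g,:)) ./ sqrt(sigma2(r,:))) ...
   + (sigma2(r,:) + (mu(r,:) - mu(g,:)).^2) ./ (2 * sigma2(g,:)) - 0.5;
[~, order] = sort(kl, 'descend');   % features with largest divergence first
top10 = order(1:10);                % indices of the 10 "most important" features
```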