
I would like to train multiple Naïve Bayes classifiers with different numbers of categories, and also to have a global threshold for how certain a classifier must be in order for its classification to be trusted.

In this specific case, a classifier should be trained for each user in my database, and each user can have any number of categories. Now I would like to set a threshold across all classifiers for how certain a classifier should be before suggesting something to the user.

What is a good way to do this?

Should I mess with the smoothing parameter? If I apply the standard add-one smoothing to all classifiers, then a classifier that has only 2 categories will disagree a lot with one that has 100 categories (when little data is observed).
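To show what I mean, here is a minimal sketch in Python/NumPy (`smoothed_class_probs` is just my own illustrative helper, not from any library, and the counts are made up):

```python
import numpy as np

def smoothed_class_probs(counts, alpha=1.0):
    """Additive (add-alpha) smoothing over raw class counts."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

# A single observation, add-one smoothing (alpha = 1):
print(smoothed_class_probs([1, 0]).max())          # 2 classes:   ~0.67
print(smoothed_class_probs([1] + [0] * 99).max())  # 100 classes: ~0.02
```

A global threshold of, say, 0.5 is trivially reachable for the 2-class classifier but practically unreachable for the 100-class one.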

What is a good way to even this out when only a little data has been observed?

My solution right now is to use a smoothing parameter of $1/\text{no\_of\_observed\_classes}$, and it seems to work well on a few test cases.
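Concretely (same illustrative helper as above), setting the smoothing parameter to $1/K$, where $K$ is the number of observed classes, means the total pseudo count $K \cdot \alpha$ is 1 regardless of how many classes a user has:

```python
import numpy as np

def smoothed_class_probs(counts, alpha):
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

for k in (2, 100):
    counts = [1] + [0] * (k - 1)    # one observation spread over k classes
    print(k, smoothed_class_probs(counts, 1.0 / k).max())
# 2   -> 0.75
# 100 -> 0.505
```

The maximum confidences are now far more comparable across classifiers than under add-one smoothing.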

Is this fine or am I missing some important implications?

1 Answer

This is not a direct answer to your question, but it may help you understand the principles that govern the 'smoothing' operation. The smoothing factors represent your prior distribution, which manifests itself as pseudo counts when you are dealing with discrete distributions such as the binomial or the multinomial.
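As a quick sketch of the pseudo-count view (Python/NumPy; the counts are made up for illustration):

```python
import numpy as np

counts = np.array([3.0, 1.0, 0.0])  # observed class counts (the likelihood side)
alpha = np.ones_like(counts)        # Dirichlet(1, 1, 1) prior: one pseudo count per class

posterior_mean = (counts + alpha) / (counts.sum() + alpha.sum())
print(posterior_mean)               # [0.571, 0.286, 0.143] -- exactly add-one smoothing
```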

> If I apply the standard add-one smoothing to all classifiers, then a classifier that has only 2 categories …

Here, your variable follows a binomial distribution. The frequency counts correspond to the likelihood, and the smoothing factors correspond to your prior distribution, which in this case you can represent as a beta distribution. The beta distribution is the conjugate prior of a random variable that follows a binomial distribution.
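For instance, a sketch of the conjugate update using `scipy.stats` (the counts are made up):

```python
from scipy import stats

def beta_posterior(successes, failures, a=1.0, b=1.0):
    """Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)."""
    return stats.beta(a + successes, b + failures)

post = beta_posterior(successes=3, failures=1)  # uniform Beta(1, 1) prior
print(post.mean())  # (3 + 1) / (4 + 2) = 0.667, i.e. add-one smoothing
```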

> What is a good way to even this out when only a little data has been observed?

This relates to adjusting the equivalent (effective) sample size of your prior, i.e. deciding how strong you set your prior to be. This paper should give you ideas; before looking at it, though, I would make sure to read up on the beta distribution and to understand the concept of a conjugate prior.
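A sketch of what the equivalent sample size $a + b$ of a $\mathrm{Beta}(a, b)$ prior does (same prior mean of 0.5, different strengths; the data are made up):

```python
from scipy import stats

s, f = 8, 2  # observed data: 8 successes, 2 failures

# Weak prior: Beta(1, 1), equivalent sample size a + b = 2
print(stats.beta(1 + s, 1 + f).mean())    # ~0.75, dominated by the data

# Strong prior: Beta(50, 50), equivalent sample size a + b = 100
print(stats.beta(50 + s, 50 + f).mean())  # ~0.53, still close to the prior mean
```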

Hope this helps.
