I'm working on a naive Bayes classifier that calculates probabilities using a normal (Gaussian) distribution. This works very well when I am classifying something into two mutually exclusive buckets (e.g. spam vs. not-spam), but when I am working with a factor that is not easily classified that way (when the classifications are not mutually exclusive) I would like to express the result as a percentage.

When I combine the probability density of several factors (by multiplying them together) I tend to get a very small number and I would like to adjust that so I can express it in a 0-100 percent range, so it will be more easily understood. Is there another factor I can use to adjust the probability density into a percentage range?

For example: in the Wikipedia article for naive Bayes classifiers, there is an example of using height, weight and shoe size variables to classify a person as male or female. After computing the numbers, the posterior result for the female classification is 5.3778E-4 (or .00053778). Out of context that seems minuscule, but if you compare that number to the result for the male classification, the percentage would be 99.99% female. What factors (if any) could I apply to the posterior result to get that percentage, without the benefit of the male result to compare it to?

Matt
  • Why not work with logarithms of probabilities? – charles.y.zheng Mar 18 '11 at 21:46
  • Technically, you could compute $p(data)$, which is the normalization factor. However, the easiest way to compute it is usually to compute the results for all classes and sum them up. Note that this works for any number of classes, not just two. – SheldonCooper Mar 18 '11 at 21:46
  • @Matt Those numbers are not probabilities at all: they are probability *densities.* It makes no sense to force a bunch of them to sum to 1. (In fact, you can make them individually be *greater* than 1 merely by changing the units of measure for weight or whatever.) See the related question and analysis at http://stats.stackexchange.com/q/4220/919 . – whuber Mar 19 '11 at 04:48
  • @whuber that's right, the question is about probability densities. I'm wondering if I can apply a function or constant(s) (such as the normalization factor mentioned above by SheldonCooper) to transform the probability density "5.3778E-4" from the posterior value in the example to something like "99.99%", or is that not practical? – Matt Mar 19 '11 at 07:10
  • @Matt It is *practicable* (it can be done) but its meaningfulness is doubtful. However, *ratios* of densities do make sense, as @ProbabilityIsLogic has suggested in a reply. For instance, you might re-express all the results as ratios relative to the largest one. They won't sum to unity--that is the part that is meaningless--but they will all lie between 0 and 1. – whuber Mar 19 '11 at 13:58
  • @whuber thank you for the insights. I am not expecting the numbers to sum to 1, in fact it's easier to compute a ratio or percentage when the classifications are mutually exclusive and cover the whole set (like male/female and spam/not-spam). The problem I'm dealing with is when you have classifications that can overlap and are not exhaustive (i.e. a sample could match multiple classifications, or none). – Matt Mar 19 '11 at 20:33
  • @charles.y.zheng I could use logarithms of the probability densities and work with results that weren't as small, but I don't think that would get me to a percentage, would it? As I understand it, using sums of logarithms is just to avoid rounding errors when your floating point numbers get too small. The results stay in order, so you can see which result is the best, but they aren't comparable to anything else. – Matt Mar 20 '11 at 05:12
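
The two rescalings suggested in the comments above (dividing by the sum over all classes, and dividing by the largest result) can be sketched as follows. This is only an illustration: the function names are made up, the female score is the one quoted in the question, and the male score (6.1984E-9) is assumed from the same Wikipedia example rather than stated in this thread. The sum-to-one version is only meaningful when the classes are mutually exclusive and exhaustive.

```python
import math

def normalize_log_scores(log_scores):
    """Rescale per-class log(prior * likelihood) scores so they sum to 1."""
    best = max(log_scores.values())
    # Subtract the largest log score before exponentiating to avoid underflow.
    shifted = {label: math.exp(s - best) for label, s in log_scores.items()}
    total = sum(shifted.values())
    return {label: v / total for label, v in shifted.items()}

def ratio_to_best(log_scores):
    """Express each score as a ratio to the largest one (values in (0, 1])."""
    best = max(log_scores.values())
    return {label: math.exp(s - best) for label, s in log_scores.items()}

# Scores from the Wikipedia height/weight/shoe-size example
# (the male value is an assumption taken from that article).
scores = {
    "female": math.log(5.3778e-4),
    "male": math.log(6.1984e-9),
}

shares = normalize_log_scores(scores)  # sums to 1
ratios = ratio_to_best(scores)         # largest entry is exactly 1

for label in scores:
    print(f"{label}: share={shares[label]:.6%}, ratio to best={ratios[label]:.3e}")
```

Working from log scores keeps the arithmetic stable even when the raw products are too small to represent as floating-point numbers.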

1 Answer

You can apply any monotonic transformation to the probabilities (at least as far as I know). As long as your transformation preserves the ordering of the probabilities, you will not be led astray in your decision making. Personally, I prefer to use odds ratios. They make intuitive sense to me, and I know what decision to make almost straight away (and the "region of uncertainty" is easily identified, perhaps odds between 0.1 and 10?).
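
As a rough sketch of the odds idea, assuming you have a score for a class and a score for whatever you are comparing it against (the function names and the accept/reject labels here are hypothetical, and the male figure of 6.1984E-9 is taken from the Wikipedia example rather than from the question):

```python
import math

def odds(log_score_for, log_score_against):
    """Odds in favour of a class, computed from two log scores."""
    return math.exp(log_score_for - log_score_against)

def decide(odds_value, lower=0.1, upper=10.0):
    """Label a result using the 0.1 to 10 'region of uncertainty'."""
    if odds_value >= upper:
        return "accept"
    if odds_value <= lower:
        return "reject"
    return "uncertain"

# Posterior numerators from the Wikipedia example
# (the male value is an assumption, it is not quoted in the question).
log_female = math.log(5.3778e-4)
log_male = math.log(6.1984e-9)

odds_female = odds(log_female, log_male)
# Odds convert to a probability only when the two hypotheses are
# mutually exclusive and exhaustive.
prob_female = odds_female / (1.0 + odds_female)

print(f"odds for female: {odds_female:.0f} to 1 -> {decide(odds_female)}")
print(f"as a percentage: {prob_female:.4%}")
```

Note that the odds still need something to compare against. For classes that overlap or are not exhaustive, that would have to be a separate model of "not this class" for each label.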

My answer to this question goes through a worked example of how you would use the odds in a naive Bayes classifier.

probabilityislogic