How to calculate the probability that a feature falls into a certain class

Question

There are classes A, B, C, D and E. A variable x has a different mean in each class, but the ranges of x overlap between classes. The item counts also differ between classes; e.g., there are many more A's than B's. Given an item with a known value of x but an unknown class, how do I calculate the probability that it belongs to class A vs. class B vs. class C, and so on?

For example, if x is close to the mean value for class C, there might be a 60% probability that the item belongs to class C, 20% for class B, 10% for class D, 7% for class A and 3% for class E.

Do you know anything more about the distribution of x in each class, other than just the mean? If you can represent x's distribution in each class by a normal distribution with known mean and variance, for example, the problem becomes much simpler. — Nuclear Hoagie, Jul 31 '18 at 18:49
@Nuclear Wang. Yes, x's distribution within each class is known. However, it is not a normal distribution. This phenomenon tends to be very right-skewed, and it becomes closer to normal when log-transformed. — F.G., Jul 31 '18 at 18:53
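When the distribution of x within each class is known, Bayes' rule gives the class probabilities directly: P(class k | x) = π_k f_k(x) / Σ_j π_j f_j(x), where π_k is the prior (here, class k's share of the item counts) and f_k is the density of x within class k. The sketch below assumes, following the comment about log transforms, that each class is log-normal (log x is normal). The counts, log-means and log-standard-deviations are made-up placeholders, not data from the question.

```python
import math

# Hypothetical per-class parameters (placeholders, not from the question):
# "count" = number of items in the class (used as the prior),
# "mu"/"sigma" = mean and standard deviation of log(x) within the class.
CLASSES = {
    "A": {"count": 5000, "mu": 0.0, "sigma": 0.5},
    "B": {"count": 800, "mu": 0.6, "sigma": 0.4},
    "C": {"count": 1500, "mu": 1.0, "sigma": 0.3},
    "D": {"count": 1200, "mu": 1.5, "sigma": 0.4},
    "E": {"count": 300, "mu": 2.2, "sigma": 0.5},
}


def log_lognormal_pdf(x, mu, sigma):
    """Log-density of x when log(x) ~ Normal(mu, sigma**2)."""
    z = (math.log(x) - mu) / sigma
    return -0.5 * z * z - math.log(x * sigma * math.sqrt(2.0 * math.pi))


def class_posteriors(x, classes=CLASSES):
    """P(class | x) by Bayes' rule, with priors proportional to class counts."""
    if x <= 0:
        raise ValueError("x must be positive for a log-normal model")
    total = sum(c["count"] for c in classes.values())
    # log(prior) + log(likelihood) for each class
    log_joint = {
        name: math.log(c["count"] / total) + log_lognormal_pdf(x, c["mu"], c["sigma"])
        for name, c in classes.items()
    }
    # Normalize in log space (subtract the max) to avoid underflow for extreme x.
    m = max(log_joint.values())
    weights = {name: math.exp(v - m) for name, v in log_joint.items()}
    s = sum(weights.values())
    return {name: w / s for name, w in weights.items()}


probs = class_posteriors(math.exp(1.0))  # x at class C's median
for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.1%}")
```

With these made-up parameters, C comes out as the most probable class at that x. Because the 1/x factor in the log-normal density is the same for every class, it cancels in the normalization, so fitting a normal distribution to log(x) in each class gives identical posteriors.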

score 1 · Accepted Answer · answered Jul 31 '18 at 18:47

Your problem is one of probabilistic multiclass classification. A classical statistical approach is multinomial logistic regression. There are also many machine learning approaches, such as classification trees (CART) or random forests.
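A minimal sketch of multinomial (softmax) logistic regression, written with NumPy alone so the mechanics are visible; in practice a library routine such as R's `nnet::multinom` or scikit-learn's `LogisticRegression` would be used instead. It assumes, following the comments on the question, that log(x) is the better-behaved feature, and it fits on simulated data from hypothetical log-normal classes. The class parameters, learning rate and iteration count are illustrative placeholders.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def design(t):
    """Design matrix with an intercept column and the single feature t = log(x)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.column_stack([np.ones_like(t), t])


def fit_multinomial_logit(t, y, n_classes, lr=0.5, n_iter=5000):
    """Multinomial logistic regression fitted by batch gradient descent
    on the mean cross-entropy loss. Returns a (2, n_classes) weight matrix."""
    X = design(t)
    Y = np.eye(n_classes)[y]              # one-hot encoding of integer labels
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(n_iter):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(t)  # gradient of the mean cross-entropy
    return W


def predict_proba(W, t):
    """Estimated P(class | x) for each value of t = log(x)."""
    return softmax(design(t) @ W)


# Hypothetical simulated data (placeholders): log(x) ~ Normal(mu_k, sigma_k) in class k.
rng = np.random.default_rng(0)
mus = np.array([0.0, 0.6, 1.0, 1.5, 2.2])
sigmas = np.array([0.5, 0.4, 0.3, 0.4, 0.5])
counts = np.array([5000, 800, 1500, 1200, 300])  # unequal class sizes
y = np.repeat(np.arange(5), counts)              # labels 0..4 stand for A..E
t = rng.normal(mus[y], sigmas[y])                # t = log(x)

W = fit_multinomial_logit(t, y, n_classes=5)
print(np.round(predict_proba(W, 1.0), 3))        # probabilities for A..E at log(x) = 1
```

Because the model is fitted on data whose class frequencies match the population, the unequal item counts are absorbed into the intercepts, so no separate prior correction is needed. With only a linear term, the log-odds between any two classes are linear in log(x); if the classes' spreads on the log scale differ noticeably, adding (log x)² as a second feature lets the model capture that curvature.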

(Multinomial) logistic regression outputs conditional class probabilities directly. For tree-based methods, you may need to request probabilities explicitly through a parameter. For instance, with the randomForest package in R, call predict.randomForest(..., type = "prob") rather than relying on the default class predictions. Alternatively, use a dedicated implementation.
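For readers working in Python rather than R, a rough counterpart is scikit-learn's `RandomForestClassifier`: its `predict_proba` method returns class probabilities (the mean of the individual trees' class-probability estimates), and its `classes_` attribute gives the column order. The sketch below fits it on simulated data from hypothetical log-normal classes; the class parameters and the `n_estimators` and `min_samples_leaf` values are arbitrary choices, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical simulated data (placeholders): log(x) is normal within each class.
rng = np.random.default_rng(0)
labels = np.array(["A", "B", "C", "D", "E"])
mus = np.array([0.0, 0.6, 1.0, 1.5, 2.2])
sigmas = np.array([0.5, 0.4, 0.3, 0.4, 0.5])
counts = np.array([5000, 800, 1500, 1200, 300])
idx = np.repeat(np.arange(len(labels)), counts)
x = np.exp(rng.normal(mus[idx], sigmas[idx]))  # right-skewed raw feature
y = labels[idx]

# A larger min_samples_leaf keeps leaves from being pure, smoothing the probabilities.
forest = RandomForestClassifier(n_estimators=200, min_samples_leaf=50, random_state=0)
forest.fit(x.reshape(-1, 1), y)                # scikit-learn expects a 2-D feature matrix

x_new = np.array([[np.exp(1.0)]])
proba = forest.predict_proba(x_new)[0]         # one row of class probabilities
for cls, p in zip(forest.classes_, proba):
    print(f"{cls}: {p:.1%}")
```

Because tree splits depend only on the ordering of x, log-transforming x would give essentially the same forest; the skew matters for parametric models, not for trees. Larger `min_samples_leaf` values give smoother probability estimates at the cost of resolution.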
