5

I have a Logistic Regression supervised classifier that is trained on n=365 observations and m=179 attributes with c=5 classes. It's performing at 94.9% accuracy using a very robust cross-validation procedure. The model naively assumes that each of the new observations will fall into one of those categories but there may be an instance where there is a completely unknown class. Is there a way to model a None of the above category? I was going to look at the probabilities but the probabilities are very high for predictions even if the model predicted it incorrectly.

O.rka
  • 1,250
  • 4
  • 19
  • 30
  • How can it "*be an instance where there is a completely unknown class*" if the representation of the world the model trains upon must always have 5 classes? You might want to consider creating artificial noise examples but that's another ball game... (In general +1 at Tim's answer and comment, they are pointing you to the correct direction.) – usεr11852 Aug 06 '17 at 13:03
  • First, follow the advice from @Tim. Second, you may need to be more skeptical of your model. With so few cases and even with only 2 classes in logistic regression, your model should usually only consider about 10-20 features as predictors unless you are penalizing as with ridge regression. You may be overfit and have a model that doesn't apply well to new data. – EdM Aug 06 '17 at 13:31

2 Answers2

8

First of all, you are talking about multinomial regression, not logistic regression. Second, neither logistic regression nor multinomial regression are classifiers. Logistic regression and multinomial regression both predict probabilities of belonging to some class. To make a classifier of them you need a decision rule (if probability is greater then some value, classify this as some class).

It is up to you to define the decision rule. You are free to decide that if the probability of belonging to any class is not greater then some value, neither of the categories can be chosen.

mkt
  • 11,770
  • 9
  • 51
  • 125
Tim
  • 108,699
  • 20
  • 212
  • 390
  • Thanks for this. The implementation I'm using has a `one-vs-rest` option. Does that make it an ensemble of logistic regressions or is it still considered a multinomial regression? – O.rka Aug 03 '17 at 21:10
  • 1
    @O.rka In such case it's an ensemble. Still, my answer applies as you can use such approach with any method that returns probabilistic "classifications". – Tim Aug 03 '17 at 21:12
3

No way, you either have some of the non-class data already and set them into a 6th class and train the model with the 6th class. Or you set some thresholds for scoring the likeliness that a sample falls into each class evaluated by something like hidden markov model, but the accuracy would be worse than machine learning models. And as you don't even have non-class data now, you can't really have a fair classifier that can distinguish the true non-class data.

Lily Long
  • 133
  • 6