
I have a problem similar to this one. My training set contains N observations and K>2 classes. I want to classify each test sample into one of the K classes, or as an outlier if it is far from every known class.

Is there an R (preferred) or Python package that solves this?

One method I can think of is some sort of Gaussian fitting: fit the in-sample data to get K Gaussian distributions, then for each new sample compute its probability density under each of the K distributions. If the largest density is above some threshold, I assign the sample to the class with the largest density; otherwise it is an outlier.
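
The Gaussian idea above can be sketched as follows. This is only an illustration under stated assumptions, not a package recommendation: it uses Python with the standard library only, assumes a diagonal covariance per class (fewer parameters to estimate, which eases the dimensionality worry somewhat but ignores feature correlations), and the names `fit_diagonal_gaussians`, `classify_or_reject`, and the threshold value are hypothetical. The toy data are made up for the demo.

```python
import math
from collections import defaultdict


def fit_diagonal_gaussians(samples, labels, min_var=1e-6):
    """Estimate a per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    params = {}
    for y, rows in by_class.items():
        n, dims = len(rows), len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, min_var)
            for d in range(dims)
        ]
        params[y] = (means, variances)
    return params


def log_density(x, means, variances):
    """Log of the (unnormalised across classes) Gaussian density at x."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
        for xi, m, v in zip(x, means, variances)
    )


def classify_or_reject(x, params, log_threshold):
    """Return the best class, or "outlier" if even the best raw score is too low."""
    scores = {y: log_density(x, m, v) for y, (m, v) in params.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= log_threshold else "outlier"


# Made-up 2-D training data with K = 3 classes.
train = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),   # class "a"
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2),     # class "b"
    (0.0, 5.0), (0.1, 5.2), (-0.2, 4.9), (0.2, 5.1),    # class "c"
]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

params = fit_diagonal_gaussians(train, labels)
LOG_THRESHOLD = -10.0  # hypothetical cut-off; tune it on held-out data

for point in [(0.1, 0.0), (5.0, 5.0), (0.1, 5.0), (2.5, 2.5)]:
    print(point, classify_or_reject(point, params, LOG_THRESHOLD))
```

The threshold is applied to the raw log-density rather than to a normalised posterior, so a point far from every class is rejected. One plausible way to pick the threshold is a low quantile of the log-densities of the training points under their own class.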

Is there a package that does this, or preferably one that takes a more sophisticated approach? I think this approach suffers from the curse of dimensionality. Maybe random forest? SVM?

The `gausspr` function in the R package kernlab seems to provide this. But after fitting, `predict(..., type = 'probabilities')` only gives normalised probabilities (the probabilities for the K classes sum to 1). Can I get the raw scores before normalisation?
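
To show why the raw scores matter here, the following sketch normalises per-class Gaussian log-densities for a point far from both classes. It is a generic illustration in Python (standard library only) with made-up 1-D class parameters; it does not reproduce what kernlab computes internally, and the helper names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)


def normalise(log_scores):
    """Turn log-scores into probabilities that sum to 1 (log-sum-exp trick)."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Two made-up classes: N(0, 1) and N(5, 1).
class_params = [(0.0, 1.0), (5.0, 1.0)]
x_far = 50.0  # far away from both class means

raw_log_scores = [gaussian_logpdf(x_far, m, s) for m, s in class_params]
posteriors = normalise(raw_log_scores)

print("raw log-densities:", raw_log_scores)  # both extremely low
print("normalised:", posteriors)             # still sums to 1
```

After normalisation the second class receives almost all of the probability mass even though both raw log-densities are extremely low, so only a threshold on the unnormalised score can flag the point as an outlier.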

jf328
  • Have you considered doing this in two stages? Maybe someone will come around with an answer, but the only algorithms/models I know that do this naturally are unsupervised (HDBSCAN). If you can come up with a sound probability model, I suspect there is some Bayesian solution available. – shadowtalker Jul 17 '16 at 12:51
  • As for the R question, you can view a function's source code by typing its name (without parentheses) into the console. Unless the normalization happens inside a C extension, you will be able to see where it happens and then write your own version of that function. – shadowtalker Jul 17 '16 at 12:53
  • Thanks @ssdecontrol. `?predict.gausspr` works to get the doc, but `predict.gausspr` gets me `object predict.gausspr not found` – jf328 Jul 17 '16 at 16:14
  • It's probably not "exported" from the package. Try `kernlab:::predict.gausspr` (with _three_ `:`s) – shadowtalker Jul 17 '16 at 16:16
  • still object not found :( – jf328 Jul 17 '16 at 16:26
  • I just checked, it's an S4 class. `getMethod('predict', 'gausspr')` – shadowtalker Jul 17 '16 at 16:53

0 Answers