I have several datasets of positive real values (R+), each consisting of a training set and a test set; for example, the following dataset. I want to train a classifier on the training data so that, when I apply it to the test data, a reasonable number of points are flagged as anomalies and I can analyze the corresponding situations. The higher the value, the more abnormal it is.

By a reasonable number I mean fewer than P% of every 100 points: 100 points are generated each day, most of them should be considered normal, and I want to analyze about P of the most abnormal ones.
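
To make the requirement concrete, here is a minimal sketch of the selection rule described above: given one day's values, return the indices of the P% largest ones. The function name `top_p_percent` and its parameters are hypothetical, chosen only for illustration; it is a baseline for comparison, not a trained classifier.

```python
def top_p_percent(day_values, p_percent):
    """Return the indices of the p_percent% largest values in one day's data.

    Assumes that "more abnormal" simply means "larger value", as stated above.
    """
    # Number of points to flag, rounded down (may be 0 for very small P).
    k = int(len(day_values) * p_percent / 100)
    # Rank indices from largest to smallest value.
    order = sorted(range(len(day_values)), key=lambda i: day_values[i], reverse=True)
    return sorted(order[:k])
```

With 100 points per day and P = 5, this always flags exactly 5 points, regardless of how extreme they are compared with the training data.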

I tried K-means with K=2. But as you can see in the link above, a few extreme outliers pull the anomaly cluster too high, so no points in the test data would be flagged as anomalies.

Yasser

1 Answer

The data sets are one-dimensional, right?

Then you should just use kernel density estimation. Don't bother with "clustering" algorithms: they are usually designed for multivariate data, and they will also be much slower, because multidimensional data cannot be sorted, whereas one-dimensional data can.
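
As a rough illustration of the kernel density estimation idea, the sketch below fits a Gaussian KDE to the training values and flags the P% of test points with the lowest estimated density. Everything in it is an assumption made for the example: the function names, the Gaussian kernel, and Silverman's rule-of-thumb bandwidth are illustrative choices, not something specified in the question.

```python
import math
import statistics


def silverman_bandwidth(train):
    """Rule-of-thumb bandwidth for a Gaussian kernel (an assumed default)."""
    n = len(train)
    std = statistics.stdev(train)
    q1, _, q3 = statistics.quantiles(train, n=4)
    spread = min(std, (q3 - q1) / 1.34)
    if spread <= 0:  # fall back to the standard deviation if the IQR is zero
        spread = std
    return 0.9 * spread * n ** (-0.2)


def kde_density(x, train, h):
    """Gaussian kernel density estimate at x, built from the training values."""
    norm = len(train) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in train) / norm


def flag_anomalies(train, test, p_percent):
    """Return indices of the p_percent% of test points with the lowest density."""
    h = silverman_bandwidth(train)
    k = max(1, round(len(test) * p_percent / 100))  # always flag at least one
    dens = [kde_density(x, train, h) for x in test]
    order = sorted(range(len(test)), key=lambda i: dens[i])
    return sorted(order[:k])
```

Note that low density flags unusual points in both tails. If only high values should count as abnormal, one option is to restrict the candidates to test points above the training median before ranking them by density.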

Has QUIT--Anony-Mousse