I have several datasets of positive real values (R+), each split into two sets: training and test. For example, see the following dataset. I want to train a classifier on the training data so that, when I apply it to the test data, a reasonable number of points are flagged as anomalies and I can analyze the related situations. The higher the value, the more abnormal it is.
By a reasonable number I mean fewer than P% of every 100 points (100 points are generated each day; most of them should be considered normal, and I want to analyze roughly the P most abnormal ones).
I tried K-means with K=2, but as you can see in the link above, the extreme outliers pull the anomaly cluster too high, so no points in the test data would be flagged as anomalies.
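To make the requirement concrete, this is a minimal sketch of the behaviour I am after, using a plain percentile cut-off learned from the training set rather than K-means. The data here is synthetic (log-normal, chosen only because it is positive and heavy-tailed), and `fit_threshold`, `flag_anomalies`, and `P = 5` are names and values made up for illustration, not part of my real setup.

```python
import numpy as np


def fit_threshold(train, p):
    """Return the value that roughly the top p percent of training points exceed."""
    return float(np.percentile(train, 100.0 - p))


def flag_anomalies(values, threshold):
    """Mark every value strictly above the threshold as an anomaly."""
    return np.asarray(values) > threshold


# Synthetic stand-in data (assumption): positive, heavy-tailed values.
rng = np.random.default_rng(0)
train = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
test = rng.lognormal(mean=0.0, sigma=1.0, size=100)  # one "day" of 100 points

P = 5  # hypothetical target: about P anomalies per 100 points

threshold = fit_threshold(train, P)
flags = flag_anomalies(test, threshold)

print("threshold:", threshold)
print("test points flagged:", int(flags.sum()), "of", len(test))
```

On the training set this flags at most P% of the points by construction. On the test set the count only stays near P per 100 if the test data follows a similar distribution, which is exactly the part I am unsure how to handle properly.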