Outlier detection using clustering on few rows

Question

I have a frequency table (2 columns, 20 rows) of various transaction amounts. Some of these amounts are fraudulent and fairly obvious, since they appear as outliers in the scatter plot. I also want to break the data into clusters.

Is there a minimum data set size required for clustering?
Can I use any specific technique?
What techniques can I use to identify the outliers?

Could you post the data or the scatterplot? The best kind of clustering depends on the data. See: http://stats.stackexchange.com/a/133694/82893 — John Madden, Aug 20 '15 at 15:29
The relationship here looks well behaved to me. It looks like a reciprocal relationship. Is there any reason something like that *couldn't* be the true relationship? Other than the 'outlier-looking' nature of the data in the plot, is there any reason to think these really are outliers (e.g., are these impossible values)? — gung - Reinstate Monica, Aug 30 '15 at 17:00
It's more from a business point of view. Somebody should not be spending more than 20 dollars. — user40465, Aug 31 '15 at 10:37
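One way to check the reciprocal-relationship suggestion in the comments is to fit count ≈ a / amount by least squares and see which rows sit far from the fitted curve. This is only a sketch under assumptions: the data is made up, the no-intercept model count = a / amount is one guess at the shape, and the function name `fit_reciprocal` is hypothetical. Rows at the extremes of the plot that lie on the curve get small residuals; a row that breaks the pattern does not.

```python
def fit_reciprocal(amounts, counts):
    """Least-squares fit of count = a / amount; returns a and the residuals."""
    # Substitute u = 1/amount so the model is linear in a: count = a * u.
    u = [1.0 / x for x in amounts]
    a = sum(ui * yi for ui, yi in zip(u, counts)) / sum(ui * ui for ui in u)
    residuals = [yi - a * ui for ui, yi in zip(u, counts)]
    return a, residuals


# Hypothetical (amount, count) rows; all but (8, 9) follow count = 20 / amount.
amounts = [1, 2, 4, 5, 10, 20, 8]
counts = [20, 10, 5, 4, 2, 1, 9]

a, residuals = fit_reciprocal(amounts, counts)
# Index of the row farthest from the fitted curve.
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print(amounts[worst], counts[worst])  # 8 9
```

Here (1, 20) and (20, 1) look extreme on a scatter plot but fit the curve closely, while (8, 9) does not.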

score 0 · Answer 1 · answered Aug 20 '15 at 18:34


Why not run outlier detection first, and then do the clustering second?
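A minimal two-step sketch of that suggestion, assuming the transaction amounts are treated as one-dimensional: first flag outliers with a robust modified z-score (median and MAD, with the 0.6745 constant and 3.5 cutoff commonly attributed to Iglewicz and Hoaglin), then run a plain 1-D k-means on what remains. The data, `k=2`, and the function names `robust_outliers` and `kmeans_1d` are hypothetical choices for illustration.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median: MAD is 0, flag nothing.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def kmeans_1d(values, k=2, max_iter=100):
    """Plain 1-D k-means (Lloyd's algorithm); returns centers and grouped values."""
    xs = sorted(values)
    n = len(xs)
    # Start from evenly spaced order statistics so the result is deterministic.
    if k == 1:
        centers = [sum(xs) / n]
    else:
        centers = [xs[round(i * (n - 1) / (k - 1))] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # An empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


# Hypothetical transaction amounts; 95.0 and 250.0 are planted outliers.
amounts = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0,
           11.0, 12.0, 12.5, 13.0, 14.0, 15.0, 95.0, 250.0]

flags = robust_outliers(amounts)
outliers = [a for a, f in zip(amounts, flags) if f]
inliers = [a for a, f in zip(amounts, flags) if not f]
centers, groups = kmeans_1d(inliers, k=2)
print("outliers:", outliers)  # outliers: [95.0, 250.0]
print("clusters:", groups)
```

The order matters for k-means in particular: a cluster mean is pulled toward any extreme value assigned to it, so removing the flagged rows first keeps the centers representative of the ordinary transactions.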

Also, some clustering methods (not k-means), such as DBSCAN, have a notion of noise: points that are not assigned to any cluster.
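DBSCAN is one well-known example of such a method: points in low-density regions get a noise label instead of being forced into a cluster. Below is a small from-scratch sketch for (amount, count) pairs, kept dependency-free for illustration (it needs Python 3.8+ for `math.dist`); in practice scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`, noise labelled `-1`) does the same job. The points, `eps=2.0`, and `min_pts=3` are made-up assumptions. On real data the two columns should first be put on comparable scales, because `eps` is a single distance threshold.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None = not visited yet

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels


# Hypothetical (amount, count) rows: two dense groups plus two isolated rows.
points = [(1.0, 10.0), (1.5, 9.0), (2.0, 10.0), (2.5, 9.5),
          (10.0, 2.0), (11.0, 2.5), (12.0, 2.0), (11.5, 1.5),
          (40.0, 1.0), (75.0, 1.0)]
labels = dbscan(points, eps=2.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1, -1]
```

Unlike the outlier-first approach, this does both jobs in one pass: the two isolated rows come out labelled `-1` rather than distorting either cluster.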

Experiment. Every data set is different. We don't have your data.


Has QUIT--Anony-Mousse
