
Let's say I have a ton of credit card usage data and also have some means of predicting whether a given transaction is fraudulent. Now I want to know what kind of criminal entities are behind these frauds. Maybe some are big international or domestic crime organizations, some are small groups, some are individuals, etc.

I'd assume that clustering analysis would be a useful first step in this situation. What combination of techniques is useful in this scenario, and how do you apply them?

EDIT: To clarify, what I want to do is cluster "similar" fraudulent transactions so that I can make some guess about whom or what they originated from. Maybe certain types of fraud tend to come from a certain region. Other types target certain combinations of flaws in the system (e.g. lots of small transactions vs. bigger transactions), etc. In principle, I want to extract patterns that will help the investigators.
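As a rough illustration of what "clustering similar fraud cases" could look like, here is a minimal sketch of plain k-means (Lloyd's algorithm) on a few per-case features. The features (transactions per day, mean amount, share at foreign merchants) and the toy data are hypothetical, chosen only to mirror the "many small vs. few large transactions" example above. Columns are standardized first so that dollar amounts don't dominate, and the initialization is a deterministic farthest-first heuristic rather than the usual random or k-means++ seeding.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so features on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical per-case features: (transactions per day, mean amount, share at foreign merchants)
cases = [
    (40, 5.0, 0.90), (35, 7.5, 0.80), (45, 4.0, 0.95),        # many small, cross-border
    (2, 900.0, 0.10), (3, 1200.0, 0.00), (1, 1000.0, 0.05),   # few large, domestic
]
labels = kmeans(standardize(cases), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the number of clusters is unknown, so one would try several values of k (or a method that doesn't need k up front) and have investigators judge whether the resulting groups are meaningful.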

Enno Shioji
  • I'd start by thinking that until you've got a reason to believe otherwise, statistics actually *won't* help answer the question "what kind of criminal entities are behind these frauds". Whether there is any such reason would depend on the secondary information you've got about each transaction and its user, e.g. demographics, location, whether they're associated with a group, etc. – Peter Ellis Mar 29 '12 at 03:54
  • @PeterEllis: Understood. Hope my edit clarifies this a bit. – Enno Shioji Mar 29 '12 at 05:42
  • You'd need to show some examples. We don't know your data, and your question is pretty much a "what should I do with my data" question. – Has QUIT--Anony-Mousse Mar 29 '12 at 06:54

2 Answers


To expand on Peter's comment, you want to analyze the community structure in your data, which is a problem studied in social network analysis.

I hope knowing the right terms will help you locate relevant papers, such as these:

If you tell us more about what personal information you can populate your social network with, we can be more specific.
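To make "community structure" concrete, a crude first pass is to link fraudulent transactions that share some identifying attribute and take the connected components of that graph. The sketch below does this with union-find. The attribute names `device` and `ip` and the sample records are hypothetical. Connected components are much coarser than real community detection (e.g. Louvain or label propagation), which would further split large, loosely connected components.

```python
from collections import defaultdict


def linked_groups(transactions, keys):
    """Group transactions that share any value of the given attributes.

    Returns the connected components of the "shares an attribute" graph,
    found with union-find, as sorted lists of transaction indices.
    """
    parent = list(range(len(transactions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for key in keys:
        first_seen = {}  # attribute value -> first transaction index with it
        for idx, tx in enumerate(transactions):
            value = tx.get(key)
            if value is None:
                continue  # missing attribute links nothing
            if value in first_seen:
                union(first_seen[value], idx)
            else:
                first_seen[value] = idx

    groups = defaultdict(list)
    for idx in range(len(transactions)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical fraudulent transactions with made-up device and IP identifiers.
frauds = [
    {"device": "d1", "ip": "10.0.0.1"},
    {"device": "d1", "ip": "10.0.0.2"},
    {"device": "d2", "ip": "10.0.0.2"},
    {"device": "d3", "ip": "10.0.0.9"},
]
print(linked_groups(frauds, keys=["device", "ip"]))  # [[0, 1, 2], [3]]
```

Transactions 0 and 2 share nothing directly but end up together through transaction 1, which is the kind of indirect link a network view is meant to surface.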

Emre

This reminds me of Weber's case on outliers, which you can find here: http://www.kellogg.northwestern.edu/faculty/weber/emp/_Session_4/Outliers.pdf

It is a very accessible case.

An outlier analysis may be appropriate. Cases which are systematically different from others may be the fraudulent cases.
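As one simple, robust way to flag "systematically different" cases, here is a sketch of the modified z-score rule of Iglewicz and Hoaglin, which uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, so a few extreme values don't mask themselves. It is not necessarily the method used in Weber's case, it looks at one variable at a time, and the transaction amounts below are made up for illustration.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median (Iglewicz and Hoaglin's rule).
    """
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        return []  # over half the values are identical; the score is undefined
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical amounts for one cardholder; one purchase is far out of pattern.
amounts = [12.5, 9.99, 15.0, 11.2, 13.8, 10.4, 980.0, 14.1]
print(mad_outliers(amounts))  # [6]
```

For multi-feature transaction data, the same idea carries over to multivariate outlier methods, but the univariate version is often enough to get a first list of suspicious cases.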

I hope this helps.

C. Pieters