I have a set of user data, and I want to build a metric that estimates the probability that a user is a sybil (a "fake" account).
However, I only have a very small set of users who are known to be sybils with 100% certainty.
How do I use machine learning here?
For now, I've built a heuristic metric based on that data, and I need to evaluate it somehow.
To sum up: only a small fraction of my data is labeled, and all of the labeled examples belong to a single class (confirmed sybils, which I treat as the negative class). I need to build a metric that scores users and, on top of that, I need to evaluate the "goodness" of that metric.
How do I approach this problem?
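
To make the evaluation part concrete, one possible sanity check is to look at where the confirmed sybils fall in the score distribution of all users. The following is only a minimal sketch under stated assumptions: the function names and the sample scores are hypothetical, a higher score is assumed to mean "more likely a sybil", and the confirmed sybils are assumed to be included among all users. It does not measure false positives, since no confirmed genuine users are available.

```python
from bisect import bisect_left


def percentile_ranks(all_scores, known_sybil_scores):
    """For each known sybil, return the fraction of all users scoring strictly lower."""
    ordered = sorted(all_scores)
    n = len(ordered)
    return [bisect_left(ordered, s) / n for s in known_sybil_scores]


def recall_at_top_fraction(all_scores, known_sybil_scores, fraction):
    """Return the share of known sybils whose score is in the top `fraction` of all users."""
    ordered = sorted(all_scores, reverse=True)
    k = max(1, int(len(ordered) * fraction))  # number of users in the top slice
    threshold = ordered[k - 1]                # lowest score still inside the top slice
    hits = sum(1 for s in known_sybil_scores if s >= threshold)
    return hits / len(known_sybil_scores)


# Hypothetical heuristic scores for 10 users (illustration only, not real data).
all_scores = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.55, 0.80, 0.95]
# Hypothetical scores of the three confirmed sybils among those users.
known_sybil_scores = [0.80, 0.95, 0.30]

ranks = percentile_ranks(all_scores, known_sybil_scores)
recall = recall_at_top_fraction(all_scores, known_sybil_scores, 0.2)
print(ranks, recall)
```

If the heuristic carries signal, the confirmed sybils should concentrate near the top of the ranking far more than a random sample of users would. Both functions only need a sort, so the same check should carry over to larger datasets.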
P.S. It would be good if this process could scale to big datasets.