I'm trying to come up with a "spam score" for each person in a data set. Essentially, the higher the spam score, the more likely that person is a spammer.
Right now, I'm measuring a person's spam score as a ratio: bad posts/total posts. If this ratio is high, then they're likely to be spammers.
The difficulty I'm having is comparing different people. For example, a person with 6/8 bad posts is not as likely to be a spammer as someone with 600/800 bad posts: the person who has made 600 bad posts is clearly a spammer, but the one who has made only 6 has not proven themselves to be a spammer to the same extent.
However, right now both are assigned the same spam score: 6/8 = 600/800 = 0.75. Is there a way I can account for the size of the sample in my spam score?
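To make the problem concrete, here is a minimal Python sketch of what I'm doing now. The function name `spam_score` and its parameters are just for illustration; this is only the plain ratio described above, not a proposed fix.

```python
def spam_score(bad_posts: int, total_posts: int) -> float:
    """Current measure: the fraction of a person's posts that are bad."""
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    return bad_posts / total_posts


# The two users from the example above.
small_sample = spam_score(6, 8)
large_sample = spam_score(600, 800)

# Both users get the same score even though one has 100x more evidence.
print(small_sample, large_sample)  # 0.75 0.75
```

The ratio throws away the number of posts entirely, so the two users are indistinguishable, and that is the part I'd like the score to take into account.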