Training a (binary) classifier

Question

Let's say I want to train a classifier using supervised learning. I asked a group of human evaluators to decide whether my training samples are positive or negative.

During training, should I i) use only samples where a consensus is reached (e.g., a sample receiving 5 positive votes out of 6), or ii) use the votes directly (e.g., if a sample receives 3 positive votes out of 6, it effectively becomes 6 samples, 3 positive and 3 negative)?

I consider an election won if it is decided by a super-majority (e.g., one class winning 75% or more of the votes). A training sample is dropped if no super-majority is reached.

In other words, should I use the outcome of each election or use the individual votes?

In addition, I think I get better performance (in terms of precision and recall on held-out test samples) if I train the model on the election outcomes instead of the individual votes. To summarize,

During training, I use either the election outcomes or the individual votes to train the classifier. The former outperforms the latter, but it leaves about half as many training samples (because many elections do not reach a super-majority).
During testing, I use only the election outcomes as ground truth.
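
The two training-set constructions described above can be sketched as follows. This is only an illustration: the function names, the `(features, positive_votes, total_votes)` layout, and the toy data are assumptions, not taken from the original setup.

```python
def election_outcomes(samples, threshold=0.75):
    """Keep only samples decided by a super-majority and label them with the winner."""
    labelled = []
    for features, pos, total in samples:
        neg = total - pos
        if pos / total >= threshold:
            labelled.append((features, 1))
        elif neg / total >= threshold:
            labelled.append((features, 0))
        # Samples without a super-majority are dropped.
    return labelled


def individual_votes(samples):
    """Turn every vote into its own training row (features repeated per vote)."""
    labelled = []
    for features, pos, total in samples:
        labelled.extend([(features, 1)] * pos)
        labelled.extend([(features, 0)] * (total - pos))
    return labelled


# Hypothetical data: (features, positive_votes, total_votes)
samples = [
    ([0.2, 1.0], 5, 6),  # 5 of 6 positive: kept as a positive sample
    ([0.9, 0.0], 3, 6),  # tied: dropped by the election rule
    ([0.5, 0.5], 1, 6),  # 5 of 6 negative: kept as a negative sample
]

by_election = election_outcomes(samples)
by_vote = individual_votes(samples)
```

With this toy data, the election rule keeps 2 labelled rows, while the per-vote expansion produces 18 rows (9 positive, 9 negative), including the 6 rows from the tied sample.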

I am wondering whether you are getting 'better' performance because you are using a fixed threshold of 0.5 rather than choosing the threshold based on your actual requirements. — seanv507, Nov 02 '16 at 19:55
@seanv507: I clarified the confusion by editing the question. — wsw, Nov 02 '16 at 19:56
I think you should be using the model to predict the fraction of the positive votes, and treat the data from your human evaluators as a Binomial sample from that. — sega_sai, Nov 02 '16 at 20:09
[I agree with @Firebug and Iliyan Bobev that you should be using the individual votes], and the question is to understand exactly what differences between the two processes explain the better performance. E.g., are you comparing accuracy on individual votes to accuracy on elections? I think you need to give more details on exactly what you are doing in the two cases. — seanv507, Nov 02 '16 at 20:10
@seanv507: Thanks for your comments! I added more details to the question. — wsw, Nov 02 '16 at 20:19

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

You can use the fraction of positive votes as the response of a logistic regression model, just as described in this answer. This is implemented (in R) in glmnet, for example, where for the binomial family the response can be a two-column matrix of counts or proportions, with the second column treated as the target class.
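
As a minimal sketch of what fitting on vote counts means, the following pure-Python code maximises the binomial log-likelihood by plain gradient ascent. It is not glmnet's algorithm; the function name `fit_binomial_logistic`, the learning rate, the step count, and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_binomial_logistic(X, pos, total, lr=0.5, steps=5000):
    """Gradient ascent on sum_i [k_i*log(p_i) + (n_i - k_i)*log(1 - p_i)],
    where k_i = pos[i], n_i = total[i] and p_i = sigmoid(b + w . x_i)."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    n_votes = sum(total)  # used only to scale the step size
    for _ in range(steps):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, k, n in zip(X, pos, total):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            resid = k - n * p  # derivative of the log-likelihood w.r.t. the logit
            grad_b += resid
            for j in range(d):
                grad_w[j] += resid * x[j]
        b += lr * grad_b / n_votes
        w = [wj + lr * gj / n_votes for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical data: one feature, six voters per sample.
X = [[0.0], [0.5], [1.0]]
pos = [1, 3, 5]    # positive votes per sample
total = [6, 6, 6]  # votes cast per sample

w, b = fit_binomial_logistic(X, pos, total)
predicted = [sigmoid(b + w[0] * x[0]) for x in X]
```

On this toy data the fitted probabilities approach the observed vote fractions (1/6, 1/2, 5/6), so the tied sample still informs the fit instead of being discarded.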

answered Nov 02 '16 at 19:25

Firebug

I see. Let's say for a sample I get 3 positive votes and 3 negative votes. In principle I can replicate my sample (my row of data including the features) 6 times so that I have 3 positive and 3 negative training samples, right? Let's assume that we use logistic regression only for now. – wsw Nov 02 '16 at 19:35
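
For logistic regression, the replication described in this comment gives the same fit as using the counts directly. As a sketch of the reasoning, with notation introduced here for illustration (sample $i$ has features $x_i$, $n_i$ votes of which $k_i$ are positive, and $y_{ij}$ is the $j$-th vote), the log-likelihood of the replicated rows is:

```latex
\ell_i(\beta)
  = \sum_{j=1}^{n_i} \bigl[ y_{ij} \log p_i + (1 - y_{ij}) \log(1 - p_i) \bigr]
  = k_i \log p_i + (n_i - k_i) \log(1 - p_i),
\qquad
p_i = \frac{1}{1 + e^{-x_i^\top \beta}}
```

The binomial log-likelihood for $k_i$ successes out of $n_i$ differs from this only by the constant $\log \binom{n_i}{k_i}$, which does not depend on $\beta$, so both formulations have the same maximiser.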

score 0 · Answer 2 · edited Jun 11 '20 at 14:32

You need to define the training features and train against them for a given result. Imagine people are voting on a dish. We ask them what they consider when voting and get the features: price, size, vegan. The data set then contains the measurement of each feature for each dish, followed by the vote:

$15 ; 300g ; No ; 1
$20 ; 450g ; Yes ; 0
...
$9.95 ; 200g ; Yes ; 1

Columns 1 through 3 are the features, and the 4th is the final vote. Training on that data will allow you to predict the vote on a dish, given its features.

answered Nov 02 '16 at 19:17

Iliyan Bobev

I have features. My question was about whether we should use the election outcomes for training or use the individual votes for training. – wsw Nov 02 '16 at 19:18
You need to add each vote as a separate data point. – Iliyan Bobev Nov 02 '16 at 19:19
Why can't I use the election results only if a consensus is reached (i.e., the election is won by a majority)? I think I get better performance (in terms of precision and recall) doing things my way. – wsw Nov 02 '16 at 19:26
Because you will lose precision. With majority, the cases with votes 51:49 will be considered equal to 100:0. While cases with 100:0 votes are very unlikely to have a different outcome with another set of voters, the 51:49 could easily change. Keeping individual votes as data will reflect that in the trained model as probability of the prediction. – Iliyan Bobev Nov 02 '16 at 19:41
Sorry, I forgot to mention that I consider an election won if it is decided by a super-majority (e.g., one class winning 75% or more of the votes). – wsw Nov 02 '16 at 19:42
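
The point made in the comments above about 51:49 versus 100:0 splits can be illustrated with a small calculation. This is a sketch under a strong assumption that is not in the original discussion: a fresh panel of voters votes independently, each positive with probability equal to the observed fraction. The function name is hypothetical, and 101 voters are used instead of 100 only to avoid ties.

```python
from math import comb


def prob_positive_majority(p, n):
    """Probability that more than half of n independent voters vote positive,
    when each votes positive with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# A near-even split: the majority of a fresh panel is close to a coin flip.
close_call = prob_positive_majority(0.51, 101)

# A near-unanimous split: a fresh panel almost surely agrees.
clear_call = prob_positive_majority(0.99, 101)
```

Under these assumptions, `close_call` is only modestly above one half, while `clear_call` is essentially 1. A super-majority rule removes the least stable elections, but it also discards the information that those samples are genuinely ambiguous.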
