I have a binary classification problem (let's say, whether or not an observation will experience action x). I train a random forest model on a training set in which about 50% of observations have done action x and 50% have not. I test the model on a similarly balanced test set, and it is about 85% accurate (an overall error rate of about 15%).

A year passes, I get new data, and I want to see how the model performed. It predicted that about 9% of the observations would experience action x, and about 9% in fact did, but it failed to identify the individual observations correctly. In other words, the observations it predicted would experience action x did not actually experience it, and the observations that did experience action x were not flagged by the model.
So essentially, what does it mean that the model gets the aggregate correct but fails to make accurate predictions at the micro level? Is there a mathematical explanation for how this can occur, perhaps something to do with how the predicted probabilities aggregate? Is the model still useful for predicting totals?
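To make the question concrete, here is a minimal simulation (made-up numbers, not my actual data) of the situation I'm describing: the predicted positive rate exactly matches the actual positive rate and overall accuracy still looks respectable, yet precision and recall are both zero because the predicted and actual positive sets don't overlap at all:

```python
import numpy as np

n = 1000
actual = np.zeros(n, dtype=bool)
predicted = np.zeros(n, dtype=bool)

# 9% of observations actually experience action x
actual[:90] = True
# the model also flags 9% positive, but a completely disjoint set
predicted[90:180] = True

accuracy = (actual == predicted).mean()
precision = (actual & predicted).sum() / predicted.sum()
recall = (actual & predicted).sum() / actual.sum()

print(f"predicted positive rate: {predicted.mean():.2%}")  # 9.00%
print(f"actual positive rate:    {actual.mean():.2%}")     # 9.00%
print(f"accuracy:  {accuracy:.2%}")   # 82.00% (820 of 1000 correct)
print(f"precision: {precision:.2%}")  # 0.00%
print(f"recall:    {recall:.2%}")     # 0.00%
```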