
Statistics.com published a problem of the week: The rate of residential insurance fraud is 10% (one out of ten claims is fraudulent). A consultant has proposed a machine learning system to review claims and classify them as fraud or no-fraud. The system is 90% effective in detecting the fraudulent claims, but only 80% effective in correctly classifying the non-fraud claims (it mistakenly labels one in five as “fraud”). If the system classifies a claim as fraudulent, what is the probability that it really is fraudulent?

https://www.statistics.com/news/231/192/Conditional-Probability/?showtemplate=true

My peer and I both came up with the same answer independently and it doesn't match the published solution.

Our solution:

P(fraud | flagged) = (0.9 * 0.1) / ((0.9 * 0.1) + (0.2 * 0.9)) = 0.09 / 0.27 = 1/3
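
As a quick cross-check, a minimal Python sketch of the same Bayes' rule calculation:

```python
# P(fraud | flagged) via Bayes' rule
p_fraud = 0.10               # base rate: 10% of claims are fraudulent
p_flag_given_fraud = 0.90    # system flags 90% of fraudulent claims
p_flag_given_ok = 0.20       # system mistakenly flags 20% of legitimate claims

# Total probability that a claim gets flagged
p_flag = p_flag_given_fraud * p_fraud + p_flag_given_ok * (1 - p_fraud)

# Posterior probability that a flagged claim is actually fraudulent
p_fraud_given_flag = (p_flag_given_fraud * p_fraud) / p_flag
print(p_fraud_given_flag)    # 0.3333... = 1/3
```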

Their solution:

This is a problem in conditional probability. (It’s also a Bayesian problem, but applying the formula in Bayes Rule only helps to obscure what’s going on.) Consider 100 claims. 10 will be fraudulent, and the system will correctly label 9 of them as “fraud.” 90 claims will be OK, but the system will incorrectly classify 72 (80%) as “fraud.” So a total of 81 claims have been labeled as fraudulent, but only 9 of them, 11%, are actually fraudulent.

Who was right?

Tim
ChrisG
  • looks like they corrected the solution on their website to be in line with what you calculated – nope Dec 18 '18 at 16:11
  • @nope, quietly corrected the answer. sneaky – Aksakal Dec 18 '18 at 16:26
  • Trivia: in behavioral decision-making, this problem is often referred to as the "mammogram problem", since its usual presentation is about the chance of a patient having cancer given a positive mammogram. – Kodiologist Dec 18 '18 at 19:40
  • "The good news is, our system classifies 90% of fraud as fraud. The bad news is, it classifies 80% of non-fraud as fraud." Note the the 11% they calculate is only slightly higher than the 10% base rate. A machine learning model where the fraud rate in the flagged cases is only 10% more than the base rate is quite terrible. – Acccumulation Dec 18 '18 at 20:49
  • This is known as the [false positive paradox](https://en.wikipedia.org/wiki/Base_rate_fallacy#False_positive_paradox) – BlueRaja - Danny Pflughoeft Dec 18 '18 at 23:18

2 Answers


I believe that you and your colleague are correct. Statistics.com has the correct line of thinking but makes a simple mistake: out of the 90 "OK" claims, we expect 20% of them to be incorrectly classified as fraud, not 80%. Since 20% of 90 is 18, that gives 9 correctly flagged claims and 18 incorrectly flagged claims, so the fraction of flagged claims that are truly fraudulent is 9 / (9 + 18) = 1/3, exactly what Bayes' rule yields.
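
The same counting argument written out as a short Python sketch, using the 100-claim cohort from the original solution:

```python
# Frequency-count version of the calculation, per 100 claims
total = 100
fraud = 0.10 * total          # 10 fraudulent claims
ok = total - fraud            # 90 legitimate claims

flagged_fraud = 0.90 * fraud  # 9 fraudulent claims correctly flagged
flagged_ok = 0.20 * ok        # 18 legitimate claims incorrectly flagged (not 72)

# Fraction of flagged claims that are truly fraudulent
print(flagged_fraud / (flagged_fraud + flagged_ok))  # 9 / 27 = 0.333...
```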

James Otto

You are correct. The solution the website posted is based on a misreading of the problem: it treats 80% of the nonfraudulent claims as being classified as fraudulent, rather than the given 20%.
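
A quick Monte Carlo sketch, assuming the intended 20% false-positive rate, lands on the same 1/3:

```python
import random

random.seed(0)
n = 1_000_000
flagged = 0
flagged_and_fraud = 0

for _ in range(n):
    is_fraud = random.random() < 0.10        # 10% base rate
    if is_fraud:
        is_flagged = random.random() < 0.90  # 90% detection rate
    else:
        is_flagged = random.random() < 0.20  # 20% false-positive rate
    if is_flagged:
        flagged += 1
        flagged_and_fraud += is_fraud

print(flagged_and_fraud / flagged)           # ~0.333
```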

Dilip Sarwate