The effect of oversampling on the positive predictive value

Question

I need to calculate the positive predictive value (PPV) on a validation set for a rare event. The problem is that the validation set was oversampled for the rare event: the event occurs in 5 percent of the population, but oversampling has raised it to 50 percent of the sample.

How does the oversampling affect the calculation of the PPV?
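One way to make the question concrete is the sketch below. Sensitivity and specificity are computed within each class, so they do not change when one class is oversampled at random, and the PPV can be rebuilt from them with Bayes' rule at the population prevalence. The confusion-matrix counts and the function name `ppv_at_prevalence` are hypothetical, chosen only for illustration; only the 5 percent and 50 percent figures come from the question.

```python
def ppv_at_prevalence(tp, fp, fn, tn, prevalence):
    """Recompute PPV for a target prevalence from confusion-matrix counts.

    Assumes the oversampling was random within each class, so the
    sensitivity and specificity measured on the sample carry over.
    """
    sensitivity = tp / (tp + fn)  # P(predicted positive | actual positive)
    specificity = tn / (tn + fp)  # P(predicted negative | actual negative)
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = (1 - specificity) * (1 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Hypothetical counts on a validation set balanced 50/50 by oversampling.
tp, fn, fp, tn = 80, 20, 10, 90

naive_ppv = tp / (tp + fp)  # PPV as measured on the oversampled set
adjusted_ppv = ppv_at_prevalence(tp, fp, fn, tn, prevalence=0.05)

print(f"PPV on oversampled set: {naive_ppv:.3f}")
print(f"PPV at 5% prevalence:   {adjusted_ppv:.3f}")
```

With these made-up counts the PPV measured on the balanced set is much higher than the PPV at 5 percent prevalence, which is the effect the question is asking about.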

This article explains that you need to adjust the odds (not the probabilities) by the fraction you oversample by: https://yiminwu.wordpress.com/2013/12/03/how-to-undo-oversampling-explained/. – Dan, Apr 04 '19 at 15:23
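A minimal sketch of the odds adjustment this comment describes, assuming the model was fit on data with a 50 percent event rate and the population rate is 5 percent. The function name `undo_oversampling` and its parameters are hypothetical; the formula is the standard prior correction (rescale the predicted odds by the ratio of population odds to sample odds), not code taken from the linked article.

```python
def undo_oversampling(p_sample, sample_rate, true_rate):
    """Map a probability predicted under the oversampled event rate
    back to the population event rate by rescaling the odds."""
    sample_odds = sample_rate / (1 - sample_rate)
    true_odds = true_rate / (1 - true_rate)
    ratio = true_odds / sample_odds  # factor applied to the predicted odds
    # Equivalent to odds -> odds * ratio, written so p_sample == 1 is safe.
    return p_sample * ratio / (p_sample * ratio + (1 - p_sample))


for p in (0.1, 0.5, 0.9, 1.0):
    adjusted = undo_oversampling(p, sample_rate=0.5, true_rate=0.05)
    print(f"sample-scale p = {p:.2f} -> population-scale p = {adjusted:.4f}")
```

Because the correction acts on the odds, a prediction of 1.0 stays at 1.0 and a prediction of 0.5 maps to the 5 percent base rate, whereas dividing probabilities by a constant would cap every prediction.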

Arpit Sisodia · Answer 1 · 2019-04-04T16:20:30.557

-1

Yes, the probabilities are now inflated because of the oversampling. You can divide the predictions by 10, since you oversampled the positive class tenfold, or there are other ways of recalibrating probabilities, such as the one described here:

https://quinonero.net/Publications/predicting-clicks-facebook.pdf

(Section 6.3, model recalibration)

edited Apr 04 '19 at 16:20

answered May 21 '16 at 11:30

Arpit Sisodia


Do you have a reference for this? This would mean the highest rating the classifier could provide would be 0.1 which doesn't really make sense to me. – Dan Apr 04 '19 at 14:06
Sure: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf – Arpit Sisodia Apr 04 '19 at 15:08
Go to Section 7, calibrating predictions. – Arpit Sisodia Apr 04 '19 at 15:08
That article suggests using isotonic regression to calibrate models. It certainly does not suggest dividing your predictions by a constant and also seems unrelated to oversampling in preprocessing. – Dan Apr 04 '19 at 15:16
For example: https://stats.stackexchange.com/a/257507/40604 – Dan Apr 04 '19 at 15:20
Hey @Dan, yes, you are correct that the predictions shouldn't be divided directly by 10. But recalibration of the probabilities is required. I have already done a lot of research. Read the other paper, on Facebook click prediction: https://quinonero.net/Publications/predicting-clicks-facebook.pdf (go to the section on recalibrating predictions). – Arpit Sisodia Apr 04 '19 at 16:16
Then as it stands, your answer is incorrect (and misleading). I suggest you edit it or delete it. – Dan Apr 04 '19 at 16:19
Edited, but yes, probabilities are inflated and must be recalibrated. – Arpit Sisodia Apr 04 '19 at 16:21
No one is arguing otherwise. But your answer contains an entirely incorrect method for the adjustment and does nothing to explain how to correctly make the adjustment. – Dan Apr 04 '19 at 16:26
You need to pull the equation from that paper out and write it in your answer, contextualized to this problem. Otherwise you've just provided a link which is not an acceptable form of answer on stack exchange. – Dan Apr 04 '19 at 16:28
No, you are right, we should arrive at the right solution, but you don't need a method. This is purely intuition-based, if you think about it. Say a variable has 1 and 0 as outcomes. If you have just 2 rows, one 0 and one 1, the probability of each outcome is 0.5. If you oversample class 1 by adding 1 more row, the probability of getting 1 becomes 2/3, which is indeed inflated, isn't it? – Arpit Sisodia Apr 04 '19 at 16:33
Sure, I will update the answer now. – Arpit Sisodia Apr 04 '19 at 16:34
