3

I need to calculate the positive predictive value (PPV) on a validation set for a rare event. The problem is that the validation set was oversampled for the rare event. The event occurs in 5 percent of the population; however, the oversampling has raised it to 50 percent of the sample.

How does the oversampling affect the calculation of the PPV?

user43856
  • This article explains that you need to adjust the odds (not the probabilities) by the factor you oversampled by: https://yiminwu.wordpress.com/2013/12/03/how-to-undo-oversampling-explained/. – Dan Apr 04 '19 at 15:23
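
To make the odds adjustment from the comment above concrete, here is a minimal sketch. It assumes the oversampling was done purely by class (events and non-events were each sampled at random), so only the base rate changed. The function name `undo_oversampling` and the sample PPV of 0.90 are hypothetical, chosen for illustration; the 5 percent and 50 percent base rates come from the question.

```python
def undo_oversampling(p_sample, base_rate_population, base_rate_sample):
    """Map a probability measured on oversampled data back to the population.

    Assumption: sampling depended only on the class label, so the odds are
    off by the ratio of the two prior odds and nothing else.
    """
    if not 0.0 <= p_sample <= 1.0:
        raise ValueError("p_sample must be a probability in [0, 1]")
    if p_sample == 1.0:
        return 1.0  # infinite odds stay infinite under any finite rescaling

    odds_sample = p_sample / (1.0 - p_sample)
    prior_odds_population = base_rate_population / (1.0 - base_rate_population)
    prior_odds_sample = base_rate_sample / (1.0 - base_rate_sample)

    # Rescale the odds, not the probability.
    odds_population = odds_sample * prior_odds_population / prior_odds_sample
    return odds_population / (1.0 + odds_population)


# Hypothetical PPV measured on the 50/50 validation set.
ppv_sample = 0.90
ppv_population = undo_oversampling(ppv_sample, 0.05, 0.50)
print(round(ppv_population, 4))  # 0.3214
```

Under these assumptions, a PPV of 0.90 on the balanced validation set corresponds to roughly 0.32 in a population where the event occurs 5 percent of the time. The same function can be applied to individual predicted probabilities.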

1 Answer

-1

Yes, the probabilities are now inflated because of the oversampling. You can divide the predictions by 10, since you increased the positive class tenfold, or there are other ways of recalibrating the probabilities, such as the one described in:

https://quinonero.net/Publications/predicting-clicks-facebook.pdf

(Section 6.3, model recalibration)

Arpit Sisodia
  • Do you have a reference for this? This would mean the highest rating the classifier could provide would be 0.1 which doesn't really make sense to me. – Dan Apr 04 '19 at 14:06
  • Sure: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf – Arpit Sisodia Apr 04 '19 at 15:08
  • Go to Section 7, calibrating predictions. – Arpit Sisodia Apr 04 '19 at 15:08
  • That article suggests using isotonic regression to calibrate models. It certainly does not suggest dividing your predictions by a constant and also seems unrelated to oversampling in preprocessing. – Dan Apr 04 '19 at 15:16
  • For example: https://stats.stackexchange.com/a/257507/40604 – Dan Apr 04 '19 at 15:20
  • Hey @Dan, yes, you are correct that it shouldn't be divided directly by 10. But recalibration of the probabilities is required. I have done a lot of research already. Read the Facebook click-prediction paper as well: https://quinonero.net/Publications/predicting-clicks-facebook.pdf (see the section on recalibrating predictions). – Arpit Sisodia Apr 04 '19 at 16:16
  • Then, as it stands, your answer is incorrect (and misleading). I suggest you edit it or delete it. – Dan Apr 04 '19 at 16:19
  • Edited, but yes, the probabilities are inflated and must be recalibrated. – Arpit Sisodia Apr 04 '19 at 16:21
  • No one is arguing otherwise. But your answer contains an entirely incorrect method for the adjustment and does nothing to explain how to correctly make the adjustment. – Dan Apr 04 '19 at 16:26
  • You need to pull the equation out of that paper and write it in your answer, contextualized to this problem. Otherwise you've just provided a link, which is not an acceptable form of answer on Stack Exchange. – Dan Apr 04 '19 at 16:28
  • No, no, you are right, we should arrive at the right solution, but you don't need a method; this is purely intuition-based, if you think about it. Say a variable has 1 and 0 as outcomes. If you have just 2 rows, one 0 and one 1, the probability of each outcome is 0.5. If you oversample class 1 by adding 1 more row, the probability of getting 1 becomes 2/3, which is indeed inflated, isn't it? – Arpit Sisodia Apr 04 '19 at 16:33
  • Sure, will update the answer now. – Arpit Sisodia Apr 04 '19 at 16:34
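
The two-row example from the comments can be worked through exactly. The sketch below is illustrative only, with made-up variable names. It reproduces the inflation from 1/2 to 2/3, then compares dividing the probability by the oversampling factor against dividing the odds by it. Class 1 went from one row to two, so the factor here is 2.

```python
from fractions import Fraction

rows = [0, 1]                        # original data: one row per class
p_original = Fraction(sum(rows), len(rows))                  # 1/2

oversampled = rows + [1]             # duplicate the class-1 row
p_oversampled = Fraction(sum(oversampled), len(oversampled))  # 2/3, inflated

factor = 2                           # class 1 now has twice as many rows

# Dividing the probability itself does not recover the original rate.
naive = p_oversampled / factor                                # 1/3

# Dividing the odds does.
odds = p_oversampled / (1 - p_oversampled)                    # 2
corrected_odds = odds / factor                                # 1
corrected = corrected_odds / (1 + corrected_odds)             # 1/2

print(p_original, p_oversampled, naive, corrected)  # 1/2 2/3 1/3 1/2
```

In this toy case, only the odds-based correction returns the original 1/2, which matches the point made in the comments that predictions should not be divided by a constant.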