Oversampling/undersampling issue

Asked Jul 07 '21 at 15:05

Active Jul 07 '21 at 15:05

Viewed 31 times

Assume the original data contains 1000 goods and 1 bad. I build a logistic regression and use the model to score the bad, and I get probability = 0.00001. Then I use oversampling/undersampling to increase/decrease the original data, so with oversampling I now have 1000 goods and 1000 bads. I then build a logistic model on the resampled data and apply it to the original data, and for that bad I get probability = 0.5. However, this probability needs to be adjusted to reflect the original data, so after doing some math you get an adjusted probability lower than 0.5 (for example 0.00001). So what is the point of oversampling/undersampling if you are required to adjust the probability?
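
The "some math" is not spelled out in the question. One common choice, assumed here for illustration, is the prior-correction formula that reweights each class by the ratio of its original share to its resampled share. The function name `adjust_probability` and its parameters are hypothetical, and the numbers are the 1000 goods / 1 bad example from above. This is a minimal sketch, not necessarily the adjustment the asker had in mind:

```python
def adjust_probability(p_sampled, original_rate, sampled_rate):
    """Map a probability from a model fit on resampled data back to the
    original class balance (prior correction; assumed formula)."""
    # Weight each class by (original share) / (resampled share).
    w_bad = original_rate / sampled_rate
    w_good = (1.0 - original_rate) / (1.0 - sampled_rate)
    return p_sampled * w_bad / (p_sampled * w_bad + (1.0 - p_sampled) * w_good)


original_rate = 1 / 1001  # 1 bad among 1000 goods in the original data
sampled_rate = 0.5        # 1000 bads and 1000 goods after oversampling

p_adjusted = adjust_probability(0.5, original_rate, sampled_rate)
print(p_adjusted)
```

Under this formula a resampled-model score of 0.5 maps back to the original bad rate of 1/1001, which is well below 0.5; the 0.00001 in the question is the asker's own illustrative figure. The mapping is monotone in the score, so it changes the scale of the probabilities but not the ordering of the cases.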

gyambqt

The point of oversampling is unclear to many. See the discussion in the comments to [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Jul 07 '21 at 15:18
Still don’t understand – gyambqt Jul 07 '21 at 22:39
It would help to explain what it is you do not understand. – Dave Jul 07 '21 at 23:54
My question is why you need to adjust the probability afterwards. – gyambqt Jul 08 '21 at 11:19
I think @StephanKolassa will agree with me that you do not need to do that, because you should not be oversampling or undersampling in the first place. I find it hard to argue for a correct conclusion to a wrong procedure. – Dave Jul 08 '21 at 11:51
