I was wondering how sampling methods behave in imbalanced cases, so I used a heavily imbalanced dataset (roughly 99:1) and logistic regression for binary classification. The results are below.
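Roughly, the setup looked like this (a minimal sketch, assuming scikit-learn and imbalanced-learn; the generated data just stands in for my actual dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.combine import SMOTETomek

# Stand-in for the ~99:1 imbalanced binary dataset
X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01],
                           random_state=0)

# Model (origin): trained on the data as-is
model_orig = LogisticRegression(max_iter=1000).fit(X, y)

# Model (SMOTETomek): trained after combined over/under-sampling
X_res, y_res = SMOTETomek(random_state=0).fit_resample(X, y)
model_smt = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# Same independent values in, very different Pr(Y=1) out
print(model_orig.predict_proba(X[:5])[:, 1])
print(model_smt.predict_proba(X[:5])[:, 1])
```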

Now, I have some questions about that.

  1. Why can $Pr(Y=1)$ be different for the same values of the independent variables?
  2. As a result, Model (original) can say "the 2nd data point's class is 1 with a 1.5% chance" and Model (using SMOTETomek) can say "the 2nd data point's class is 1 with a 70% chance." Am I right?

[Image: predicted $Pr(Y=1)$ for the same data points from the original model and the SMOTETomek model]

  • Apparently, the coefficients after sampling estimate the same population parameters, except for the intercept term, which shifts to match the prior distribution: https://stats.stackexchange.com/q/67903/232706 – Ben Reiniger Feb 01 '22 at 15:11

1 Answer

1)

Think in terms of Bayes’ theorem.

$$ P(Y=1\vert Features) =\dfrac{ P(Features\vert Y=1)P(Y=1) }{ P(Features) } $$

If you change the $P(Y=1)$ prior probability in the numerator, of course the $P(Y=1\vert Features)$ posterior probability changes.

It gets complicated because the other terms also change, but I do think this gives intuition about why the posterior probability depends on the prior probability.
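One way to make this concrete is to rewrite Bayes' theorem in odds form:

$$ \dfrac{P(Y=1\vert Features)}{P(Y=0\vert Features)} = \dfrac{P(Features\vert Y=1)}{P(Features\vert Y=0)} \times \dfrac{P(Y=1)}{P(Y=0)} $$

If the likelihood ratio stays fixed, multiplying the prior odds by some factor multiplies the posterior odds by exactly that factor.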

2)

You’re interpreting the posterior probability correctly, yes, but you’re messing with the data, so you’re tricking yourself into thinking the $70\%$ chance is correct. In other words, you have the correct interpretation of a misleading result.

EDIT

Let's see an example where we change the prior probability, yet the changes to the other components are not enough to offset that difference.

$$ X = (0, 0, 1, 1)\\ Y = (0, 0, 0, 1)\\ P(Y=1\vert X=1) = \dfrac{P(X=1\vert Y=1)P(Y=1)} { P(X=1) }\\ =\dfrac{ 1\times \frac{1}{4} }{ \frac{1}{2} } \\=\dfrac{1}{2} $$

Now let's upsample the minority class so that $P(Y=1)=0.5$.

$$ X = (0, 0, 1, 1, 1, 1)\\ Y = (0, 0, 0, 1, 1, 1)\\ P(Y=1\vert X=1) = \dfrac{P(X=1\vert Y=1)P(Y=1)} { P(X=1) }\\ =\dfrac{ 1\times \frac{1}{2} }{ \frac{2}{3} } \\=\dfrac{3}{4} $$

If we do something SMOTE-like and give the synthetic $Y=1$ points slightly different $X$ values, this basic example actually leaves the posterior probability unchanged.

$$ X = (0, 0, 1, 1, 1.1, 1.2)\\ Y = (0, 0, 0, 1, 1, 1)\\ P(Y=1\vert X=1) = \dfrac{P(X=1\vert Y=1)P(Y=1)} { P(X=1) }\\ =\dfrac{ \frac{1}{3}\times \frac{1}{2} }{ \frac{1}{3} } \\=\dfrac{1}{2} $$
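If it helps, here is a minimal Python check of the three counting arguments above; it just tallies the same empirical probabilities the formulas use:

```python
def posterior(xs, ys, x=1):
    """Empirical P(Y=1 | X=x), computed as P(X=x | Y=1) * P(Y=1) / P(X=x)."""
    n = len(xs)
    n_y1 = sum(y == 1 for y in ys)
    p_y1 = n_y1 / n                                  # prior P(Y=1)
    p_x_given_y1 = sum(xi == x for xi, yi in zip(xs, ys)
                       if yi == 1) / n_y1            # likelihood P(X=x | Y=1)
    p_x = sum(xi == x for xi in xs) / n              # evidence P(X=x)
    return p_x_given_y1 * p_y1 / p_x

print(posterior([0, 0, 1, 1], [0, 0, 0, 1]))                  # 0.5  (original)
print(posterior([0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1]))      # 0.75 (upsampled)
print(posterior([0, 0, 1, 1, 1.1, 1.2], [0, 0, 0, 1, 1, 1]))  # 0.5  (SMOTE-like)
```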

However, machine learning models are not quite this literal with $P(Features\vert Y=1)$ and $P(Features)$ (there is interpolation, which we want for continuous data), so we get results like the one in the OP, where the posterior probability is much higher after SMOTE. Nonetheless, the upsampling example shows how changes in the prior probability need not be offset by changes to the other terms.
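As Ben Reiniger's comment on the question notes, for logistic regression the resampling mainly shifts the intercept to match the new prior, and that shift can be undone. A minimal sketch (names and numbers are illustrative; this assumes plain random resampling to class-1 fraction $\pi_s$ from a true fraction $\pi$, and is only approximate for SMOTE-style synthesis):

```python
import numpy as np

def correct_intercept(b0_resampled, pi_s, pi):
    """Shift a logistic regression intercept fit at resampled class-1
    fraction pi_s back to the true class-1 fraction pi.

    The slopes are (asymptotically) unaffected by class resampling; the
    intercept absorbs the change in prior log-odds, so we subtract it out.
    """
    return b0_resampled - np.log(pi_s / (1 - pi_s)) + np.log(pi / (1 - pi))

# e.g. a model trained on 50:50 resampled data, true prevalence 1%
b0_true_scale = correct_intercept(b0_resampled=0.3, pi_s=0.5, pi=0.01)
```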

Dave