Theory: Logistic Regression MLE does not exist when data are perfectly separable?

Question

I realize that this is a relatively well-documented and well-understood concept, but I'm unfamiliar with how the conclusion is derived. Could someone ELI5 why, when the data are perfectly separable, the MLE for a logistic regression is $\hat{\beta} = \infty$?

@MichaelChernick http://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression — SmallChess, Apr 12 '17 at 03:52

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

Imagine you have two data points:

$x_1 = 1$, $y_1= 1$
$x_2 = -1$, $y_2 = 0$

And let's say we're estimating $b$ in $P(Y= 1 \mid X = x) = \frac{e^{bx}}{1 + e^{bx}}$

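How does $b=5$ do? Let $P_1$ and $P_2$ be the probabilities the model assigns to the observed outcome of the 1st and 2nd point respectively. For the 2nd point that is $P(Y = 0 \mid X = -1) = 1 - \frac{e^{-5}}{1+e^{-5}} = \frac{e^5}{1+e^5}$, so we have:

$P_2 = P_1 = \frac{e^5}{1+e^5} \approx .9933$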

Can we do better? Sure. How about $b=6$?

$P_2 = P_1 = \frac{e^6}{1+e^6} \approx .9975$
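The two calculations above are easy to check numerically. Here is a minimal sketch in plain Python; the function names `prob_correct` and `likelihood` are illustrative choices of mine, not part of any library:

```python
import math

# The two data points from the example, as (x, y) pairs.
data = [(1.0, 1), (-1.0, 0)]

def prob_correct(b, x, y):
    """Probability the model assigns to the observed label y at x."""
    p1 = 1.0 / (1.0 + math.exp(-b * x))  # P(Y = 1 | X = x), same as e^{bx} / (1 + e^{bx})
    return p1 if y == 1 else 1.0 - p1

def likelihood(b):
    """Product of the per-point probabilities of the observed labels."""
    result = 1.0
    for x, y in data:
        result *= prob_correct(b, x, y)
    return result

# The likelihood keeps growing as b grows; it never reaches 1 for finite b.
for b in [1, 5, 6, 10, 20]:
    print(b, round(prob_correct(b, 1.0, 1), 4), likelihood(b))
```

Every larger $b$ in the loop prints a larger likelihood. (For very large $b$ floating-point arithmetic eventually rounds the result to exactly 1, which is a numerical artifact rather than a true maximum.)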

With perfect separation, increasing $b$ always gives a higher likelihood, so the maximum likelihood estimate (MLE) doesn't exist: the likelihood approaches 1 as $b \to \infty$, but no finite $b$ attains it. In a sense, the MLE is $b = \infty$. See the nifty graphic on this answer: https://stats.stackexchange.com/a/224864/97925
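For this two-point example, the claim can also be checked directly. This is a short derivation for the toy data only, not a general proof; the notation $\ell(b)$ for the log-likelihood is my own addition:

$$\ell(b) = \log P_1 + \log P_2 = 2 \log \frac{e^{b}}{1+e^{b}} = 2b - 2\log\left(1 + e^{b}\right)$$

$$\ell'(b) = 2 - \frac{2e^{b}}{1+e^{b}} = \frac{2}{1+e^{b}} > 0 \quad \text{for every finite } b$$

Since the derivative is strictly positive everywhere, $\ell(b)$ is strictly increasing. It approaches its supremum $0$ (a likelihood of 1) as $b \to \infty$, but there is no finite $b$ at which $\ell'(b) = 0$, so no finite maximizer exists.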
