For the logistic model, why is the objective function unbounded below if two sets are linearly separated?

Question

I am reading "Approximate linear discrimination via logistic modeling" in Section 8.6.1 of B & V's Convex Optimization book. On page 428, the following problem is posed:

$$ \operatorname{minimize} \ -l(a, b) \tag{8.27} $$

with variables $a$, $b$, where $l$ is the log-likelihood function

$$ \begin{aligned} &l(a, b)=\sum_{i=1}^{N}\left(a^{T} x_{i}-b\right)-\sum_{i=1}^{N} \log \left(1+\exp \left(a^{T} x_{i}-b\right)\right)-\sum_{i=1}^{M} \log \left(1+\exp \left(a^{T} y_{i}-b\right)\right) \end{aligned} $$

It says that if the two sets can be linearly separated, i.e., if there exist $a$, $b$ with $a^T x_i > b$ for all $i$ and $a^T y_i < b$ for all $i$, then the optimization problem (8.27) is unbounded below. Why is it unbounded below in this case?

We already have many interesting discussions on this topic. https://stats.stackexchange.com/questions/254124/why-does-logistic-regression-become-unstable-when-classes-are-well-separated https://stats.stackexchange.com/questions/239928/is-there-any-intuitive-explanation-of-why-logistic-regression-will-not-work-forv https://stats.stackexchange.com/questions/5354/logistic-regression-model-does-not-converge?rq=1 — Haitao Du, Oct 18 '21 at 13:06

score 3 · Answer 1 · edited Oct 18 '21 at 14:40

Here's an answer that takes a look at what actually happens to the fit. Consider the example of $x \in \mathbb{R}$, such that the data are linearly separable at $x=0$. We model the data using logistic regression without an intercept, i.e.:

$y = \frac{1}{1 + \exp(x b)}$.

Now, as you can see in the plot above, as $b$ increases the fitted curve gets steeper and increasingly close to the true data (black points). Every finite $b$ can be improved on by a larger one, so the best fit would be for infinite $b$, and thus the problem is unbounded.
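The effect of increasing $b$ can also be checked numerically. The following is a minimal sketch, not the code behind the plot: the data points, the labels, and the helper names `softplus` and `neg_log_likelihood` are all made up for illustration. Labels are assigned so that the model $y = 1/(1+\exp(xb))$ as written above fits better for larger positive $b$ (negative $x$ gets label 1, positive $x$ gets label 0).

```python
import math

# Hypothetical 1-D data, separable at x = 0 (an assumption for illustration).
# Labels match the sign convention of y = 1 / (1 + exp(x * b)) with b > 0:
# negative x -> label 1, positive x -> label 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [1, 1, 1, 0, 0, 0]


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def neg_log_likelihood(b):
    """Negative log-likelihood of the model p(x) = 1 / (1 + exp(x * b))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = x * b
        # -log p(x) = softplus(z) and -log(1 - p(x)) = softplus(-z)
        total += softplus(z) if y == 1 else softplus(-z)
    return total


# The loss keeps shrinking as b grows, but stays positive for every finite b.
for b in [0.0, 1.0, 10.0, 100.0]:
    print(f"b = {b:6.1f}  negative log-likelihood = {neg_log_likelihood(b):.6g}")
```

On this toy data the negative log-likelihood decreases at every step and stays strictly positive, which suggests that the minimizer is pushed out to $b \to \infty$ rather than sitting at any finite value.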

score 1 · Answer 2 · answered Oct 18 '21 at 14:38

Think about the simplest case where there is a single predictor that is binary, i.e., you are using the binary logistic model to compute two proportions. If P(Y=1 | X=0) = 1 and P(Y=1 | X=1) = 0 you have perfect separation. The estimated proportions are 1.0 and 0.0 and there is no problem with the log likelihood function, which is zero.
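To make the binary-predictor case concrete, here is a short worked version. It assumes the usual parameterization $\operatorname{logit} P(Y=1 \mid X=x) = \alpha + \beta x$, with $n_0$ observations at $X=0$ (all with $Y=1$) and $n_1$ observations at $X=1$ (all with $Y=0$). The symbols $\alpha$, $\beta$, $n_0$, $n_1$ are introduced here for illustration and are not part of the original answer.

$$ \ell(\alpha, \beta) = -n_0 \log\left(1 + e^{-\alpha}\right) - n_1 \log\left(1 + e^{\alpha + \beta}\right) $$

Both terms are strictly negative for finite $\alpha$, $\beta$ and tend to zero as $\alpha \to \infty$ and $\alpha + \beta \to -\infty$. Under these assumptions the log-likelihood value of zero is a supremum that is approached but not attained, so the estimated proportions 1.0 and 0.0 correspond to coefficients that grow without bound.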
