I am reading "Approximate linear discrimination via logistic modeling" in Section 8.6.1 of Boyd & Vandenberghe's Convex Optimization. On page 428, the following problem is posed:

$$ \operatorname{minimize} \ -l(a, b) \tag{8.27} $$

with variables $a$, $b$, where $l$ is the log-likelihood function

$$ \begin{aligned} &l(a, b)=\sum_{i=1}^{N}\left(a^{T} x_{i}-b\right)-\sum_{i=1}^{N} \log \left(1+\exp \left(a^{T} x_{i}-b\right)\right)-\sum_{i=1}^{M} \log \left(1+\exp \left(a^{T} y_{i}-b\right)\right) \end{aligned} $$

The book says that if the two sets can be linearly separated, i.e., if there exist $a$, $b$ with $a^T x_i > b$ and $a^T y_i < b$, then the optimization problem (8.27) is unbounded below. Why is it unbounded below in this case?

kjetil b halvorsen
suineg
  • We already have many interesting discussions on this topic. https://stats.stackexchange.com/questions/254124/why-does-logistic-regression-become-unstable-when-classes-are-well-separated https://stats.stackexchange.com/questions/239928/is-there-any-intuitive-explanation-of-why-logistic-regression-will-not-work-forv https://stats.stackexchange.com/questions/5354/logistic-regression-model-does-not-converge?rq=1 – Haitao Du Oct 18 '21 at 13:06

2 Answers

[Figure: fitted logistic regression curves for increasing values of b]

Here's an answer that looks at what actually happens to the fit. Consider the example of $x \in \mathbb{R}$, where the data are linearly separable at $x=0$. We model the data using logistic regression without an intercept, i.e.:

$y = \frac{1}{1 + \exp(x b)}$.

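As the plot above shows, the fitted curve gets closer and closer to the observed data (black points) as $b$ increases. The likelihood therefore keeps improving as $b$ grows, so no finite $b$ is optimal: the best fit is reached only in the limit $b \to \infty$, which is why the optimization has no finite solution.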
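To make this concrete, here is a minimal numerical sketch. The toy data (six points separable at $x=0$, with label 1 for negative $x$) and the helper `neg_log_likelihood` are my own assumptions for illustration, chosen to match the sign convention of the model above; they are not the data behind the plot.

```python
import numpy as np

# Hypothetical toy data, separable at x = 0: label 1 for x < 0, label 0 for x > 0
x = np.array([-3.0, -2.0, -0.5, 0.5, 1.0, 2.5])
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

def neg_log_likelihood(b):
    # Model from the answer: P(y = 1 | x) = 1 / (1 + exp(x * b))
    z = x * b
    # -log p = log(1 + exp(z)) for y = 1; -log(1 - p) = log(1 + exp(-z)) for y = 0
    return float(np.sum(y * np.logaddexp(0.0, z) + (1.0 - y) * np.logaddexp(0.0, -z)))

bs = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
nlls = [neg_log_likelihood(b) for b in bs]
for b, v in zip(bs, nlls):
    print(f"b = {b:5.1f}   negative log-likelihood = {v:.3e}")
```

On this data the printed values shrink with every increase in $b$ but stay strictly positive, so a solver that follows the gradient will keep pushing $b$ upward without ever settling.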

Arya McCarthy
drmaettu

Think about the simplest case where there is a single predictor that is binary, i.e., you are using the binary logistic model to compute two proportions. If P(Y=1 | X=0) = 1 and P(Y=1 | X=1) = 0 you have perfect separation. The estimated proportions are 1.0 and 0.0 and there is no problem with the log likelihood function, which is zero.
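A small sketch of this binary-predictor case, under assumptions of my own: the six observations, the parameterization $\operatorname{logit} P(Y=1 \mid X) = \alpha + \beta X$, and the helper `log_likelihood` are hypothetical and only meant to illustrate the point. It shows the fitted proportions approaching 1.0 and 0.0 while the log-likelihood approaches zero from below as the coefficients are scaled up.

```python
import numpy as np

# Hypothetical data with perfect separation: Y = 1 whenever X = 0, Y = 0 whenever X = 1
X = np.array([0, 0, 0, 1, 1, 1])
Y = np.array([1, 1, 1, 0, 0, 0])

def log_likelihood(alpha, beta):
    # Binary logistic model (assumed form): logit P(Y = 1 | X) = alpha + beta * X
    eta = alpha + beta * X
    # log p = -log(1 + exp(-eta)), log(1 - p) = -log(1 + exp(eta))
    return float(-np.sum(Y * np.logaddexp(0.0, -eta) + (1 - Y) * np.logaddexp(0.0, eta)))

# Push the two fitted proportions toward 1.0 and 0.0 by scaling the coefficients
scales = [1.0, 5.0, 10.0, 20.0]
lls = [log_likelihood(alpha=t, beta=-2.0 * t) for t in scales]
for t, ll in zip(scales, lls):
    p0 = 1.0 / (1.0 + np.exp(-t))  # fitted P(Y = 1 | X = 0)
    p1 = 1.0 / (1.0 + np.exp(t))   # fitted P(Y = 1 | X = 1), since alpha + beta = -t
    print(f"t = {t:4.1f}  p0 = {p0:.6f}  p1 = {p1:.6f}  log-likelihood = {ll:.3e}")
```

The proportions themselves are well behaved; it is only the coefficients on the logit scale that have to grow without bound to reproduce them.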

Frank Harrell