
In logistic regression, we model the posterior probability $P(y=1 \mid x)$ using the sigmoid function: $$P(y=1 \mid x) = \frac{1}{1+\exp(-x)} = h(x)$$

If we assign a data point to class $y = 1$ whenever $h(x) > 0.5$, is our classification then Bayes-optimal, since it classifies according to the higher posterior class probability? Is this connected to the fact that the minimizer of the log loss is $\ln\frac{h(x)}{1-h(x)}$?
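
To make the connection explicit (my own algebra, not part of the original post): with $h(x) = \frac{1}{1+\exp(-x)}$,
$$h(x) > 0.5 \;\Longleftrightarrow\; \exp(-x) < 1 \;\Longleftrightarrow\; x > 0 \;\Longleftrightarrow\; \ln\frac{h(x)}{1-h(x)} > 0,$$
so thresholding $h$ at $0.5$ is exactly a sign test on the log-odds.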

Pugl
  • I do not understand the question. Let us consider $x \in \mathbb{R}^2$. Take any function $g(y,x)$ such that $g(0,x) + g(1,x) = 1$ and any density $f_X$ for $x$; then $(x,y) \mapsto g(y,x) f_X(x)$ defines a joint density for which the conditional density $f_{Y|X}(y|x)$ is precisely $g(y,x)$. Taking $h(x) = g(1,x)$ and a predictor $p(x) = 1$ iff $h(x) > 0.5$ *always* yields a Bayes-optimal classifier (see http://www.win.tue.nl/~rmcastro/2DI70/files/2DI70_Lecture_Notes.pdf, p. 16). So: what restrictions do you expect? The answer is simply: if $f_{Y|X}$ really is the sigmoid function, then logistic regression is Bayes optimal. – Fabian Werner Feb 13 '18 at 15:10

2 Answers


The key distinction lies in modelling the true law versus knowing it.

Assume your data obeys an unknown true law $P(y=1|x)=f(x)$. Then the Bayes-optimal classifier is "classify $y=1$ when $f(x)>0.5$". This holds for any law and has nothing to do with the algebraic form of $f$. In practice you don't know $f$ and cannot know it, so the Bayes-optimal classifier is only a theoretical object.
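
For completeness, here is the standard pointwise argument (not specific to this thread) for why thresholding the true posterior at $0.5$ is Bayes optimal: for any classifier $g$,
$$P(g(X) \neq Y \mid X = x) = 1 - \left[ f(x)\,\mathbf{1}\{g(x)=1\} + (1-f(x))\,\mathbf{1}\{g(x)=0\} \right],$$
which is minimised pointwise by choosing $g(x)=1$ exactly when $f(x) \ge 1-f(x)$, i.e. when $f(x) > 0.5$ (ties can be broken arbitrarily).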

Now, imagine you don't know $f$, but you do know that $f(x)=\operatorname{logit}^{-1}(\beta x)$ and only $\beta$ is unknown. This happens only in simulations, where you control the underlying true law and hide $\beta$. You estimate it as $\hat\beta$ and say "classify $y=1$ when $\operatorname{logit}^{-1}(\hat\beta x)>0.5$". This is not Bayes optimal, since you don't have the exact $\beta$. It is asymptotically Bayes optimal, since with infinite training data $\hat\beta \to \beta$.
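
A minimal simulation sketch of this well-specified case (the setup, including `beta_true` and the use of scikit-learn's `LogisticRegression`, is my illustrative choice, not part of the answer): the excess 0/1 risk of the fitted rule over the true Bayes rule should shrink toward 0 as $n$ grows.

```python
# The true law is logistic with a hidden beta_true; we fit logistic regression
# on growing samples and measure its excess 0/1 risk over the true Bayes rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0])                # the hidden parameter

def true_posterior(X):
    return 1.0 / (1.0 + np.exp(-X @ beta_true))  # P(y=1|x) = logit^{-1}(beta x)

X_test = rng.normal(size=(100_000, 2))
y_test = rng.binomial(1, true_posterior(X_test))
bayes_risk = np.mean((true_posterior(X_test) > 0.5).astype(int) != y_test)

for n in (100, 1_000, 100_000):
    X = rng.normal(size=(n, 2))
    y = rng.binomial(1, true_posterior(X))
    # penalty=None needs scikit-learn >= 1.2 (use penalty='none' on older versions)
    fit = LogisticRegression(penalty=None).fit(X, y)
    excess = np.mean(fit.predict(X_test) != y_test) - bayes_risk
    print(f"n={n:>7}  excess risk over Bayes: {excess:.4f}")
```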

But in a real situation, logistic regression is only a guess for the unknown law, and it is always false. Not only is the parameter unknown; you also don't know how good an approximation logistic regression is to the true unknown law. Then the logistic regression predictor is not Bayes optimal, not even asymptotically. Worse: you can't know how far it is from optimality.

There is one case where you can measure this: simulate data with an $f$ that is not logistic and see how good the logistic approximation is. This is not a real situation, though.
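
For instance, a sketch of such a simulation (the radial posterior `f` below is my arbitrary non-logistic choice): the gap between the two printed risks is the excess risk that no amount of data can remove, because the model is misspecified.

```python
# True posterior depends on the squared radius, so it is not logistic in any
# linear score beta x; the Bayes boundary is a circle, not a line.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def f(X):
    return np.clip(np.sum(X**2, axis=1) / 4.0, 0.0, 1.0)

X = rng.uniform(-1.5, 1.5, size=(50_000, 2))
y = rng.binomial(1, f(X))
X_test = rng.uniform(-1.5, 1.5, size=(100_000, 2))
y_test = rng.binomial(1, f(X_test))

fit = LogisticRegression().fit(X, y)
bayes_pred = (f(X_test) > 0.5).astype(int)  # known only because we simulated f

print("logistic 0/1 risk:", np.mean(fit.predict(X_test) != y_test))
print("Bayes    0/1 risk:", np.mean(bayes_pred != y_test))
```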

Benoit Sanchez

I think one could construct an example in which logistic regression is asymptotically Bayes-optimal (i.e., it minimises the expected 0/1 loss). One way to do this is to consider a domain with two balanced (i.e., with equal marginal probabilities) normally distributed classes sharing the same covariance matrix. In this case, logistic regression learns the same classifier as LDA (linear discriminant analysis), which is asymptotically Bayes-optimal in this domain (this follows from Theorem 22.7 in L. Wasserman, All of Statistics).
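
A quick numerical illustration of this construction (my own sketch; the means, covariance, and sample size are arbitrary choices): with equal priors and a shared identity covariance, the Bayes rule is the linear rule "classify $y=1$ iff $x$ is closer to $\mu_1$", and the fitted logistic boundary agrees with it almost everywhere.

```python
# Two balanced Gaussian classes with a shared identity covariance; compare the
# fitted logistic decisions with the closed-form Bayes rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
mu0, mu1 = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
n = 50_000  # per class

X = np.vstack([rng.normal(mu0, 1.0, size=(n, 2)),
               rng.normal(mu1, 1.0, size=(n, 2))])
y = np.repeat([0, 1], n)

# With equal priors and identity covariance, the Bayes rule is linear:
# classify 1 iff (mu1 - mu0) . x > (|mu1|^2 - |mu0|^2) / 2, i.e. x1 > 0 here.
bayes = (X @ (mu1 - mu0) > 0).astype(int)
fit = LogisticRegression().fit(X, y)

print("agreement with Bayes rule:", np.mean(fit.predict(X) == bayes))
print("empirical Bayes error:    ", np.mean(bayes != y))  # ~ Phi(-1) ≈ 0.159
```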

D.M.