66

Since we are using the logistic function to transform a linear combination of the input into a non-linear output, how can logistic regression be considered a linear classifier?

Logistic regression is just like a neural network without a hidden layer, so why are neural networks considered non-linear classifiers while logistic regression is considered linear?

Nick Stauner
Jack Twain
  • 9
    Transforming "a linear combination of the input into a non-linear output" is a basic part of the *definition* of a [Linear Classifier](http://en.wikipedia.org/wiki/Linear_classifier#Definition). That reduces this question to the second part, which amounts to demonstrating that Neural Networks cannot generally be expressed as linear classifiers. – whuber Apr 12 '14 at 19:51
  • 2
    @whuber: How do you explain the fact that a logistic regression model can take polynomial predictor variables (e.g. $w_1 \cdot x_1^2 + w_2 \cdot x_2^3$) to produce a non-linear decision boundary? Is that still a linear classifier? – stackoverflowuser2010 Jun 25 '16 at 22:33
  • 5
    @Stack The concept of "linear classifier" appears to originate with the concept of a *linear model.* "Linearity" in a model can take on several forms, as described at http://stats.stackexchange.com/a/148713. If we accept the [Wikipedia characterization of linear classifiers](https://en.wikipedia.org/wiki/Linear_classifier), then your polynomial example would be viewed as *nonlinear* in terms of the given "features" $x_1$ and $x_2$ but it would be *linear* in terms of the features $x_1^2$ and $x_2^3$. This distinction provides a useful way to exploit the properties of linearity. – whuber Jul 12 '16 at 13:12
  • 1
    I'm still a bit confused about the question: is the decision boundary of a logistic classifier linear? I followed Andrew Ng's machine learning course on Coursera and he mentioned the following: [![enter image description here](https://i.stack.imgur.com/gHxfr.png)](https://i.stack.imgur.com/gHxfr.png) So it actually seems to me there is no single answer: it depends on the linearity or non-linearity of the decision boundary, which depends on the hypothesis function $h_\theta(X)$, where $X$ is the input and $\theta$ are the parameters of our problem. Does that make sense to you? – brokensword Nov 30 '16 at 09:48
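The polynomial-features point raised in the comments above can be made concrete with a short sketch (Python with NumPy; the coefficients are hypothetical, chosen only for illustration): a logistic model on the features $x_1^2$ and $x_2^3$ is non-linear in the raw inputs $(x_1, x_2)$, yet linear in the transformed features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for the transformed features.
w1, w2, b = 1.0, -2.0, 0.5

def p_hat(x1, x2):
    # Non-linear in (x1, x2), but linear in the features z1 = x1**2, z2 = x2**3.
    z1, z2 = x1 ** 2, x2 ** 3
    return sigmoid(w1 * z1 + w2 * z2 + b)

# The decision boundary w1*z1 + w2*z2 + b = 0 is a hyperplane in (z1, z2)
# space, even though it is a curve in the original (x1, x2) space.
print(p_hat(1.0, 1.0))  # sigmoid(1 - 2 + 0.5) = sigmoid(-0.5)
```

This is exactly the distinction in whuber's comment: "linear classifier" is judged relative to the feature space the weights act on.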

4 Answers

55

Logistic regression is linear in the sense that the predictions can be written as $$ \hat{p} = \frac{1}{1 + e^{-\hat{\mu}}}, \text{ where } \hat{\mu} = \hat{\theta} \cdot x. $$ Thus, the prediction can be written in terms of $\hat{\mu}$, which is a linear function of $x$. (More precisely, the predicted log-odds is a linear function of $x$.)

In contrast, the output of a neural network cannot in general be summarized as a linear function of $x$, and that is why neural networks are called non-linear.

Also, for logistic regression, the decision boundary $\{x:\hat{p} = 0.5\}$ is linear: it's the solution to $\hat{\theta} \cdot x = 0$. The decision boundary of a neural network is in general not linear.
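A minimal numeric sketch of this answer (Python with NumPy; the fitted coefficients $\hat{\theta}$ below are hypothetical) shows that the only non-linearity is the final squashing step, while the predicted log-odds recovers the linear function $\hat{\theta} \cdot x$ exactly:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients; the first entry is the intercept.
theta = np.array([-1.0, 2.0, -0.5])

def predict_proba(x):
    # mu = theta . x is linear in x; only the squashing is non-linear.
    mu = theta[0] + theta[1:] @ x
    return sigmoid(mu)

x = np.array([0.7, 0.4])
p = predict_proba(x)
mu = theta[0] + theta[1:] @ x

# The predicted log-odds is exactly the linear predictor mu.
log_odds = np.log(p / (1 - p))
print(np.isclose(log_odds, mu))  # True
```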

Stefan Wager
  • 2
    Your answer is the clearest and most uncomplicated to me so far. But I'm a bit confused. Some people say that the predicted log-odds is a linear function of $x$ and others say it's a linear function of $\theta$. So?! – Jack Twain Apr 16 '14 at 16:56
  • 1
    Then, also by your explanation, can we say that the prediction of the neural network is a linear function of the last hidden layer's activations? – Jack Twain Apr 16 '14 at 17:01
  • 2
    The predicted log-odds $\hat{\theta} \cdot x$ is linear in both $\hat{\theta}$ and $x$. But usually we are most interested in the fact that the log-odds is linear in $x$, because this implies that the decision boundary is linear in $x$ space. – Stefan Wager Apr 16 '14 at 17:58
  • 1
    Sure - one way to think about neural nets is that the lower layers build a good feature representation, and then the top layer is a linear classifier. – Stefan Wager Apr 16 '14 at 18:00
  • Honestly, I still find it unclear: why the 'odds'? Why does the log of the odds being a linear function of $x$ give it the linear 'badge'? I mean, if the probability of getting class 1 were a linear function of $x$, that would be a straightforward and 'clean' reason for me to deserve the linear 'badge'. Do you understand what I mean? – Jack Twain Apr 16 '14 at 19:25
  • I can only think of it as a non-linear classifier, since the probability of getting class-1 is a nonlinear function of $x$. Don't you agree? – Jack Twain Apr 16 '14 at 19:28
  • 4
    I've been using the definition that a classifier is linear if its decision boundary is linear in $x$ space. This is not the same as the predicted probabilities being linear in $x$ (which would be impossible apart from trivial cases, since probabilities must lie between 0 and 1). – Stefan Wager Apr 16 '14 at 22:01
  • @StefanWager I understand that $\hat{\theta} \cdot x$ is obviously a linear function in $x$, but this term appears as the exponent of $e$, which is not a linear function. So the function from the predictor $x$ to the probability is NOT linear, and the prediction can NOT be written as a linear function of $x$, or can it? – Pugl Dec 05 '15 at 15:06
  • 7
    @Pegah I know this is old, but: logistic regression has a linear decision boundary. The output itself is not linear, of course; it's logistic. Depending on which side of the line a point falls, the total output will approach (but never reach) 0 or 1 respectively. And to add to Stefan Wager's answer: the last sentence is not totally correct; a neural network is non-linear when it contains non-linear activation or output functions, but it can be linear as well (in case no non-linearities were added). – Chris Aug 14 '17 at 23:12
  • thanks nevertheless! I think I understand now that the mapping applied to the input is not linear, but the *decision* made is based on a linear function:) – Pugl Aug 15 '17 at 09:05
  • 2
    I still don't understand the difference between how logistic regression is linear, but NNs aren't. Can't you write the output of a neural network as $\hat p = w_1\sigma(x_1)+w_2\sigma(x_2)+w_3$, which is similar to how the author of this answer wrote the output of logistic regression? Equivalently, $\hat p = w_1\left(\frac{1}{1+\exp(-\theta^Tx_1)}\right) + w_2\left(\frac{1}{1+\exp(-\theta^Tx_2)}\right)+w_3$. So my confusion is that if the answer's equation is considered to be a linear function of $x$, what makes this not a linear function of $x$? – rasen58 Aug 21 '17 at 14:57
  • @rasen58 I would ask the same. You can express a logistic regression model using a one layer neural network with a logistic activation function. Also neural networks are not inherently non linear since you can use linear activation functions. – MattSt Nov 08 '20 at 13:28
28

As Stefan Wager notes, the decision boundary for a logistic classifier is linear. (Note that logistic regression does not require the classes to be linearly separable; the boundary is linear whether or not it separates the classes perfectly.) I wanted to expand on the math for this in case it's not obvious.

The decision boundary is the set of $x$ such that $${1 \over {1 + e^{-{\theta \cdot x}}}} = 0.5$$

A little bit of algebra shows that this is equivalent to $${1 = e^{-{\theta \cdot x}}}$$

and, taking the natural log of both sides,

$$0 = -\theta \cdot x = -\sum\limits_{i=0}^{n} \theta_i x_i$$

so the decision boundary is linear.
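This algebra is easy to check numerically. In the sketch below (Python with NumPy; the weights $\theta$ are hypothetical, with $x_0 = 1$ carrying the intercept), any point on the hyperplane $\theta \cdot x = 0$ receives a predicted probability of exactly 0.5:

```python
import numpy as np

# Hypothetical weights; theta[0] is the intercept coefficient (for x_0 = 1).
theta = np.array([1.0, 2.0, -3.0])

def p_hat(x):
    return 1.0 / (1.0 + np.exp(-(theta @ x)))

# Pick a point on the hyperplane theta . x = 0 (with x_0 = 1):
# 1 + 2*x1 - 3*x2 = 0  holds at  x = (1, 1, 1).
x_on_boundary = np.array([1.0, 1.0, 1.0])
print(p_hat(x_on_boundary))  # 0.5 exactly, since exp(0) = 1
```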

The reason the decision boundary for a neural network is not linear is that the network composes at least two layers of sigmoid functions: one in each of the hidden nodes, plus an additional sigmoid in the output node that combines and thresholds their results. Composing an affine map with a sigmoid, then another affine map and another sigmoid, is no longer a (monotonically transformed) linear function of $x$.

Phil Bogle
  • 2
    Actually, you can get a non-linear decision boundary with only one layer having an activation. See the standard example of an XOR with a 2-layer feed-forward network. – James Hirschorn Mar 19 '18 at 18:48
  • 1
    Logistic regression is neither linear nor is it a classifier. The idea of a "decision boundary" has little to do with logistic regression, which is instead a direct probability estimation method that separates predictions from decision. – Frank Harrell Nov 18 '20 at 13:48
5

If we have two classes, $C_{0}$ and $C_{1}$, then by Bayes' theorem the conditional probability is $$ P(C_{0}|x) = \frac{P(x|C_{0})P(C_{0})}{P(x)} $$ and, expanding the denominator, $$ P(C_{0}|x) = \frac{P(x|C_{0})P(C_{0})}{P(x|C_{0})P(C_{0})+P(x|C_{1})P(C_{1})} = \frac{1}{1+ \exp\left(-\log\frac{P(x|C_{0})}{P(x|C_{1})}-\log \frac{P(C_{0})}{P(C_{1})}\right)} $$ so, when the log-likelihood ratio is linear in $x$, the denominator takes the logistic form $1+e^{-\omega \cdot x}$ (absorbing the constant terms into an intercept).

Under which conditions does the exponent reduce to a linear term? If you consider the exponential family (a canonical form for distributions such as the Gaussian or Poisson), $$ P(x|C_{i}) = \exp \left(\frac{\theta_{i} x -b(\theta_{i})}{a(\phi)}+c(x,\phi)\right) $$ then you end up with a linear form, $$ \log\frac{P(x|C_{0})}{P(x|C_{1})} = \left[ (\theta_{0}-\theta_{1})x - b(\theta_{0})+b(\theta_{1}) \right]/a(\phi) $$

Notice that we assume both distributions belong to the same family and have the same dispersion parameter. Under that assumption, logistic regression can model the class probabilities for any member of the exponential family of distributions.
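For a concrete check of the formula above, take two Gaussians with the same variance (the sketch below is Python with NumPy; the means and variance are hypothetical, and the Gaussian log-density is written out by hand so the example is self-contained): their log-likelihood ratio is linear in $x$.

```python
import numpy as np

def gauss_logpdf(x, mu, sigma):
    # Log-density of N(mu, sigma^2), written out explicitly.
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Hypothetical class-conditional means, same dispersion for both classes.
mu0, mu1, sigma = 0.0, 2.0, 1.0

xs = np.linspace(-3, 3, 5)
log_ratio = gauss_logpdf(xs, mu0, sigma) - gauss_logpdf(xs, mu1, sigma)

# Expanding the squares: the quadratic terms cancel, leaving a linear function
# ((mu0 - mu1)/sigma^2) * x + (mu1^2 - mu0^2)/(2 * sigma^2).
expected = (mu0 - mu1) / sigma ** 2 * xs + (mu1 ** 2 - mu0 ** 2) / (2 * sigma ** 2)
print(np.allclose(log_ratio, expected))  # True
```

With unequal variances the quadratic terms would not cancel, which matches the answer's caveat about needing the same dispersion parameter.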

jpmuc
1

The key is that the logistic regression model is additive: the linear predictor $z$ is a sum of weighted inputs, e.g.,

$$z = w_1 x_1 + w_2 x_2$$

There's no interaction between the weighted inputs, nothing like $w_1 x_1 \cdot w_2 x_2$, which would make our model non-linear!
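A tiny sketch of that distinction (Python; the weights and inputs are hypothetical): the additive combination is linear in the inputs, while a product of weighted inputs would not be.

```python
# Hypothetical weights and inputs.
w1, w2 = 0.3, -1.2
x1, x2 = 2.0, 0.5

z_additive = w1 * x1 + w2 * x2          # linear in x1 and x2
z_interaction = (w1 * x1) * (w2 * x2)   # a term like this would break linearity

print(z_additive)     # 0.0
print(z_interaction)  # -0.36
```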

  • This answer is directly taken from [https://sebastianraschka.com/faq/docs/logistic_regression_linear.html](https://sebastianraschka.com/faq/docs/logistic_regression_linear.html) without providing a reference. – helperFunction Oct 08 '21 at 06:47