I came across a statement that logistic regression is a nonlinear problem. How can one show this?

Is it possible to treat logistic discrimination in terms of an equivalent linear regression problem?

ironman
  • "Nonlinear problem" and "nonlinear regression" are completely different things! Which of these two questions are you trying to ask? – whuber Sep 04 '18 at 17:54
  • The question is "show that logistic regression is a nonlinear regression problem". – ironman Sep 04 '18 at 18:22
  • Can you write out a logistic regression function, e.g., $p(y=1) = \dots$? – jbowman Sep 04 '18 at 19:33
  • Allow me to elaborate. Logistic regression is *linear* in the general sense I initially describe in an answer at https://stats.stackexchange.com/questions/148638. Briefly, it is the archetype of a *generalized linear model* (GLM). However, *estimating its parameters* is a *nonlinear optimization problem.* In which sense do you mean "non linear problem"? And what do you mean by "logistic discrimination" and what sense of "equivalent" might you have in mind in your second question? – whuber Sep 04 '18 at 20:39
  • @whuber *generalized* linear model, not a general linear model, right? The usual use of the term "general linear model" does not include logistic regression ([see here](https://en.wikipedia.org/wiki/Generalized_linear_model), noting in particular the "not to be confused with" part). Logistic regression is usually categorized as non-linear regression due to the non-linearity introduced by the link function. – guy Sep 04 '18 at 22:52
  • @guy yes, thank you for catching that. I have edited the comment for clarification. Although one is free to characterize logistic regression as "nonlinear" because the link often has a nonlinear expression, the entire point of a GLM is that the link function *linearizes* the model. Indeed, logistic regression posits a linear relationship between the log odds (that is, the logit of the Bernoulli parameter) and the explanatory variables, so arguably it is "nonlinear" only when one chooses the wrong way to parameterize the Bernoulli family! – whuber Sep 05 '18 at 11:32
  • @whuber vis-a-vis GLMs being “linear models”, I’m not making a judgement on the merit of categorizing them as linear, I’m just reporting how I’ve always seen them categorized: GLMs generalize the linear model, and are not linear models themselves, and linear models are *defined* to mean “the mean is linear in the parameters.” – guy Sep 05 '18 at 12:44
  • @Guy That's fine--but neither I nor the OP ever used "linear model" in that sense; despite the typographical error in my comment, which you kindly pointed out and I corrected, I have been discussing GLMs all along and provided a link to clarify what that was intended to mean. – whuber Sep 05 '18 at 12:47

2 Answers

Recall that the logistic regression model applies a nonlinear transformation to the linear predictor $\beta^T x$:

  • Probability of $(Y = 1)$: $p = \frac{e^{\alpha + \beta_1x_1 + \beta_2 x_2}}{1 + e^{ \alpha + \beta_1x_1 + \beta_2 x_2}}$
  • Odds of $(Y = 1)$: $ \left( \frac{p}{1-p}\right) = e^{\alpha + \beta_1x_1 + \beta_2 x_2}$
  • Log Odds of $(Y = 1)$: $ \log \left( \frac{p}{1-p}\right) = \alpha + \beta_1x_1 + \beta_2 x_2$
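
The step from the probability to the odds and the log odds is a short calculation. Writing $\eta = \alpha + \beta_1 x_1 + \beta_2 x_2$ for the linear predictor (the symbol $\eta$ is introduced here only as shorthand and is not part of the original answer), so that $p = e^{\eta}/(1 + e^{\eta})$:

$$
1 - p = \frac{1}{1 + e^{\eta}}, \qquad
\frac{p}{1-p} = \frac{e^{\eta}/(1 + e^{\eta})}{1/(1 + e^{\eta})} = e^{\eta}, \qquad
\log\left(\frac{p}{1-p}\right) = \eta .
$$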

So, to answer your question: logistic regression is indeed nonlinear in terms of the odds and the probability, but it is linear in terms of the log odds.


A simple example

Fitting a logistic regression model on the following toy example gives the coefficients $\alpha = -5.05$ and $\beta = 1.3$

Plotting the probability $P(Y=1)$ as a function of $X$ clearly shows the nonlinear relationship:

[Figure: $P(Y=1)$ plotted against $X$]

The odds of $Y$ being 1 given $X$ are also nonlinear in $X$:

[Figure: odds of $Y=1$ plotted against $X$]

Finally, the log odds of $Y$ being 1 are a linear function of $X$:

[Figure: log odds of $Y=1$ plotted against $X$]
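
The three relationships can also be checked numerically without any plot. The sketch below is only an illustration: it assumes a single predictor and plugs in the coefficients quoted above ($\alpha = -5.05$, $\beta = 1.3$), while the grid of $X$ values and the helper names (`probability`, `odds`, `log_odds`, `steps`) are made up for this example. If the log odds are linear, they should change by the same amount for every unit step in $X$, while the probability and the odds should not.

```python
import math

# Coefficients quoted in the toy example; a single predictor is assumed.
ALPHA, BETA = -5.05, 1.3

def probability(x):
    """P(Y = 1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(ALPHA + BETA * x)))

def odds(x):
    """Odds of Y = 1, i.e. p / (1 - p)."""
    p = probability(x)
    return p / (1.0 - p)

def log_odds(x):
    """Log odds (logit) of Y = 1."""
    return math.log(odds(x))

def steps(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]

# Hypothetical evenly spaced grid of X values (unit spacing).
xs = [float(x) for x in range(9)]

prob_steps = steps([probability(x) for x in xs])
odds_steps = steps([odds(x) for x in xs])
log_odds_steps = steps([log_odds(x) for x in xs])

for x, dp, do, dl in zip(xs, prob_steps, odds_steps, log_odds_steps):
    print(f"x={x:.0f}->{x + 1:.0f}  dP={dp:.4f}  dOdds={do:.4f}  dLogOdds={dl:.4f}")
```

The printed log-odds steps all equal $\beta$ up to rounding, whereas the probability and odds steps vary with $X$, which is the nonlinearity the plots show.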

See here for some more details: Calculating confidence intervals for a logistic regression

Xavier Bourret Sicotte

For the first statement: logistic regression is used when the response variable is dichotomous. Since the response can only take the values 0 or 1, fitting a straight line assumes a linear relationship that cannot hold for a dichotomous outcome. It can be shown that the linear probability model is not efficient (its errors are necessarily heteroscedastic) and, furthermore, nothing ensures that the fitted values stay between 0 and 1. The logit transformation solves these problems.
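
The out-of-range problem is easy to reproduce. The following is a minimal sketch on made-up data (the values in `xs` and `ys` are hypothetical and chosen only for illustration): it fits a straight line to a 0/1 outcome by ordinary least squares and compares the fitted values with what the inverse logit produces.

```python
import math

# Hypothetical toy data: one predictor x and a 0/1 outcome y.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form, one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Linear probability model: the fitted "probabilities" are just points on a line.
intercept, slope = fit_line(xs, ys)
lpm_fitted = [intercept + slope * x for x in xs]

def inverse_logit(eta):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Whatever the linear predictor is, the inverse logit stays inside (0, 1).
squashed = [inverse_logit(eta) for eta in (-10.0, -1.0, 0.0, 1.0, 10.0)]

print("smallest and largest fitted value of the line:", min(lpm_fitted), max(lpm_fitted))
print("inverse-logit values:", squashed)
```

On this data the straight line gives a fitted value below 0 at the smallest `x` and above 1 at the largest, while the inverse-logit values all lie strictly between 0 and 1.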

Please clarify your second statement.