
I don't entirely understand why logistic regression is based on an exponential function.

The sigmoid function seems to assume that as the dependent variables increase, the independent variable starts to increase exponentially. How do we know this is true? How do we know that this isn't a linear relationship?

bugsyb
  • https://stats.stackexchange.com/questions/88603/why-is-logistic-regression-a-linear-model?rq=1 – Sycorax Aug 03 '17 at 01:37
  • Note that the model is for a probability $P(Y=1|\mathbf{X}=\mathbf{x})$ which must lie between 0 and 1. A linear function must eventually pass outside those limits, giving impossible probabilities. The logit is an example of a link function - by far the most popular - that stays within those limits. There are a variety of other functions that can be used (and indeed, both probit and complementary-log-log links are sometimes used instead of the logit link). – Glen_b Aug 03 '17 at 02:06
  • @Glen_b nice concise explanation. I have to disagree with the claim that the logit is by far the most popular, though. In econometrics we usually resort to the probit link function. I think that's because of the easier interpretability of the regression coefficients. – KenHBS Aug 03 '17 at 09:47
  • @Ken econometricians do use the probit a lot (and if I was talking only about the work done in econometrics you'd have a point), but econometricians are a small fraction of the people doing this kind of modelling. In applications across all areas I'd have guessed it would be over 90% logistic and less than 10% everything else put together (which would be almost all probit). [For questions here it's about 80/20 (logistic + logit to probit) but a somewhat higher proportion of new logistic questions would be closed as duplicates than would be the case for probit questions.] – Glen_b Aug 03 '17 at 09:49
  • @Glen_b I understand that logit allows the function to stay within the 0-1 limits. However, that still doesn't answer my original question. How do we know that the independent variable increases exponentially, and why is it safe to assume that? – bugsyb Aug 03 '17 at 23:26
  • Logistic is not exponential (except approximately, when the proportion is very small). If you read my first comment carefully, I say it's *an example* of a function that satisfies the conditions we need. While convenient from several viewpoints, it's no safer to assume it's specifically logistic than to assume it's anything else that satisfies the same restrictions, outside of the fact that it's often a reasonable approximation. As the famous saying goes, *all models are wrong, but some are useful*. It makes about as much sense as assuming a relationship is linear in ordinary regression. – Glen_b Aug 04 '17 at 00:02
  • The premise of this question is unclear (& almost certainly false). For one thing, you seem to have "dependent" & "independent" switched relative to their conventional usage. The *dependent* variable is a function of the independent variables, not vice versa. E.g., you cannot say that 'as the DV increases, something happens to the IVs'. In addition, the DV does not increase (decrease) exponentially as a function of increases (decreases) in the IVs. – gung - Reinstate Monica Aug 04 '17 at 00:40

1 Answer


[I didn't answer this before because I thought it would be a duplicate, but I didn't locate a suitable one; I'll base a brief answer on my comments, at least until such a thread is located.]

Note that the model is for a probability $P(Y=1|\mathbf{X}=\mathbf{x})$ which must lie between 0 and 1.

A linear function of $\mathbf{x}$ with a nonzero slope must eventually pass outside those limits, giving impossible probabilities. That's usually undesirable.
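As a rough illustration, the sketch below evaluates a straight line and a logistic curve over a range of $x$. The coefficients `b0`, `b1` and the function names are hypothetical, chosen only for this demo. The line leaves $[0, 1]$ at both ends, while the logistic curve stays strictly inside:

```python
import math

# Hypothetical coefficients, chosen only for illustration.
def linear_prob(x, b0=0.5, b1=0.2):
    # Linear model on the probability scale: p = b0 + b1 * x (unbounded)
    return b0 + b1 * x

def logistic_prob(x, b0=0.0, b1=0.8):
    # Linear predictor eta = b0 + b1 * x, mapped into (0, 1) by the inverse logit
    eta = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-eta))

for x in [-10, -5, 0, 5, 10]:
    print(x, round(linear_prob(x), 3), round(logistic_prob(x), 3))
```

With these made-up coefficients the line gives $-1.5$ at $x = -10$ and $2.5$ at $x = 10$, neither of which is a valid probability.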

The logit is an example of a link function - the most popular - that stays within those limits.
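Written out, the logit link and its inverse show where the exponential enters. Here $p = P(Y=1|\mathbf{X}=\mathbf{x})$ and $\eta$ denotes the linear predictor (for example $\eta = \beta_0 + \boldsymbol{\beta}^\top\mathbf{x}$; this notation is assumed for the illustration):

$$\operatorname{logit}(p) = \log\frac{p}{1-p} = \eta \quad\Longleftrightarrow\quad p = \frac{1}{1+e^{-\eta}} = \frac{e^{\eta}}{1+e^{\eta}}$$

It is the log-odds, not the probability, that is linear in $\eta$. When $p$ is small, $e^{\eta}$ is small, the denominator is close to 1, and $p \approx e^{\eta}$; that is the only regime in which the curve looks exponential. As $p$ approaches 1, $1 - p = 1/(1+e^{\eta}) \approx e^{-\eta}$, so the curve levels off instead of continuing to grow.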

There are a variety of other functions that can be used; for example, probit and complementary-log-log links are both sometimes used instead of the logit link for conditionally binomial models.
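A minimal sketch of the inverse of each link named above (logit, probit, complementary log-log), using only Python's standard `math` module; the function names are hypothetical. Each maps any real linear predictor $\eta$ into $(0, 1)$, so each satisfies the same restriction:

```python
import math

def inv_logit(eta):
    # Logistic CDF: inverse of the logit link log(p / (1 - p))
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    # Standard normal CDF: inverse of the probit link
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    # Inverse of the complementary log-log link log(-log(1 - p))
    return 1.0 - math.exp(-math.exp(eta))

for eta in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(eta, round(inv_logit(eta), 3), round(inv_probit(eta), 3), round(inv_cloglog(eta), 3))
```

The logit and probit curves are both symmetric about $p = 0.5$ at $\eta = 0$, whereas the complementary log-log curve is not (it gives $1 - e^{-1} \approx 0.632$ there), which is one practical difference between them.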

While the logistic form is convenient from several viewpoints, it's no safer to assume the relationship is specifically logistic than to assume it's any other function satisfying the same restrictions, except that the logistic is often a reasonable approximation. As the famous saying goes, all models are wrong, but some are useful.

Assuming a logistic relationship makes about as much sense as assuming a relationship is linear in ordinary regression: it's convenient and often a good approximation, but it would typically be unwise to think the relationships are actually exactly linear.

Glen_b