I'm having trouble understanding the logistic regression assumption that the input variables must be linearly related to the log odds. Specifically, what actually happens to the model when this assumption is violated? I've read the mathematical explanations of the assumption, but they don't help my understanding; they leave me more confused.

I have been told that whether the assumption is true or not does not affect whether the data is linearly separable. So does a violation of the assumption cause the decision boundary to point in the wrong direction or something like that?

Basically, why should I care whether the assumption is violated or not?

Ryan
  • You might look at some non-mathematical explanations for intuition. See e.g. https://stats.stackexchange.com/q/504448/16974. – James Stanley Dec 03 '21 at 02:13
  • You can also consider the issue by reframing it from a model *assumption* to a modelling *process*: in a basic model you are modelling a linear association between x and the log odds (this is equivalent to a fixed odds ratio for a one-unit change in your x variable). If the association is more complex (e.g. U-shaped, or initially increasing and then flattening out at higher levels of x), then you are going to have a model that doesn't fit your data (or that over-simplifies the association, depending on your analytic paradigm). – James Stanley Dec 03 '21 at 02:14
  • The issue I have with that explanation is that I interpret it as a statement about the shape of the boundary between the classes. In linear regression, the data are spread along a line. I then apply that logic to logistic regression and conclude that a violation of linearity implies the boundary between the classes is not a straight line, just as in linear regression the data would not be spread along a line if the linearity assumption didn't hold. However, I've been told that the assumption has no effect on the shape of the class boundary, which is why I'm confused. I don't understand what breaks in the model. – Ryan Dec 03 '21 at 02:30
  • I also read that link before and it didn't help, but I do appreciate the help. – Ryan Dec 03 '21 at 02:32
  • No problem! I don't think this is succinctly describable, but to address your comment above: the linearity is not about the boundary but about the relation between the continuous predictor x and the binary outcome y. See if this page helps: you are effectively modelling (a logit transformation of) the proportion of instances with the outcome along the dimension of x, and that is where a single-parameter specification dictates a linear function. https://thestatsgeek.com/2014/09/13/checking-functional-form-in-logistic-regression-using-loess/ – James Stanley Dec 03 '21 at 02:38
  • I think the link helps a lot. So is this analogous to log-transforming data on an exponential scale so that it can be modelled as a straight line? But in the case of logistic regression, the assumption is saying that we want to find a transformation of x (if necessary) so that the transformed x gives a straight line when plotted against the logit? If this is true, I'll have a hard think about the implications. – Ryan Dec 03 '21 at 03:21
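
To make the comments above concrete, here is a minimal sketch in plain Python (not from anyone in the thread). It assumes a made-up U-shaped truth, log odds = x² − 2, on a symmetric grid of x values, and it fits to the true probabilities instead of sampled 0/1 outcomes so the result is deterministic. The helper names (`sigmoid`, `mean_log_loss`, `fit_logistic`) are hypothetical, and the fitting routine is a small damped Newton loop written for this illustration, not a library API.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def mean_log_loss(b0, b1, z, y):
    """Average cross-entropy of the model logit = b0 + b1*z against targets y."""
    total = 0.0
    for zi, yi in zip(z, y):
        t = b0 + b1 * zi
        softplus = max(t, 0.0) + math.log1p(math.exp(-abs(t)))
        total += softplus - yi * t
    return total / len(z)


def fit_logistic(z, y, iters=50):
    """Fit logit = b0 + b1*z by Newton's method with step halving."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, y):
            p = sigmoid(b0 + b1 * zi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * zi
            h00 += w                # negative Hessian entries
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the loss no longer increases.
        step, current = 1.0, mean_log_loss(b0, b1, z, y)
        while step > 1e-8 and mean_log_loss(b0 + step * d0, b1 + step * d1, z, y) > current:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
    return b0, b1


# Assumed toy truth: the log odds are U-shaped in x, not linear.
x = [i / 10 for i in range(-30, 31)]
true_p = [sigmoid(v * v - 2.0) for v in x]
x_sq = [v * v for v in x]

# Model 1 assumes the log odds are linear in x.
lin_b0, lin_b1 = fit_logistic(x, true_p)
# Model 2 assumes the log odds are linear in the transformed predictor x^2.
quad_b0, quad_b1 = fit_logistic(x_sq, true_p)

loss_lin = mean_log_loss(lin_b0, lin_b1, x, true_p)
loss_quad = mean_log_loss(quad_b0, quad_b1, x_sq, true_p)

print("linear in x:   slope =", lin_b1, " mean log loss =", loss_lin)
print("linear in x^2: intercept =", quad_b0, " slope =", quad_b1, " mean log loss =", loss_quad)
```

Under these assumptions the model that is linear in x ends up with a slope of essentially zero, so it predicts nearly the same probability at every x and has no useful threshold, even though x strongly determines the outcome. The model that is linear in x² recovers the log odds used to build the example and has a lower log loss. In both cases the decision boundary is linear in whatever feature the model is given; what the violated assumption breaks is the fitted probabilities, and with them any odds ratio or classification derived from them.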

0 Answers