I have built a logistic regression model with many continuous and categorical (dummy-coded) predictors. As I understand it, a dummy-coded categorical variable satisfies the linearity assumption by definition, because it takes only two values (0 and 1).

For the continuous predictors, I then tested linearity of the logit by adding interaction terms (as illustrated in the DSUR book). All of my continuous predictors appear to violate the linearity assumption.
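For reference, here is a minimal sketch of the kind of check described above. It assumes the interaction term is each predictor multiplied by its own log (a Box-Tidwell-style check); that is an assumption about the approach, not a quote from the book. The data are simulated, and `fit_logit` and `box_tidwell_lr` are hypothetical helper names written only for this illustration.

```python
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik

def box_tidwell_lr(x, y):
    """Likelihood-ratio statistic for adding x*log(x) to a model with x (x must be > 0)."""
    ones = np.ones_like(x)
    base = np.column_stack([ones, x])
    extended = np.column_stack([ones, x, x * np.log(x)])
    _, ll_base = fit_logit(base, y)
    _, ll_extended = fit_logit(extended, y)
    return 2.0 * (ll_extended - ll_base)

# Simulated data (made up for illustration): one outcome whose logit is
# linear in x, and one whose logit is U-shaped in x.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=2000)
y_linear = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x))))
y_curved = rng.binomial(1, 1.0 / (1.0 + np.exp(-((x - 2.5) ** 2 - 2.0))))

CHI2_1DF_95 = 3.841  # 95th percentile of the chi-square distribution with 1 df
lr_linear = box_tidwell_lr(x, y_linear)
lr_curved = box_tidwell_lr(x, y_curved)
print(f"LR statistic, linear logit: {lr_linear:.2f}")
print(f"LR statistic, curved logit: {lr_curved:.2f} (critical value {CHI2_1DF_95})")
```

A statistic above the critical value suggests the logit is not linear in that predictor. With a large sample, even a small departure from linearity can be flagged, so the size of the statistic is worth reading alongside a plot.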

Apart from discarding the variables, what options do I have here?

Thanks.

1 Answer

I'm not familiar with the DSUR book, but I assume they are plotting the numeric variables against logit(p). You can try some of the same things you would do with an ordinary linear regression (i.e., test suitable transformations of the predictors to improve linearity), but beware: this can make the coefficients messy to interpret.
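As a rough sketch of what checking a transformation could look like: bin the predictor, compute the empirical logit in each bin, and see whether the points fall closer to a straight line against the raw predictor or against a transformed one. The data below are simulated with a logit that is linear in log(x), and `empirical_logits` is a hypothetical helper written just for this illustration, not a library function.

```python
import numpy as np

def empirical_logits(x, y, n_bins=10):
    """Return (mean of x, empirical logit of y) for equal-count bins of x."""
    order = np.argsort(x)
    x_bins = np.array_split(x[order], n_bins)
    y_bins = np.array_split(y[order], n_bins)
    centers = np.array([b.mean() for b in x_bins])
    events = np.array([b.sum() for b in y_bins])
    sizes = np.array([b.size for b in y_bins])
    # Adding 0.5 to both counts keeps the logit finite when a bin is all 0s or all 1s.
    logits = np.log((events + 0.5) / (sizes - events + 0.5))
    return centers, logits

# Simulated data (made up for illustration): the true logit is linear in log(x).
rng = np.random.default_rng(1)
x = rng.uniform(0.1, 20.0, size=5000)
true_logit = -2.0 + 1.5 * np.log(x)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

centers, logits = empirical_logits(x, y)

# Correlation between bin centers and empirical logits as a crude straightness measure.
r_raw = np.corrcoef(centers, logits)[0, 1]
r_log = np.corrcoef(np.log(centers), logits)[0, 1]
print(f"straightness vs x:      {r_raw:.3f}")
print(f"straightness vs log(x): {r_log:.3f}")
```

Plotting `logits` against `centers` and against `np.log(centers)` shows the same thing visually. The interpretation cost mentioned above applies here: a coefficient on log(x) describes the change in log-odds per unit change in log(x), not per unit of x.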

  • I didn't realize the interpretation is going to be different. By the way, apart from the model generalizing less well, how safe is it to build a model with this assumption violated? – Shahzeb Naveed Apr 19 '20 at 03:16
  • It probably depends on how non-linear the relationship is and on your sample size. Bergtold et al. have a good article on this: https://www.tandfonline.com/doi/abs/10.1080/02664763.2017.1282441 – Lyndon Walker Apr 19 '20 at 03:22
  • Thanks. Also, is it fine to test logit linearity of continuous variables with the categorical predictors excluded (just for testing purposes)? – Shahzeb Naveed Apr 19 '20 at 04:25