
I would like to verify whether a method I am using is valid.

I am using the following method to test the linearity assumption of logistic regression, i.e. that the log-odds of the response are linear in the independent variable. For background, I am using Python, specifically the scikit-learn (sklearn) library.

What I have done is:

  1. Built my model and fitted it to some training data.
  2. Sampled 100 evenly spaced points between the min and max of my independent variable X and calculated the probability my model predicts at each of these points (using the predict_proba function).
  3. Plotted the sampled X points against the logit of the predicted probability and observed a linear relationship (note that predict_proba returns the probability of each sample belonging to each class, so I just picked one of the classes).
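
For concreteness, the three steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the original data or code: the data-generating process (log-odds deliberately quadratic in x), the variable names, and the default LogisticRegression settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assumed synthetic data: the true log-odds are quadratic in x,
# so the linearity assumption is violated by construction.
x = rng.uniform(-3, 3, size=2000)
true_logit = x**2 - 2.0
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Step 1: build the model and fit it to the training data.
model = LogisticRegression().fit(x.reshape(-1, 1), y)

# Step 2: 100 evenly spaced points between min and max of x,
# and the predicted probability of class 1 at each point.
grid = np.linspace(x.min(), x.max(), 100).reshape(-1, 1)
p = model.predict_proba(grid)[:, 1]

# Step 3: logit of the predicted probability (this is what gets plotted).
logit_p = np.log(p / (1 - p))

# Compare with the fitted linear predictor b0 + b1 * x.
linear_predictor = model.intercept_[0] + model.coef_[0, 0] * grid[:, 0]
max_gap = np.max(np.abs(logit_p - linear_predictor))
print(f"largest |logit(p) - (b0 + b1*x)| on the grid: {max_gap:.2e}")
```

In this sketch the plotted logits coincide with the fitted linear predictor up to floating-point error, even though the data were generated with non-linear log-odds.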

Doing this makes me believe that I have not violated the assumption, but am I unknowingly already assuming linearity by using this method? Is this method valid?

I know similar questions have been asked before, but those revolve around the Box-Tidwell method, which I would prefer not to use. Rather, I simply want to test the assumption directly by graphing the logit of the response against the independent variable and checking whether a linear relationship exists. In this sense I believe my question is different. (I am only specifying this so that my post doesn't keep getting removed.)

Thanks! :)

pche3675
  • It seems to me that you are guaranteed to find a linear relation here because the model is linear in the logits. So no, this will not work. You need a model-free way to estimate the logits, e.g., by using subsets of the data and calculating empirical probabilities. Then convert these to logits and compare them to the model-based logits to test the assumption. – BigBendRegion Dec 01 '20 at 23:33
  • Hi, thanks for the response! I had suspected something like this would be the case, which is why I wanted to double check. If I were to use the Box-Tidwell method instead, doing something similar to what is described here (https://stats.stackexchange.com/questions/217471/why-does-including-x-lnx-interaction-term-in-logistic-regression-model-helps?rq=1), am I right in understanding that if a_j - 1 ~ 0 then the linearity assumption is not violated? – pche3675 Dec 01 '20 at 23:55
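
The model-free check suggested in the first comment could look roughly like the sketch below. It is only one possible reading of that suggestion: the synthetic data, the choice of ten equal-count bins, the 0.5 correction that keeps the empirical logits finite, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assumed synthetic data with log-odds that are quadratic in x.
x = rng.uniform(-3, 3, size=5000)
y = rng.binomial(1, 1 / (1 + np.exp(-(x**2 - 2.0))))

model = LogisticRegression().fit(x.reshape(-1, 1), y)

# Split x into 10 equal-count bins (deciles) and assign each point to a bin.
n_bins = 10
edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

# Per-bin mean of x, number of positives, and number of observations.
bin_mid = np.array([x[bin_idx == b].mean() for b in range(n_bins)])
events = np.array([y[bin_idx == b].sum() for b in range(n_bins)])
counts = np.array([(bin_idx == b).sum() for b in range(n_bins)])

# Empirical (model-free) logit per bin; the 0.5 terms avoid log(0)
# when a bin contains only one class.
empirical_logit = np.log((events + 0.5) / (counts - events + 0.5))

# Model-based logit at the bin means (the fitted linear predictor).
model_logit = model.decision_function(bin_mid.reshape(-1, 1))

gap = empirical_logit - model_logit
for m, e, f in zip(bin_mid, empirical_logit, model_logit):
    print(f"x ~ {m:6.2f}   empirical logit {e:6.2f}   model logit {f:6.2f}")
```

Plotting empirical_logit against bin_mid, rather than the model's own logits, is what makes the comparison informative: a systematic pattern in gap (here, curvature) suggests the log-odds are not linear in x. The number of bins trades off noise within each bin against resolution along x.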
