I would like to verify whether a method I am using is valid.
I am using the following method to test the linearity assumption of logistic regression. As background, I am using Python and the sklearn library in particular.
What I have done is:
- Built my model and fitted it to some training data
- Sampled 100 evenly spaced points between the min and max of my independent variable X and calculated the probability my model predicts at each of these points (using the predict_proba function)
- Plotted the sampled X points against the logit of the predicted probability and observed a linear relationship (note that predict_proba returns the probability of the samples belonging to each class, so I just picked one of the classes)
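In code, the steps above look roughly like the sketch below. The data here is a synthetic stand-in (an assumption, not my real data), the variable names are placeholders, and instead of showing the plot I fit a straight line to the logit values and look at the largest residual as a numeric stand-in for "it looks linear".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): one feature X and a binary outcome y
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = (rng.random(500) < 1 / (1 + np.exp(-(0.5 + 2.0 * X[:, 0])))).astype(int)

# Step 1: build the model and fit it to the training data
model = LogisticRegression().fit(X, y)

# Step 2: 100 evenly spaced points between the min and max of X
x_grid = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)

# predict_proba returns one column per class; pick the column for class 1
p = model.predict_proba(x_grid)[:, 1]

# Step 3: logit of the predicted probability at each sampled point
logit = np.log(p / (1 - p))

# Numeric stand-in for eyeballing the plot: fit a line to (x, logit)
# and measure how far the logit values deviate from it
slope, intercept = np.polyfit(x_grid[:, 0], logit, deg=1)
max_residual = np.max(np.abs(logit - (slope * x_grid[:, 0] + intercept)))
print(max_residual)
```

Plotting `x_grid` against `logit` (for example with matplotlib) gives the graph I described.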
Doing this makes me believe that I have not violated the assumption, but am I unknowingly already assuming linearity in this method? Is this method valid?
I know similar questions have been asked before, but those revolve around the Box-Tidwell method, which is not something I would prefer to use. Rather, I simply want to test the assumption directly by graphing the logit of the response against the independent variable and observing whether a linear relationship exists. So in this sense I believe my question to be different. (I am just specifying this so that my post doesn't keep getting removed)
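To make "graphing the logit of the response" concrete: one way I understand this could be done from the raw data, without going through the fitted model, is to bin X and compute the empirical log-odds of the response in each bin. The sketch below is only an illustration of that idea on synthetic data (an assumption); the number of bins and the 0.5 added to each count to avoid log(0) are arbitrary choices of mine, and I have not validated this as a formal test.

```python
import numpy as np

# Synthetic stand-in data (assumption): one feature x and a binary outcome y
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = (rng.random(2000) < 1 / (1 + np.exp(-(0.5 + 2.0 * x)))).astype(int)

# Split x into 10 bins with roughly equal counts (quantile bins)
n_bins = 10
edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

# Per-bin mean of x, number of positives, and number of observations
bin_centers = np.array([x[bin_idx == b].mean() for b in range(n_bins)])
events = np.array([y[bin_idx == b].sum() for b in range(n_bins)])
counts = np.array([(bin_idx == b).sum() for b in range(n_bins)])

# Empirical log-odds per bin; 0.5 is added to both counts to avoid log(0)
empirical_logit = np.log((events + 0.5) / (counts - events + 0.5))

for center, value in zip(bin_centers, empirical_logit):
    print(f"{center:.2f}  {value:.2f}")
```

Plotting `bin_centers` against `empirical_logit` would then show the relationship in the observed data rather than in the model's predictions.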
Thanks! :)