I am running a logistic regression with a binary outcome (start vs. not start). My predictors are a mix of continuous and dichotomous variables.
The Box-Tidwell test suggests that one of my continuous predictors may violate the assumption of linearity in the logit. Goodness-of-fit statistics give no indication that the fit is problematic.
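For reference, the check as I understand it amounts to adding an $x \ln x$ term for the continuous predictor and testing its coefficient. The symbols $x$, $\gamma$, $z_j$ and $\delta_j$ below are my own notation, with $z_j$ standing for the remaining predictors, and the check requires $x > 0$:

$$
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 x + \gamma \, x \ln x + \sum_j \delta_j z_j, \qquad H_0 : \gamma = 0 .
$$

A significant $\hat{\gamma}$ is taken as evidence that the logit is not linear in $x$.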
I then reran the model twice, replacing the original continuous variable first with its square-root transformation and then with a dichotomized version.
On inspecting the output, goodness-of-fit improves marginally but the residuals become problematic. Parameter estimates, standard errors, and $\exp(\beta)$ remain relatively similar. My interpretation of the data with respect to my hypothesis is the same across all three models.
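To make the workflow described above concrete, here is a minimal sketch in Python. It is not my actual analysis: the data are simulated, the fitter is a hand-rolled Newton-Raphson routine written only to keep the example dependency-free, and every name in it (`fit_logit`, `results`, the median cut point for the dichotomized version) is hypothetical. It fits the Box-Tidwell-style $x \ln x$ term, then compares the three codings of the predictor by AIC.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + math.tanh(0.5 * t))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (coefficients, standard errors, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p

    def derivatives(b):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        loglik = 0.0
        for xi, yi in zip(X, y):
            eta = sum(bj * v for bj, v in zip(b, xi))
            mu = sigmoid(eta)
            # y*eta - log(1 + exp(eta)), with the log term computed stably.
            loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        return grad, info, loglik

    for _ in range(max_iter):
        grad, info, _ = derivatives(beta)
        step = solve(info, grad)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info, loglik = derivatives(beta)
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    se = []
    for j in range(p):
        unit = [1.0 if k == j else 0.0 for k in range(p)]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se, loglik

# Simulated data only (assumption): the true logit is linear in sqrt(x), and x > 0.
random.seed(1)
n = 500
x = [random.uniform(0.5, 10.0) for _ in range(n)]
y = [1 if random.random() < sigmoid(-3.0 + 2.0 * math.sqrt(v)) else 0 for v in x]

# Box-Tidwell-style check: Wald test on the coefficient of x * ln(x).
beta_bt, se_bt, _ = fit_logit([[1.0, v, v * math.log(v)] for v in x], y)
z_stat = beta_bt[2] / se_bt[2]
p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))  # two-sided normal p-value
print(f"x*ln(x) term: z = {z_stat:.2f}, p = {p_value:.4f}")

# Refit with each coding of the predictor and compare by AIC.
cut = sorted(x)[n // 2]  # hypothetical cut point: the sample median
codings = {
    "original": x,
    "sqrt": [math.sqrt(v) for v in x],
    "dichotomized": [1.0 if v > cut else 0.0 for v in x],
}
results = {}
for name, values in codings.items():
    beta, se, loglik = fit_logit([[1.0, v] for v in values], y)
    results[name] = {"slope": beta[1], "se": se[1], "aic": 2 * 2 - 2 * loglik}
    print(f"{name:>12}: slope = {beta[1]:.3f} (SE {se[1]:.3f}), AIC = {results[name]['aic']:.1f}")
```

In practice I would expect a statistics package to replace the hand-written fitter; the point of the sketch is only the sequence of steps, not the numbers it prints.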
Therefore, for the sake of usefulness and ease of interpretation, it seems appropriate to report the regression model that uses the original continuous variable.
My questions are:
- When is logistic regression robust against the potential violation of the linearity of logit assumption?
- Given my above example, does it seem acceptable to include the original continuous variable in the model?
- Are there any references or guides that recommend when it is acceptable to treat the model as robust to a potential violation of linearity of the logit?