Investigating robustness of logistic regression against violation of linearity of logit

Question

I am conducting a logistic regression with a binary outcome (start vs. not start). My predictors are a mix of continuous and dichotomous variables.

Using the Box-Tidwell approach, one of my continuous predictors potentially violates the assumption of linearity of the logit. There is no indication from goodness-of-fit statistics that fit is problematic.

I have subsequently run the regression model again, substituting the original continuous variable with: firstly, a square root transformation and secondly, a dichotomous version of the variable.

On inspection of the output, it seems that goodness-of-fit improves marginally but residuals become problematic. Parameter estimates, standard errors, and $\exp(\beta)$ remain relatively similar. The interpretation of the data does not change in terms of my hypothesis, across the 3 models.

Therefore, in terms of usefulness of my results and sense of interpretation of data, it seems appropriate to report the regression model using the original continuous variable.

I am wondering this:

When is logistic regression robust against the potential violation of the linearity of logit assumption?
Given my above example, does it seem acceptable to include the original continuous variable in the model?
Are there any references or guides out there for recommending when it is satisfactory to accept that the model is robust against the potential violation of linearity of the logit?

score 17 · Answer 1 · answered Jun 30 '13 at 12:43

The linearity assumption is so commonly violated in regression that it should be called a surprise rather than an assumption. Like other regression models, the logistic model is not robust to nonlinearity when you falsely assume linearity. Rather than detecting nonlinearity using residuals or omnibus goodness-of-fit tests, it is better to use direct tests. For example, expand continuous predictors using regression splines and do a composite test of all the nonlinear terms. Better still, don't test the terms and just expect nonlinearity. This approach is much better than trying different single-slope transformations such as square root, log, etc., because statistical inference made after such a search will be incorrect: it does not use large enough numerator degrees of freedom to account for the transformations that were tried.

Here's an example in R.

require(rms)
f <- lrm(y ~ rcs(age,4) + rcs(blood.pressure,5) + sex + rcs(height,4))
# Fits restricted cubic splines in 3 variables with default knots
# 4, 5, 4 knots = 2, 3, 2 nonlinear terms
Function(f)   # display algebraic form of fit
anova(f)      # obtain individual + combined linearity tests
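
For readers without the rms package, the composite test of nonlinear terms can be sketched with base R alone. This is only an illustration under stated assumptions: the data are simulated, the variable names and coefficients are hypothetical, and `glm` with `splines::ns` is swapped in for `lrm` with `rcs`. The default knot locations of `ns` differ from those of `rcs`, so the results will not match rms output exactly.

```r
# Hypothetical simulated data; names and coefficients are illustrative only
set.seed(1)
n   <- 500
age <- runif(n, 20, 80)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))

# The true logit is quadratic in age, so a single linear slope is misspecified
logit <- -1 + 0.004 * (age - 50)^2 + 0.5 * (sex == "male")
y     <- rbinom(n, 1, plogis(logit))

library(splines)  # ships with R

# Linear-in-age model versus a natural cubic spline in age.
# ns(age, df = 3) contributes 3 terms: 1 linear-equivalent + 2 nonlinear,
# the same count as a 4-knot restricted cubic spline.
fit_linear <- glm(y ~ age + sex, family = binomial)
fit_spline <- glm(y ~ ns(age, df = 3) + sex, family = binomial)

# Likelihood-ratio test of the 2 nonlinear terms taken together
lr <- anova(fit_linear, fit_spline, test = "Chisq")
print(lr)
```

The test has 2 numerator degrees of freedom because the linear model is nested in the spline model. A small p-value is evidence against linearity in `age`. In line with the advice above, the spline fit can also simply be kept without testing.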

Your answer makes fantastic sense - thank you! Could you suggest syntax to be used in SPSS? I unfortunately do not have access (or skills) to utilise R. — Short Elizabeth, Jun 30 '13 at 13:09
It is definitely worth the time to learn R, and I have lots of handouts related to logistic modeling and the rms package. This would be hard to do in SPSS. — Frank Harrell, Jun 30 '13 at 14:13
(+1). Is there a data set that has been loaded off screen before this code was pasted (i.e. one with the variables `y, age, blood.pressure, sex` ) or was this just meant to be pseudocode? — Macro, Jul 10 '13 at 23:32
The examples built into the software's help pages simulate such data, so look at the entire example in context. Do `require(rms)` then `?lrm` then `example(lrm)` — Frank Harrell, Jul 11 '13 at 11:23
