14

One of the assumptions of logistic regression is linearity in the logit. So once I got my model up and running, I tested for nonlinearity using the Box-Tidwell test. One of my continuous predictors (X) tested positive for nonlinearity. What am I supposed to do next?

As this is a violation of the assumptions, should I drop the predictor (X), include a nonlinear transformation of it (X*X), or transform the variable into a categorical one? If you have a reference, could you please point me to that too?

Peter Flom
tosonb1

3 Answers

9

I would suggest using restricted cubic splines (rcs in R; see the Hmisc and Design packages for examples of use, noting that Design has since been superseded by rms) instead of adding powers of $X$ to your model. This approach is the one recommended by Frank Harrell, for instance, and you will find a nice illustration in his handouts (§2.5 and chap. 9) on Regression Modeling Strategies (see the companion website).
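To make the idea concrete, here is a minimal Python sketch of a restricted cubic spline basis written from the usual truncated-power formula. It is not the rcs implementation itself: the function name `rcs_basis`, the knot values, and the absence of any scaling of the nonlinear terms are my own assumptions for illustration.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a scalar x (illustrative sketch).

    With k knots this returns k - 1 values: x itself plus k - 2 nonlinear
    terms. The terms are left unscaled; real implementations may rescale them.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos_cube(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]  # distance between the last two knots
    basis = [float(x)]
    for j in range(k - 2):
        term = (
            pos_cube(x - t[j])
            - pos_cube(x - t[-2]) * (t[-1] - t[j]) / span
            + pos_cube(x - t[-1]) * (t[-2] - t[j]) / span
        )
        basis.append(term)
    return basis


# Hypothetical knot locations, chosen only for the example.
example_knots = [1.0, 2.0, 4.0, 7.0]
example_row = rcs_basis(3.0, example_knots)
```

Each row of such a basis would be passed to the logistic regression in place of the single column for $X$. The two correction terms are what force the fitted curve to be linear below the first knot and above the last one, which is the "restricted" part.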

You can compare the results with your Box-Tidwell test by using the boxTidwell() function in the car package.

Transforming continuous predictors into categorical ones is generally not a good idea; see e.g. Problems Caused by Categorizing Continuous Variables.

chl
6

It may be appropriate to include a nonlinear transformation of x, but probably not simply x × x, i.e. $x^2$. I believe you may find this a useful reference in determining which transformation to use:

G. E. P. Box and Paul W. Tidwell (1962). Transformation of the Independent Variables. Technometrics Volume 4 Number 4, pages 531-550. http://www.jstor.org/stable/1266288

Some consider the Box-Tidwell family of transformations to be more general than is often appropriate for interpretability and parsimony. Patrick Royston and Doug Altman introduced the term fractional polynomials for Box-Tidwell transformations with simple rational powers in an influential 1994 paper:

P. Royston and D. G. Altman (1994). Regression using fractional polynomials of continuous covariates: parsimonious parametric modeling. Applied Statistics Volume 43: pages 429–467. http://www.jstor.org/stable/2986270
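As a rough sketch of what these transformations look like in practice, the Python below builds first- and second-degree fractional polynomial terms. The function names are hypothetical, the power set is the one conventionally used with fractional polynomials, and the code only constructs the terms; choosing among powers by comparing model fits is not shown.

```python
import math

# Conventional fractional polynomial power set; 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Box-Tidwell style power transform of a positive x."""
    if x <= 0:
        raise ValueError("x must be positive (shift the variable first)")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x, p1, p2):
    """Two terms of a second-degree fractional polynomial.

    When the two powers coincide, the second term is the first one
    multiplied by log(x), following the usual convention.
    """
    first = fp_term(x, p1)
    if p1 == p2:
        return first, first * math.log(x)
    return first, fp_term(x, p2)


# Candidate single-term transforms of one hypothetical value, x = 4.
candidates = {p: fp_term(4.0, p) for p in FP_POWERS}
```

In an actual analysis each candidate power (or pair of powers) would be fitted in turn and the resulting models compared, for example by deviance.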

Patrick Royston in particular has continued to work and publish both papers and software on this, culminating in a book with Willi Sauerbrei:

P. Royston and W. Sauerbrei (2008). Multivariable Model-building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. Chichester, UK: Wiley. ISBN 978-0-470-02842-1

onestop
5

Don't forget to check for interactions between X and other independent variables. Leaving interactions unmodeled can make X look like it has a non-linear effect when it simply has a non-additive one.
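A deliberately simplified numerical illustration of this point, in Python: it works on the linear-predictor scale only and is not a fitted logistic model. All coefficients, and the assumption that a second predictor z has conditional mean proportional to X, are made up for the example.

```python
def mean_linear_predictor(x, b_x=1.0, b_z=0.5, b_xz=0.3, slope_z_on_x=0.5):
    """Average of b_x*x + b_z*z + b_xz*x*z over z, assuming E[z | x] = slope_z_on_x * x.

    The expression is linear in z, so averaging over z just plugs in E[z | x].
    """
    z_mean = slope_z_on_x * x
    return b_x * x + b_z * z_mean + b_xz * x * z_mean


def second_difference(f, x, h=1.0):
    """Zero for a function that is linear in x, nonzero if it is curved."""
    return f(x + h) - 2 * f(x) + f(x - h)


# With an x-by-z interaction, the averaged predictor is curved in x.
curved = second_difference(mean_linear_predictor, 2.0)

# With the interaction switched off, it is a straight line in x.
straight = second_difference(
    lambda x: mean_linear_predictor(x, b_xz=0.0), 2.0
)
```

Here the effect of X is linear at every fixed z, yet once z is left out of the picture the relationship with X appears quadratic, because z moves with X and the interaction contributes an $x \cdot E[z \mid x]$ term.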

conjugateprior
  • Good point. I've only come across the converse: assuming an effect is linear when it isn't can lead to spurious statistical evidence for multiplicative interaction terms. – onestop Oct 29 '10 at 12:23
  • @onestop, do you have a reference about that? I believe it, but I'm having trouble figuring out exactly why that would happen. – Macro May 15 '12 at 12:23