Rather than testing for linearity of the logit, as you say, why not model in a way that does not assume linearity of the logit at the outset? If you have a continuous predictor $x$, say, you can represent it in the model with a spline term.
Then, if you need a formal test, you can "split the spline term in two": include $x$ directly (linearly) in the model, plus a spline term containing only the nonlinear part of the spline. Comparing the models with and without the nonlinear part by a likelihood ratio test (that is, comparing deviances) then tests linearity.
An example in R to make this clearer:
library(Fahrmeir)  # for the Regensburg data
library(splines)   # for ns()
library(MASS)      # for polr()
data(Regensburg)
Regensburg$y <- ordered(Regensburg$y)
mod1 <- polr(y ~ age, data = Regensburg, weights = n)
mod2 <- polr(y ~ ns(age, df = 4), data = Regensburg, weights = n)
> mod1
Call:
polr(formula = y ~ age, data = Regensburg, weights = n)
Coefficients:
age
0.2086214
Intercepts:
1|2 2|3
2.921114 6.054559
Residual Deviance: 179.539
AIC: 185.539
> mod2
Call:
polr(formula = y ~ ns(age, df = 4), data = Regensburg, weights = n)
Coefficients:
ns(age, df = 4)1 ns(age, df = 4)2 ns(age, df = 4)3 ns(age, df = 4)4
2.2128005 2.2441255 1.5563837 0.9054964
Intercepts:
1|2 2|3
-0.9931522 2.3105193
Residual Deviance: 173.9352
AIC: 185.9352
> anova(mod1, mod2)
Likelihood ratio tests of ordinal regression models
Response: y
Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
1 age 99 179.5390
2 ns(age, df = 4) 96 173.9352 1 vs 2 3 5.60387 0.1325563
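To see where the numbers in the `anova()` table come from, the likelihood ratio test can be reproduced by hand from the two residual deviances; the difference in degrees of freedom is 3 (four spline coefficients versus one linear coefficient). A short sketch, assuming `mod1` and `mod2` from the session above:

```r
# LR statistic: difference of the two residual deviances
lr <- deviance(mod1) - deviance(mod2)  # 179.5390 - 173.9352 = 5.60387

# p-value from the chi-squared distribution with 3 df
pchisq(lr, df = 3, lower.tail = FALSE)  # 0.1325563, matching anova()
```

Since the p-value is large, there is no evidence here against linearity of the logit in `age`.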