
With this synthetic data set (survival/death versus a factor x, plotted as blue points in the figure below), I would like to know how the survival probability depends on x. I don't think logistic regression is the right tool for this data set, because I believe it can only estimate a monotonic function, whereas here I expect a different relationship (the red line in the figure below). What is the best statistical tool here? A generalized additive model?

Data points are from a synthetic data set, and the red line is what a reasonable statistical model is expected to produce.

Tanis
  • If you have information about time to death and not just the binary dead/alive classification, consider using survival analysis instead. Like a logistic regression, a Cox proportional hazards regression can also incorporate splines of continuous predictor variables as noted in the answer by @gung, for example via the `rms` package in R. – EdM Sep 27 '18 at 20:51

1 Answer


Logistic regression can model 'curvilinear' relationships perfectly well, just as linear regression can. You need to add extra terms that are functions of $x$ so the model can account for the curvature. The most common way is to add a sequence of polynomial terms (i.e., $x^2$, $x^3$, $x^4$, etc.). You can also use other nonlinear transformations of $x$ (e.g., $\log(x)$). A more sophisticated approach is to use spline functions.
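As a rough illustration of the polynomial-term idea, here is a minimal sketch in Python using only NumPy. The data are simulated under an assumed true model (logit of the survival probability equal to $1 - 0.5x^2$), which is not the asker's data set, and the helper names `fit_logistic` and `predict` are hypothetical. The fit uses plain Newton-Raphson (IRLS); in practice a standard GLM routine would normally be used instead.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson (IRLS).

    X is the design matrix and must already contain an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)      # linear predictor, clipped for safety
        p = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities
        w = p * (1.0 - p)                     # IRLS weights
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict(beta, x_new):
    """Predicted probability for a polynomial model with terms x^0 .. x^(len(beta)-1)."""
    X_new = np.column_stack([x_new**k for k in range(len(beta))])
    return 1.0 / (1.0 + np.exp(-(X_new @ beta)))

# Simulated data (an assumption for illustration): survival peaks at x = 0.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-3, 3, size=n)
true_p = 1.0 / (1.0 + np.exp(-(1.0 - 0.5 * x**2)))
y = rng.binomial(1, true_p)

# Model 1: intercept + x (monotonic in x by construction).
beta_lin = fit_logistic(np.column_stack([np.ones(n), x]), y)

# Model 2: intercept + x + x^2 (can rise and then fall).
beta_quad = fit_logistic(np.column_stack([np.ones(n), x, x**2]), y)

grid = np.array([-2.5, 0.0, 2.5])
print("linear   :", predict(beta_lin, grid))
print("quadratic:", predict(beta_quad, grid))
```

The model is still linear in its coefficients; only the predictor has been expanded, which is why the quadratic fit can follow a non-monotonic curve while the first model cannot.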

There is an example of using logistic regression this way in my answer here: How to use boxplots to find the point where values are more likely to come from different conditions?

gung - Reinstate Monica
  • Thanks for your reply. In this sense, is an additive model a more convenient tool? I'm not familiar with it, but I guess an additive model provides a more convenient way to add nonlinearity to the model? – Tanis Sep 27 '18 at 19:57
  • @Tanis, what do you mean by "additive model" here? A simple model w/ $x$, $x^2$, & $x^3$ could well be called an additive model. – gung - Reinstate Monica Sep 27 '18 at 19:59
  • I have this link to its Wikipedia page. https://en.wikipedia.org/wiki/Generalized_additive_model – Tanis Sep 27 '18 at 20:01
  • @Tanis, OK, a GAM isn't quite the same as the generic use of "additive model". At any rate, you can think of a logistic regression with polynomial terms as a simple case of a GAM. Whether it's "more convenient" would only be a function of your relative comfort w/ the code. – gung - Reinstate Monica Sep 27 '18 at 20:10
  • There is one advantage of using regression splines in place of GAMs (as in R's `mgcv` package): logistic regression with splines is a standard generalized linear model, so standard inference tools can be used. GAMs, on the other hand, need special inference theory (and special software). – kjetil b halvorsen Sep 28 '18 at 09:14
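
The regression-spline point in the last comment can be sketched as follows, again in Python with NumPy only. This is an illustration under stated assumptions, not anyone's recommended implementation: the data are simulated, the knots at -1.5, 0 and 1.5 are chosen arbitrarily, the cubic truncated-power basis stands in for whatever spline basis a package would build, and `spline_basis` and `fit_logistic` are hypothetical helper names. Because the spline terms are just extra columns of the design matrix, the usual GLM standard errors (from the inverse Fisher information) apply directly.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3, and (x - k)^3_+ for each knot k."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=100, tol=1e-8):
    """Newton-Raphson (IRLS) fit; returns coefficients and their covariance matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information at the final estimate and invert it.
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)

# Simulated data (an assumption for illustration): survival peaks at x = 0.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-3, 3, size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(1.0 - 0.5 * x**2))))

knots = (-1.5, 0.0, 1.5)                 # arbitrary choice for this sketch
beta, cov = fit_logistic(spline_basis(x, knots), y)
se = np.sqrt(np.diag(cov))               # ordinary Wald standard errors

# Pointwise 95% Wald interval for the fitted probability on a small grid.
grid = np.array([-2.5, 0.0, 2.5])
B = spline_basis(grid, knots)
eta = B @ beta
se_eta = np.sqrt(np.einsum("ij,jk,ik->i", B, cov, B))
expit = lambda z: 1.0 / (1.0 + np.exp(-z))
p_hat = expit(eta)
lower, upper = expit(eta - 1.96 * se_eta), expit(eta + 1.96 * se_eta)
print(np.column_stack([grid, lower, p_hat, upper]))
```

The interval is computed on the logit scale and then transformed, which keeps the bounds inside (0, 1).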