
I am largely inexperienced with logistic regression. Is there a good way to convert a continuous covariate in a logistic regression into a discrete one by partitioning the support of that covariate in a smart way?

Any reference is welcome if the question is too general.

Glen_b
TheBridge
  • 2
    The good way is to not do this. Why do you want to? – Peter Flom Jun 10 '13 at 16:11
  • 2
    @PeterFlom is right. Although written in a different context, I discuss categorizing continuous variables here: [how-to-choose-between-anova-and-ancova-in-a-designed-experiment](http://stats.stackexchange.com/questions/24077//24080#24080), especially after the update. It may help you to read it. – gung - Reinstate Monica Jun 10 '13 at 16:15
  • 1
    @gung I think your answer is a little extreme (but I will grant that might be to make a point). Discretizing a continuous variable can be a good way to find and characterize nonlinear relationships, for instance (although perhaps a better way to go about this process would be with a CART or a random forest). – whuber Jun 10 '13 at 16:19
  • @all: Thanks for your comments. To sum up, I gather that you don't recommend doing this, but if you had to do it, what would be the best (or least bad) way? Regards – TheBridge Jun 10 '13 at 19:17
  • Many further references on the problems with this approach are [here](http://stats.stackexchange.com/questions/41227/justification-for-low-high-or-tertiary-splits-in-anova/41233#41233) - in that case for regression, but the issues are relevant. As for 'best' - best for optimizing what? Why do you want to do it, and what are you trying to get out of it? – Glen_b Jun 11 '13 at 01:46
  • @Glen_b: Thanks again for all your references. To the question "why do I need to do it", I have to answer "because I have to". I know that sounds stupid, but this really is the only reason. That it is not a recommended thing to do is another matter. Anyway, I understand your point that by doing this I will inevitably lose useful information in the model, so, building on that remark, my aim now is to do it with the least damage. Finally, defining what "best" or "least worst" means is part of my question, if not the most important part. Regards. – TheBridge Jun 11 '13 at 07:13
  • I can't define your criteria for you. "Least damage" to what end? – Glen_b Jun 11 '13 at 07:59
  • OK, let's try this: minimizing the log-likelihood difference between the "continuous covariate model" and the "discretized one" (such a criterion is often used in variable selection, I think). Wouldn't this minimize the damage (in terms of likelihood) caused by discretizing the continuous covariate? Regards – TheBridge Jun 11 '13 at 11:11
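
A rough sketch of the criterion proposed in the last comment, under some loud assumptions: the data are simulated, the number of bins (3) and the candidate cut points (empirical quantiles) are arbitrary choices, and the helper names `binned_loglik`, `continuous_loglik` and `best_cuts` are made up for illustration. Because the continuous model's log-likelihood does not depend on the cut points, minimizing the difference amounts to maximizing the discretized model's log-likelihood. With only bin indicators as covariates, the fitted probability in each bin is that bin's observed event rate, so the discretized fit needs no iteration.

```python
import math
import random
from itertools import combinations


def binned_loglik(x, y, cuts):
    """Maximized log-likelihood of a logistic model with one indicator per bin."""
    counts = [[0, 0] for _ in range(len(cuts) + 1)]  # [n, events] per bin
    for xi, yi in zip(x, y):
        k = sum(xi > c for c in cuts)  # index of the bin containing xi
        counts[k][0] += 1
        counts[k][1] += yi
    ll = 0.0
    for n, s in counts:
        if n == 0:
            return -math.inf  # reject cut sets that leave a bin empty
        for m in (s, n - s):  # events, then non-events
            if m > 0:
                ll += m * math.log(m / n)
    return ll


def continuous_loglik(x, y, iters=25):
    """Log-likelihood of logit(p) = a + b*x, fitted by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return sum(
        yi * (a + b * xi) - math.log1p(math.exp(a + b * xi))
        for xi, yi in zip(x, y)
    )


def best_cuts(x, y, n_bins=3, n_candidates=15):
    """Exhaustive search over cut points taken at empirical quantiles of x."""
    xs = sorted(x)
    cand = sorted({xs[(i * len(xs)) // (n_candidates + 1)]
                   for i in range(1, n_candidates + 1)})
    return max(combinations(cand, n_bins - 1),
               key=lambda c: binned_loglik(x, y, c))


# Simulated data (an assumption, not from the question).
random.seed(0)
x = [random.uniform(-3, 3) for _ in range(400)]
y = [int(random.random() < 1 / (1 + math.exp(-(-0.5 + 1.2 * xi)))) for xi in x]

ll_cont = continuous_loglik(x, y)
cuts = best_cuts(x, y)
ll_disc = binned_loglik(x, y, cuts)
ll_null = binned_loglik(x, y, ())  # intercept-only model, for reference
print("cuts:", cuts)
print("log-likelihood gap (continuous - discretized):", ll_cont - ll_disc)
```

Note that cut points chosen by looking at the outcome are tuned to the sample, so standard errors and tests from the discretized model will tend to be optimistic; this only addresses the "least damage in likelihood terms" criterion, not the objections raised in the comments.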

0 Answers