I have a Bernoulli response variable and I am going to fit a logistic regression. One of my independent variables is continuous, and I would like to categorize it before fitting the model. While this loses some information, it makes my predictions much easier, and it also lets me see the effect of this variable at a glance.

I am trying to categorize it so that each category differs clearly from the others in its estimated probability of the response. Ideally, the logistic regression coefficients of the categorized variable would be statistically significant. From experience, I know the number of categories should be fewer than 8; most of the time it is around 4 or 5, but the exact number is unknown.

Finding good breakpoints is the challenging part. I have tried Recursive Partitioning and Regression Trees before, but with that approach I found that I first had to categorize the independent variables myself, and only then did it give me the breakpoints.
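For reference, recursive partitioning can also be run directly on the raw continuous variable. Below is a minimal sketch in Python (standard library only) of greedy, best-first binary splitting of a single continuous predictor, using the binomial log-likelihood as the split criterion. It is not rpart's exact algorithm (rpart uses the Gini index by default for classification and is usually pruned using its cross-validated error), and the function names and the `max_bins`, `min_bin` and `min_gain` parameters are hypothetical. When the binned variable is the only predictor, a logistic regression with an intercept and k - 1 dummy variables reproduces each bin's observed event rate, so each split's gain equals the increase in that model's maximized log-likelihood.

```python
import bisect
import math
from typing import List, Optional, Sequence, Tuple


def binomial_loglik(events: int, n: int) -> float:
    """Maximized Bernoulli log-likelihood of a bin with `events` ones out of `n`."""
    if events == 0 or events == n:
        return 0.0  # fitted rate is 0 or 1, so every term is log(1) = 0
    p = events / n
    return events * math.log(p) + (n - events) * math.log(1.0 - p)


def best_split(xs: Sequence[float], ys: Sequence[int],
               min_bin: int) -> Optional[Tuple[float, int]]:
    """Best cut of sorted data as (gain, i): xs[:i] goes left, xs[i:] goes right."""
    n, total = len(xs), sum(ys)
    parent = binomial_loglik(total, n)
    best = None
    left_events = 0
    for i in range(1, n):
        left_events += ys[i - 1]
        # Respect the minimum bin size and only cut between distinct x values.
        if i < min_bin or n - i < min_bin or xs[i] == xs[i - 1]:
            continue
        gain = (binomial_loglik(left_events, i)
                + binomial_loglik(total - left_events, n - i) - parent)
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def tree_breakpoints(x: Sequence[float], y: Sequence[int], max_bins: int = 7,
                     min_bin: int = 30, min_gain: float = 2.0) -> List[float]:
    """Hypothetical helper: greedy best-first splitting into at most max_bins bins."""
    pairs = sorted(zip(x, y))
    xs = [v for v, _ in pairs]
    ys = [r for _, r in pairs]
    segments = [(0, len(xs))]  # half-open index ranges, one per current bin
    cuts: List[float] = []
    while len(segments) < max_bins:
        candidates = []
        for s, (lo, hi) in enumerate(segments):
            found = best_split(xs[lo:hi], ys[lo:hi], min_bin)
            if found is not None:
                candidates.append((found[0], s, lo + found[1]))
        if not candidates:
            break
        gain, s, cut = max(candidates)  # split the bin with the largest gain
        if gain < min_gain:
            break
        lo, hi = segments[s]
        segments[s:s + 1] = [(lo, cut), (cut, hi)]
        cuts.append((xs[cut - 1] + xs[cut]) / 2.0)  # midpoint between neighbours
    return sorted(cuts)


def assign_bins(x: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Category index 0..len(cuts) for each value, using the sorted breakpoints."""
    return [bisect.bisect_right(cuts, v) for v in x]


# Synthetic example (made-up data): event rates of 10%, 50% and 90%
# on x in [0, 100), [100, 200) and [200, 300).
x = list(range(300))
y = [int(v % 10 == 0) if v < 100 else int(v % 2 == 0) if v < 200 else int(v % 10 != 0)
     for v in x]
cuts = tree_breakpoints(x, y)
print(cuts)
print(assign_bins([5, 150, 250], cuts))
```

The `min_gain` threshold is an arbitrary stopping knob. Because each cut point is chosen to maximize the gain, the usual likelihood-ratio or Wald p-values for the resulting dummy coefficients will be optimistic.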
I was wondering if there is any alternative approach to categorizing this continuous independent variable.
- Please note that this question is not asking whether to categorize or not; I am aware of the advantages and disadvantages of doing so. I hope those who want to answer or comment will keep this in mind rather than try to convince me not to categorize it. Thank you.