On what basis can we combine levels in a factor variable when the target variable is binary?

Asked Dec 08 '17 at 16:17

Active Dec 08 '17 at 17:30

Viewed 743 times

I am working on a dataset in which a variable has the following levels:

Levels:      0   1   2   3   4   5   8
Frequency: 608 209  28  16  18   5   7

The target variable is binary. For the case where the target variable is continuous, I learned that levels of a factor variable with approximately equal means of the target variable should be combined. This can be checked with a boxplot of the target variable against the factor variable.
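For reference, the continuous-target approach described above could be sketched roughly as follows. This is only an illustration: the level counts are the ones from the question, but the `target` values are synthetic (drawn at random), and the column names `factor` and `target` are hypothetical.

```python
import numpy as np
import pandas as pd

# Level counts taken from the question.
levels = [0, 1, 2, 3, 4, 5, 8]
counts = [608, 209, 28, 16, 18, 5, 7]

# Rebuild the factor column from the counts.
factor = np.repeat(levels, counts)

# ASSUMPTION: a synthetic continuous target, used only so the sketch runs.
rng = np.random.default_rng(0)
target = rng.normal(loc=10.0, scale=2.0, size=factor.size)

df = pd.DataFrame({"factor": factor, "target": target})

# Per-level count, mean and spread of the target: levels whose means are
# close to each other are the candidates for combining.
summary = df.groupby("factor")["target"].agg(["count", "mean", "std"])
print(summary)

# The boxplot mentioned above (requires matplotlib):
# df.boxplot(column="target", by="factor")
```

With real data, the sparse levels (here 5 and 8) will have noisy means, so the comparison is least reliable exactly where combining is most tempting.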

But on what basis can one combine levels of the factor variable when the target variable is binary (i.e., in a classification problem)?

edited Dec 08 '17 at 17:27

gung - Reinstate Monica


asked Dec 08 '17 at 16:17

Abiram


What does "clubbing" mean here? – gung - Reinstate Monica Dec 08 '17 at 16:20
Clubbing means combining – Abiram Dec 08 '17 at 16:31
Let me elaborate. Clubbing means combining 2 or more levels in the variable into a single level. – Abiram Dec 08 '17 at 16:51
What is your motivation for doing this? – whuber Dec 08 '17 at 17:25
In addition to the two linked threads, there are probably others. [This search](https://stats.stackexchange.com/search?tab=votes&q=[categorical-data]%20combine%20levels%20is%3aquestion) should be a decent start. – gung - Reinstate Monica Dec 08 '17 at 17:31
Hello whuber, by combining the classes I think this variable could be made more effective for modeling purposes. I have also read in some articles that it is not a good idea to have many levels in a variable. This is the motivation. – Abiram Dec 08 '17 at 18:11

0 Answers