Some of the features I am working with have more than 300 factor levels. I tried to reduce the number of levels with a 'fake dummy' method. For example, I replaced one 600-level predictor with 4 predictors of at most 5 levels each (since $600 < 5^4 = 625$). Is this a valid approach?
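A minimal sketch of the coding described above, under the assumption that the 600 levels are numbered 0 to 599 and each number is written as 4 base-5 digits, one digit per new predictor. The helper `to_base_digits` and the labels `level_0` ... `level_599` are hypothetical names used only for illustration.

```python
def to_base_digits(index, base=5, width=4):
    """Write a 0-based level index as `width` digits in the given base."""
    if not 0 <= index < base ** width:
        raise ValueError("index does not fit in the requested number of digits")
    digits = []
    for _ in range(width):
        index, digit = divmod(index, base)  # peel off the least significant digit
        digits.append(digit)
    return tuple(reversed(digits))  # most significant digit first


# Hypothetical factor with 600 levels, labelled level_0 ... level_599.
levels = [f"level_{i}" for i in range(600)]

# Each level maps to 4 new predictors, each taking a value in 0..4.
codes = {level: to_base_digits(i) for i, level in enumerate(levels)}

print(codes["level_0"])    # (0, 0, 0, 0)
print(codes["level_599"])  # (4, 3, 4, 4)
```

Every level gets a distinct 4-digit code, but the codes depend entirely on the order in which the levels happen to be numbered.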
-
Could you explain how these 4 predictors were coded? – chl Sep 24 '12 at 17:54
-
With a 0/1-coded factor you would need 10 predictors (since $600 < 2^{10}$). Instead, I let each predictor have no more than 5 levels. – Chenghao Liu Sep 24 '12 at 18:16
-
I'm not sure I follow, because it is my understanding that a $k$-level factor is represented as $k-1$ dummy variables in a design matrix. Could you give us a few more details? Or do you just mean that you merged several factor levels together so that in the end you got 10 dummies instead of 600 levels (which would require 599 dummies), for a single predictor? – chl Sep 24 '12 at 21:02
-
When a factor is transformed into dummy variables, only one variable is 1 and the others are 0. I want to reduce the number of predictors, so I allow more than one variable to be 1. – Chenghao Liu Sep 25 '12 at 03:49
-
See https://stats.stackexchange.com/questions/227125/preprocess-categorical-variables-with-many-values/277302#277302 for a better approach – kjetil b halvorsen May 17 '17 at 11:20