I have data containing a few categorical columns, each with a very large number of categories (more than 1,000 distinct values per column). I have to build a predictive model on this data using logistic regression (I cannot use a model that handles categorical data as is, such as Random Forest or Naïve Bayes).

Applying the standard 1-to-N method, which converts each categorical value into a 0-1 vector, produces a very high-dimensional feature matrix and makes the algorithm run very slowly, so I cannot use this way of handling categorical data.
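
To make the size of the problem concrete, here is a minimal sketch of 1-to-N encoding in plain Python. The data is made up for illustration (three hypothetical columns named `col0` to `col2`, each drawing from 1500 possible levels), and the helper names `one_hot_width` and `one_hot_row` are my own, not from any library.

```python
import random

def one_hot_width(columns):
    """Total number of 0-1 indicator columns produced by 1-to-N encoding."""
    return sum(len(set(values)) for values in columns.values())

def one_hot_row(row, vocab):
    """Encode one record as a dense 0-1 vector.

    vocab maps (column name, category value) -> position in the vector.
    """
    vec = [0] * len(vocab)
    for col, val in row.items():
        vec[vocab[(col, val)]] = 1
    return vec

random.seed(0)
n_rows = 5000

# Hypothetical data: 3 categorical columns, each with up to 1500 levels.
data = {
    f"col{j}": [f"c{j}_{random.randrange(1500)}" for _ in range(n_rows)]
    for j in range(3)
}

# Assign one indicator position to every (column, category) pair seen.
vocab = {}
for col, values in data.items():
    for v in sorted(set(values)):
        vocab[(col, v)] = len(vocab)

width = one_hot_width(data)
first = one_hot_row({col: values[0] for col, values in data.items()}, vocab)

# The encoded vector has one entry per (column, category) pair,
# but only one entry per original column is non-zero.
print("encoded width:", width)
print("non-zero entries in first row:", sum(first))
```

With only three original columns, every encoded row has thousands of entries, of which exactly three are ones. Stored densely, the matrix grows as the number of rows times the total number of categories, which is where the slowdown comes from.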

Does anybody know a method for transforming categorical data with a large number of categories, so that distance-based methods can handle this data properly?

kjetil b halvorsen
Alex
