I have a dataset with a few categorical columns, each with a very large number of categories (more than 1000 distinct values per column). I have to build a predictive model on this data using logistic regression (I cannot use a model that handles categorical data as is, such as Random Forest or Naïve Bayes).
Applying the standard 1-to-N (one-hot) encoding, which turns each categorical value into a 0-1 vector, produces a very high-dimensional feature space and makes the algorithm run very slowly, so I cannot use this encoding.
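To illustrate the size problem, here is a minimal sketch in plain Python with made-up data. The helper names `build_vocabularies` and `one_hot_encode`, and the three columns of 1000 categories each, are assumptions for illustration only, not my real data or code:

```python
def build_vocabularies(rows):
    """Map each column's distinct values to consecutive indices."""
    n_cols = len(rows[0])
    vocabularies = []
    for j in range(n_cols):
        values = sorted({row[j] for row in rows})
        vocabularies.append({v: i for i, v in enumerate(values)})
    return vocabularies


def one_hot_encode(row, vocabularies):
    """Encode one row as a single 0-1 vector (1-to-N per column, concatenated)."""
    vector = []
    for value, vocab in zip(row, vocabularies):
        block = [0] * len(vocab)   # one slot per category in this column
        block[vocab[value]] = 1    # mark the category that is present
        vector.extend(block)
    return vector


# Hypothetical data: 3 categorical columns, 1000 distinct categories each.
rows = [(f"a{i}", f"b{i}", f"c{i}") for i in range(1000)]
vocabularies = build_vocabularies(rows)
encoded = one_hot_encode(rows[0], vocabularies)

print(len(encoded))  # 3000 features for just 3 original columns
print(sum(encoded))  # 3 (exactly one 1 per original column)
```

Each row ends up with thousands of features, almost all of them zero, and the width grows with every additional category.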
Does anybody know a method for transforming categorical data with a large number of categories so that distance-based methods can handle it properly?