Clustering, reducing number of levels of categorical variable

Asked Apr 25 '16 at 15:39

Active May 16 '17 at 23:42

Viewed 83 times

I'm working with a large dataset that contains:

1 categorical variable with 90 levels that represent some sort of "geographical area"
3 continuous variables

I want to aggregate the levels of the categorical variable into at most 10 groups.

I'm not sure which technique fits this best: for example, should I use factor analysis or some form of unsupervised clustering?
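To make the clustering option concrete, here is a minimal sketch of one way it could look, not a recommended or definitive method. It assumes each level can be summarised by the mean of the 3 continuous variables, standardises those means, and then merges the closest levels with average-linkage agglomerative clustering until at most 10 groups remain. The names `level_profiles`, `merge_levels`, and `max_groups` are made up for illustration, and both the per-level mean and the average linkage are arbitrary choices.

```python
from collections import defaultdict
from statistics import mean, pstdev
import math


def level_profiles(rows):
    """rows: iterable of (level, x1, x2, x3). Returns {level: [mean of each variable]}."""
    grouped = defaultdict(list)
    for level, *values in rows:
        grouped[level].append(values)
    return {lvl: [mean(col) for col in zip(*vals)] for lvl, vals in grouped.items()}


def standardize(profiles):
    """Scale each variable to zero mean and unit variance across levels."""
    levels = sorted(profiles)
    cols = list(zip(*(profiles[lvl] for lvl in levels)))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return {
        lvl: [(v - m) / s for v, m, s in zip(profiles[lvl], mus, sds)]
        for lvl in levels
    }


def merge_levels(profiles, max_groups=10):
    """Average-linkage agglomerative clustering of levels; returns {level: group id}."""
    z = standardize(profiles)
    clusters = [[lvl] for lvl in sorted(z)]

    def dist(a, b):
        # Average Euclidean distance between all pairs of members (assumed linkage).
        return mean(math.dist(z[i], z[j]) for i in a for j in b)

    while len(clusters) > max_groups:
        # Find the two closest clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return {lvl: g for g, members in enumerate(clusters) for lvl in members}


# Toy data: six hypothetical areas that should collapse into three groups.
rows = [
    ("a", 0, 0, 0), ("a", 0.2, 0, 0), ("b", 0.1, 0.1, 0.1),
    ("c", 10, 10, 10), ("d", 10.1, 10, 10),
    ("e", -10, 5, 0), ("f", -10.1, 5, 0),
]
mapping = merge_levels(level_profiles(rows), max_groups=3)
```

The resulting `mapping` sends each original level to a group id, so it can be used to recode the 90-level variable. Note that this groups areas purely by similarity on the continuous variables and ignores where they are located.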

edited May 16 '17 at 23:42

kjetil b halvorsen


asked Apr 25 '16 at 15:39

mariob6


How about creating groups based on geographical regions, such as Northeast, Southwest, etc., that are more meaningful from the business perspective? – Vishal Apr 25 '16 at 17:58
Yes, but I also want to see if there is some analogy between areas in different regions, in order to investigate why! – mariob6 Apr 25 '16 at 19:00
What do you mean by "clustering (unsupervised)"? – Marquis de Carabas Apr 26 '16 at 21:50
Have a look at: https://stats.stackexchange.com/questions/227125/preprocess-categorical-variables-with-many-values/277302#277302 – kjetil b halvorsen May 16 '17 at 23:43
