Should dummy variables be normalized along with numeric variables when doing kmeans clustering

Asked Nov 03 '16 at 05:42

Active Dec 30 '19 at 20:33

Viewed 1,065 times

I am trying to cluster the data set 'How Americans spend their time' using kmeans clustering.

The data set contains education, gender, and age range (55-60, 60-65, etc.) as categorical variables. The remaining variables, such as hours spent socializing and relaxing, hours spent shopping, and hours spent watching TV, are all integers.

I have converted the categorical variables to dummy variables. The next step is centering and scaling. Should I center and scale the dummy variables along with the numeric variables?

I get very different clusters when I center and scale the dummy variables (along with the numeric variables) than when I center and scale the numeric variables only. Which approach should I rely on? My feeling is that I should center and scale the dummy variables as well.
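To make the comparison concrete, here is a minimal sketch of the two preprocessing variants, assuming Python with pandas and scikit-learn (the question does not say which software was used). The data frame is synthetic and every column name, the category levels, and the choice of k = 4 are hypothetical stand-ins for the real time-use data. It only reproduces the comparison described above; it does not settle which variant, if either, is appropriate.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the time-use data; all names and values are assumptions.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "education": rng.choice(["high_school", "bachelor", "graduate"], size=n),
    "gender": rng.choice(["female", "male"], size=n),
    "age_range": rng.choice(["55-60", "60-65", "65-70"], size=n),
    "hours_socializing": rng.integers(0, 40, size=n),
    "hours_shopping": rng.integers(0, 10, size=n),
    "hours_tv": rng.integers(0, 50, size=n),
})

numeric_cols = ["hours_socializing", "hours_shopping", "hours_tv"]
dummies = pd.get_dummies(df[["education", "gender", "age_range"]], dtype=float)

# Variant A: center and scale the numeric columns only; dummies stay 0/1.
numeric_scaled = pd.DataFrame(
    StandardScaler().fit_transform(df[numeric_cols]),
    columns=numeric_cols,
    index=df.index,
)
X_numeric_only = pd.concat([numeric_scaled, dummies], axis=1)

# Variant B: center and scale every column, dummies included.
X_all_scaled = pd.DataFrame(
    StandardScaler().fit_transform(pd.concat([df[numeric_cols], dummies], axis=1)),
    columns=X_numeric_only.columns,
    index=df.index,
)

def cluster(X, k=4):
    """Run k-means with a fixed seed and return the cluster labels."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

labels_a = cluster(X_numeric_only)
labels_b = cluster(X_all_scaled)

# 1.0 means identical partitions; values near 0 mean essentially unrelated ones.
ari = adjusted_rand_score(labels_a, labels_b)
print(f"Adjusted Rand index between the two clusterings: {ari:.3f}")
```

The adjusted Rand index is one way to quantify "very different clusters": a low value confirms that the choice of scaling, rather than random initialization, is driving the disagreement, since both runs use the same seed.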

edited Nov 10 '16 at 09:34

kjetil b halvorsen


asked Nov 03 '16 at 05:42

user3282777

2

K-means is not recommended with binary data. And for dummy variables - simply inappropriate. http://stats.stackexchange.com/q/174556/3277; http://stats.stackexchange.com/a/81549/3277; http://stats.stackexchange.com/q/148417/3277. – ttnphns Nov 10 '16 at 10:10
1

See https://stats.stackexchange.com/questions/140711 for how standardization changes clusters. – whuber Dec 30 '19 at 22:10


0 Answers