Normalize data for clustering

Asked Nov 10 '14 at 08:35

Active Nov 10 '14 at 08:45


I am trying to perform clustering (I plan to use K-means in R) on data that contains both categorical and continuous variables. For example, my data contains 4 variables: gender (M or F), income (15,000 to 70,000 USD), employment period (in months), and education (Bachelor, Master, or PhD).

First, I recode each categorical variable into a set of 0/1 flag (indicator) variables, so gender is represented by [1, 0] for male and [0, 1] for female. The same applies to education: Bachelor becomes [1, 0, 0], Master [0, 1, 0], and PhD [0, 0, 1].
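A minimal sketch of this flag encoding in Python, assuming the level orders shown in the question (M before F; Bachelor, Master, PhD). The names `one_hot`, `encode_record`, `GENDER_LEVELS` and `EDUCATION_LEVELS` are hypothetical helpers for illustration only, and the record values are made up.

```python
# Hypothetical level orders, taken from the examples in the question.
GENDER_LEVELS = ["M", "F"]
EDUCATION_LEVELS = ["Bachelor", "Master", "PhD"]

def one_hot(value, levels):
    """Return a 0/1 flag list with a 1 at the position of `value` in `levels`."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r}; expected one of {levels}")
    return [1 if value == level else 0 for level in levels]

def encode_record(gender, income, months, education):
    """Flatten one record into a numeric vector:
    gender flags, income, employment months, education flags."""
    return (one_hot(gender, GENDER_LEVELS)
            + [income, months]
            + one_hot(education, EDUCATION_LEVELS))

print(encode_record("M", 42000, 18, "Bachelor"))  # [1, 0, 42000, 18, 1, 0, 0]
```

Note that income and months are still on their original scales at this point, which is what the question below is about.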

Here is my question:

Since income ranges from 15,000 to 70,000 USD while employment period is measured in months, the two are on very different scales, so I should normalize these two variables.
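Two common ways to put the continuous columns on a comparable scale are min-max rescaling to [0, 1] and z-scoring (subtract the mean, divide by the standard deviation). The sketch below uses only the Python standard library; the sample values are hypothetical and merely fall inside the ranges mentioned above. The z-score uses the sample standard deviation (divisor n - 1), which is what R's `scale()` uses by default.

```python
from statistics import mean, stdev

def min_max(values):
    """Rescale values linearly to [0, 1]; assumes not all values are equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center to mean 0 and divide by the sample standard deviation (n - 1)."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sample data within the ranges from the question.
incomes = [15000, 30000, 45000, 70000]   # USD
months = [6, 24, 60, 120]                # employment period

print(min_max(incomes))
print(z_score(months))
```

Min-max keeps every value in a fixed range but is sensitive to extreme values; z-scoring does not bound the range but makes each column's spread equal to 1.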

What about the 0/1 flag variables I created? Do I have to normalize them as well?
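One way to see what is at stake is to look at how much a gender mismatch adds to the squared Euclidean distance that K-means minimizes, with and without standardizing the flag columns. The sketch below assumes a hypothetical sample of 3 men and 1 woman; `squared_euclidean` and `standardize_columns` are illustrative helpers, not from any library. It only shows the mechanics and does not settle which choice is right.

```python
from statistics import stdev

def squared_euclidean(a, b):
    """Squared Euclidean distance, the quantity K-means works with."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def standardize_columns(rows):
    """Z-score each column with the sample sd (like R's scale());
    assumes no column is constant."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical gender flags [M, F]: three men and one woman.
flags = [[1, 0], [1, 0], [1, 0], [0, 1]]

raw = squared_euclidean(flags[0], flags[3])  # both flag coordinates differ by 1
scaled_rows = standardize_columns(flags)
scaled = squared_euclidean(scaled_rows[0], scaled_rows[3])
print(raw, scaled)  # 2 8.0
```

Unscaled, the mismatch always contributes 2. After z-scoring, each flag column has sd 0.5 in this sample, so the same mismatch contributes 8.0; the weight of a flag then depends on how frequent its category is, and it changes relative to the standardized income and months columns.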

edited Nov 10 '14 at 08:45

newbie

see [this](http://stats.stackexchange.com/a/121921/603) answer – user603 Nov 10 '14 at 10:44
