I am trying to cluster data (I plan to use K-means in R) that contains both categorical and continuous variables. For example, my data has 4 variables: gender (M or F), income (15,000 to 70,000 USD), employment period (in months), and education (Bachelor, Master, or PhD).

First, I recode each categorical variable as a set of 0/1 flags, so gender is represented by [1, 0] for male and [0, 1] for female. The same applies to education: Bachelor becomes [1, 0, 0].
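To make the recoding concrete, here is a minimal sketch of what I mean, written in Python purely for illustration (I plan to do the actual work in R). The helper `one_hot`, the level lists, and the sample record are hypothetical names I made up for this example.

```python
def one_hot(value, levels):
    """Return a list of 0/1 flags with a single 1 at the position of `value`."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels]


# Level order fixes the position of each flag (assumed order, for illustration).
GENDER_LEVELS = ["M", "F"]
EDUCATION_LEVELS = ["Bachelor", "Master", "PhD"]

# A made-up record: a woman with a Bachelor's degree.
record = {"gender": "F", "education": "Bachelor"}

flags = (
    one_hot(record["gender"], GENDER_LEVELS)
    + one_hot(record["education"], EDUCATION_LEVELS)
)
print(flags)  # [0, 1, 1, 0, 0]
```

Each categorical variable contributes one flag per level, so this record ends up with five 0/1 columns in addition to the two continuous ones.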

Here is my question:

Since income ranges from 15,000 to 70,000 USD and employment period is measured in months, the two are on very different scales, so I should normalize these two variables.
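By "normalize" I mean something like z-score standardization. Below is a rough sketch, again in Python for illustration only. The values are made up to fall within the ranges I described, and `z_score` is a hypothetical helper. It uses the population standard deviation, whereas R's `scale()` divides by the sample standard deviation, so the numbers would differ slightly.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


# Made-up sample values, not real data.
income = [15000, 30000, 45000, 70000]
employment_months = [6, 24, 60, 120]

income_z = z_score(income)
employment_z = z_score(employment_months)

# After standardization both columns are on a comparable, unitless scale.
print([round(v, 2) for v in income_z])
print([round(v, 2) for v in employment_z])
```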

What about the 0/1 flag variables that I created? Do I have to normalize them as well?
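To show why I am unsure, here is a small sketch (Python, illustrative only) of how much each kind of variable contributes to the Euclidean distance that K-means relies on. The three rows are hypothetical and use only the gender flags and raw income.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical rows laid out as [gender_M, gender_F, income in USD].
male_low = [1, 0, 15000]
female_low = [0, 1, 15000]
male_high = [1, 0, 70000]

d_gender = euclidean(male_low, female_low)  # rows differ only in the gender flags
d_income = euclidean(male_low, male_high)   # rows differ only in income

print(d_gender)  # sqrt(2), since two flags each differ by 1
print(d_income)  # dominated entirely by the raw income gap
```

With unscaled income, a difference in gender moves two points apart by only about 1.41, while the income gap moves them apart by 55,000. What I do not know is whether the flags should be left at 0/1 or rescaled too once income and employment period are normalized.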

newbie

0 Answers