I have medical data with values up to 500, along with values like age and a binary value for sex (0 or 1). I will use clustering to find the number of clusters. Which of these three approaches is best: normalize each column, standardize with z-scores, or do nothing? For z-scores I heard that the data should follow a normal distribution, so should I apply a Box-Cox transform first? Any help is appreciated.
2 Answers
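You should apply either normalization or standardization. Distance-based clustering does not work properly on mixed attributes (continuous and binary) unless the features are scaled, because the attribute with the largest range (here, the one reaching 500) would otherwise dominate the distance calculation. You can also check the following link: Is it important to scale data before clustering?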
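To make the two options concrete, here is a minimal sketch of per-column min-max normalization and z-score standardization using only the Python standard library. The column names and values are made-up toy data, not taken from the question, and the helper names `min_max_scale` and `z_score` are just illustrative. Note that both are simple shift-and-rescale operations applied to each column on its own, and neither one requires the column to be normally distributed.

```python
from statistics import mean, pstdev

def min_max_scale(column):
    """Linearly rescale a column so its values lie in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def z_score(column):
    """Center a column on its own mean and divide by its own standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical toy data: one wide-range measurement, age, and binary sex.
data = {
    "lab_value": [120.0, 500.0, 80.0, 310.0],
    "age": [34.0, 71.0, 52.0, 45.0],
    "sex": [0.0, 1.0, 1.0, 0.0],
}

# Each column is scaled separately, using only its own statistics.
normalized = {name: min_max_scale(col) for name, col in data.items()}
standardized = {name: z_score(col) for name, col in data.items()}
```

With min-max scaling the binary column stays at 0 and 1 while the other columns are squeezed into the same range; with z-scores every column ends up with mean 0 and standard deviation 1.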

Harshit Mehta
- And I have to standardize every column separately, right? – nikolaosmparoutis Aug 01 '17 at 21:56
- Yes, that should be the way. – Harshit Mehta Aug 01 '17 at 22:05
If you use z-scores, first make sure there is no external standard population that they should be computed against. For example, children's height and weight are sometimes converted to z-scores, but researchers use the World Health Organization (WHO) dataset to do this, so the computed z-score shows where each subject in the researcher's dataset would fall on the WHO distribution if they were a member of that population.
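The difference from an ordinary z-score is only in where the mean and standard deviation come from. The sketch below assumes, as a simplification, that the reference population is summarized by a single mean and standard deviation; the numbers are hypothetical placeholders, not actual WHO values, and real reference standards may use more elaborate methods.

```python
def reference_z_score(value, ref_mean, ref_sd):
    """Z-score of a value relative to an external reference population."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (value - ref_mean) / ref_sd

# Hypothetical reference statistics (NOT real WHO figures).
REF_MEAN_CM = 110.0
REF_SD_CM = 5.0

# Hypothetical heights from the researcher's own sample.
sample_heights_cm = [105.0, 110.0, 120.0]

# Each subject is placed on the reference distribution,
# not on the distribution of the sample itself.
z_external = [
    reference_z_score(h, REF_MEAN_CM, REF_SD_CM) for h in sample_heights_cm
]
print(z_external)  # [-1.0, 0.0, 2.0]
```

Unlike z-scores computed from the sample, these values need not have mean 0 or standard deviation 1 within the sample.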

Jeremy