I have medical data with a maximum value of 500, along with values like age and binary values for sex (0 or 1). I will use clustering to find the number of clusters. Which is the best approach among these three: normalize each column, standardize with z-scores, or do nothing? For z-scores, I heard that the data should be normally distributed, so should I apply a Box-Cox transform first? Any help is appreciated.

2 Answers

You should apply either normalization or standardization. Without feature scaling, clustering on mixed attributes (continuous and binary) will not work properly, since distance-based methods let the features with the largest ranges (here, values up to 500) dominate the 0/1 columns. You can check the following link as well: Is it important to scale data before clustering?
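To make the two options concrete, here is a minimal sketch in plain Python. The column values and the helper names `min_max` and `z_score` are made up for illustration, and the z-score here uses the population standard deviation, which is an assumption rather than a requirement.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values linearly to the [0, 1] range (normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Subtract the column mean and divide by its standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical columns, invented for illustration only.
lab_value = [500.0, 120.0, 80.0, 300.0]
age = [34.0, 61.0, 47.0, 52.0]
sex = [0.0, 1.0, 1.0, 0.0]

columns = (lab_value, age, sex)
normalized = [min_max(col) for col in columns]
standardized = [z_score(col) for col in columns]
```

Either way, every column ends up on a comparable scale before distances are computed. Note that min-max scaling leaves a 0/1 column unchanged.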

Harshit Mehta

If you use z-scores, you should first make sure there is no external standard population that is normally used to compute them. For example, children's height and weight are sometimes converted to z-scores, but researchers use the World Health Organization (WHO) dataset to do this, so the computed z-score shows where each subject in the researcher's dataset would fall on the WHO distribution if they were a member of that population.
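As a rough sketch of the difference, a z-score against an external reference uses the reference population's mean and standard deviation instead of your own sample's. The function name `reference_z_score` and all numbers below are hypothetical placeholders, not actual WHO figures.

```python
def reference_z_score(value, ref_mean, ref_sd):
    """Z-score of one measurement relative to an external reference population."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (value - ref_mean) / ref_sd

# Placeholder reference parameters, NOT real WHO values.
REF_MEAN_CM = 110.0
REF_SD_CM = 5.0

# Hypothetical heights from the researcher's own dataset.
heights_cm = [105.0, 110.0, 120.0]

z = [reference_z_score(h, REF_MEAN_CM, REF_SD_CM) for h in heights_cm]
```

Unlike sample-based standardization, these scores need not have mean 0 or standard deviation 1 within your own data.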

Jeremy