
I have a dataset containing non-normally distributed variables that I want to feed into a distance-based algorithm (e.g. k-means clustering). Is it OK to just subtract the mean and divide by the standard deviation to "standardize" it? Or should some other transformation be performed? Is it even necessary to standardize the data before feeding it into such an algorithm?

This is similar to this question, but the other question doesn't cover the k-means angle.

Edit: I would like to standardize the data to prevent one of my covariates from dominating the distance calculation. Some of my covariates range from 0 to 200 and others from 0 to 2.
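To make the concern concrete, here is a minimal sketch of the "subtract the mean, divide by the standard deviation" step and its effect on Euclidean distance. The four data points and the helper name `standardize` are made up for illustration; the first column stands in for a 0..200 covariate and the second for a 0..2 covariate. It only shows the scale effect and says nothing about whether z-scoring is the best choice for non-normal variables.

```python
import math
import statistics

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its (population) std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # Guard against constant columns, whose standard deviation is zero.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical data: column 0 spans roughly 0..200, column 1 roughly 0..2.
data = [[10.0, 0.1], [190.0, 0.2], [20.0, 1.9], [180.0, 1.8]]
scaled = standardize(data)

# Points 0 and 1 differ mainly in the wide-range covariate,
# points 0 and 2 differ mainly in the narrow-range covariate.
raw_d_wide = math.dist(data[0], data[1])
raw_d_narrow = math.dist(data[0], data[2])
scaled_d_wide = math.dist(scaled[0], scaled[1])
scaled_d_narrow = math.dist(scaled[0], scaled[2])

print(f"raw:    {raw_d_wide:.2f} vs {raw_d_narrow:.2f}")
print(f"scaled: {scaled_d_wide:.2f} vs {scaled_d_narrow:.2f}")
```

On the raw values the first pair is far more distant than the second purely because of units; after standardizing, both covariates contribute on a comparable scale. Note that the mean and standard deviation are themselves sensitive to skew and outliers, which is the part of the question this sketch leaves open.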

DLaw
