Standardizing non-normal data for use in distance-based classifier

Asked Sep 26 '13 at 23:26

I have a dataset containing non-normally distributed variables that I want to feed into a distance-based classifier (e.g. K-means). Is it OK to just subtract the mean and divide by the standard deviation to "standardize" it? Or should some other transformation be performed? Is it even necessary to standardize the data before feeding it into such a classifier?
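A minimal sketch (plain Python, standard library only; the sample values are hypothetical) of what subtracting the mean and dividing by the standard deviation does to a skewed variable. The z-score is a linear transformation, so it rescales the variable to mean 0 and standard deviation 1 but leaves the shape of the distribution, including its skewness, unchanged; it does not require normality to be well defined. The `robust_scale` helper is one commonly used alternative (center on the median, divide by the interquartile range) that is less sensitive to extreme values; whether it is preferable depends on the data.

```python
import statistics

def zscore(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def robust_scale(values):
    """Alternative: center on the median and divide by the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    med = statistics.median(values)
    return [(v - med) / (q3 - q1) for v in values]

def skewness(values):
    """Population skewness (third standardized moment)."""
    z = zscore(values)
    return sum(x ** 3 for x in z) / len(z)

# Hypothetical right-skewed sample with one large value.
data = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 10.0]

z = zscore(data)
print("mean, sd after z-score:", statistics.fmean(z), statistics.pstdev(z))
# Skewness is the same before and after, because the transform is linear.
print("skewness before:", skewness(data))
print("skewness after: ", skewness(z))
print("robust-scaled:", robust_scale(data))
```

In other words, z-scoring fixes the scale of each variable but not its shape; if the shape itself matters for the distance, a nonlinear transform (e.g. a log) would be a separate decision.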

This is similar to another question, but that one doesn't cover the k-means angle.

Edit: I would like to standardize the data to prevent one of my covariates from dominating the distance calculation. Some of my covariates are bounded between 0 and 200 and others between 0 and 2.
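A small illustration of the domination problem, under assumed toy data (the four rows below are hypothetical, with one column spanning roughly 0..200 and the other roughly 0..2). It measures how much of the squared Euclidean distance between two points comes from each column, before and after z-scoring each column separately; `standardize_columns` and `column_shares` are illustrative helper names, not library functions.

```python
import math
import statistics

# Hypothetical toy data: column 0 spans roughly 0..200, column 1 roughly 0..2.
rows = [
    (10.0, 0.1),
    (190.0, 1.9),
    (100.0, 0.2),
    (105.0, 1.8),
]

def standardize_columns(rows):
    """Z-score each column independently (mean 0, population SD 1)."""
    cols = list(zip(*rows))
    params = [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]
    return [tuple((v - mu) / sd for v, (mu, sd) in zip(row, params)) for row in rows]

def column_shares(a, b):
    """Fraction of the squared Euclidean distance contributed by each column."""
    sq = [(x - y) ** 2 for x, y in zip(a, b)]
    total = sum(sq)
    return [s / total for s in sq]

scaled = standardize_columns(rows)

# Rows 2 and 3 differ by only 5 units in column 0 but by most of column 1's range.
print("raw distance:   ", math.dist(rows[2], rows[3]))
print("raw shares:     ", column_shares(rows[2], rows[3]))
print("scaled distance:", math.dist(scaled[2], scaled[3]))
print("scaled shares:  ", column_shares(scaled[2], scaled[3]))
```

On the raw data, column 0 accounts for most of the distance between rows 2 and 3 even though the large relative difference is in column 1; after per-column z-scoring, column 1 dominates that pair instead. In practice the per-column mean and standard deviation are computed once (e.g. on the training data) and reused when scaling new points.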

asked by DLaw; edited Apr 13 '17 at 12:44 by Community

What does the standardization need to achieve? – Glen_b Sep 27 '13 at 00:05
I think you answered your own question. You should standardize the data if your inputs have different scales. – Akavall Sep 30 '13 at 02:37