
I am trying to implement k-means clustering on a 6x6 data set that looks like this:

2 3 6 0 1 7
4 9 9 6 2 2
0 1 7 9 5 0
2 3 2 7 8 3
8 2 9 2 3 1
8 0 0 1 7 9

Using rows 2 and 4 as the centroids:

4 9 9 6 2 2
2 3 2 7 8 3

Taking the first row of the dataset and the first centroid, I can calculate the Euclidean distance like so:

$\sqrt{(4-2)^2 + (9-3)^2 + (9-6)^2 + (6-0)^2 + (2-1)^2 + (2-7)^2}$
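
The same calculation can be checked with a minimal Python sketch. The variable names (`row`, `centroid`, `squared_diffs`, `distance`) are just illustrative; the numbers are the first row and the first centroid from above.

```python
import math

row = [2, 3, 6, 0, 1, 7]        # first row of the data set
centroid = [4, 9, 9, 6, 2, 2]   # first centroid (row 2 of the data set)

# Square the difference in each of the six coordinates, then sum and take the root.
squared_diffs = [(c - x) ** 2 for x, c in zip(row, centroid)]
distance = math.sqrt(sum(squared_diffs))

print(squared_diffs)  # [4, 36, 9, 36, 1, 25]
print(distance)       # sqrt(111), roughly 10.54
```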

I now want to run the algorithm using Modified Correlation instead of Euclidean distance, defined as

$mc = 1 - r$, where $r$ is the Pearson Correlation Coefficient.

So how does this work? I have never really worked with covariance or standard deviation in more than two dimensions. Could somebody give me a quick run-through on the first row, as I did above (or point me in the right direction)? I can't find documentation on how to calculate this for a 6-dimensional data set.
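
One possible reading, sketched below, is to treat the six coordinates of a row and of a centroid as six paired observations and compute the ordinary Pearson coefficient between the two vectors, so no higher-dimensional covariance is needed. This is an assumption about what "Modified Correlation" means here, not a confirmed definition, and the function names `pearson_r` and `modified_correlation` are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each coordinate from its own vector's mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    spread_x = math.sqrt(sum(a * a for a in dx))
    spread_y = math.sqrt(sum(b * b for b in dy))
    # Undefined if either vector is constant (spread of zero).
    return covariance_sum / (spread_x * spread_y)

def modified_correlation(x, y):
    """mc = 1 - r, as defined in the question."""
    return 1.0 - pearson_r(x, y)

row = [2, 3, 6, 0, 1, 7]        # first row of the data set
centroid = [4, 9, 9, 6, 2, 2]   # first centroid (row 2 of the data set)

r = pearson_r(row, centroid)
mc = modified_correlation(row, centroid)
print(r)   # roughly 0.082
print(mc)  # roughly 0.918
```

Under this reading, the row means are 19/6 and 32/6, the sum of the products of deviations is about 3.67, and the sums of squared deviations are about 38.83 and 51.33, which gives $r \approx 0.082$ and $mc \approx 0.918$.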

    Possible duplicate of [Why does k-means clustering algorithm use only Euclidean distance metric?](http://stats.stackexchange.com/questions/81481/why-does-k-means-clustering-algorithm-use-only-euclidean-distance-metric) – gung - Reinstate Monica Nov 19 '16 at 18:45
