
Just got stuck working with K-means clustering.

I have looked up these Python/scikit-learn commands:

import numpy as np
from sklearn.cluster import KMeans

image_array = image.reshape([-1, 3]).astype(np.float32)
kmeans = KMeans(n_clusters=2, random_state=0).fit(image_array)
labels_array = kmeans.labels_
labels = labels_array.reshape([image.shape[0], image.shape[1]])

when I noticed that the RGB image has to be converted to one long array. How can K-means clustering know about the two spatial dimensions (and the third one, color) when I pass it such an array?

Or is my assumption wrong that spatial information is needed? At least the goal is to minimize the within-cluster sum of squares. The distance in the x, y, and color directions is therefore important, isn't it?
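(Editor's note: a quick sanity check of what the reshape actually produces, using a tiny made-up 2x2 image; the values are arbitrary:)

```python
import numpy as np

# A tiny 2x2 "RGB image": shape (height, width, 3).
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Flattening merges the two spatial axes; each row is one pixel's (R, G, B).
image_array = image.reshape([-1, 3]).astype(np.float32)
print(image_array.shape)  # (4, 3): 4 pixels, 3 color features each
print(image_array[0])     # [255.   0.   0.] -- the top-left pixel
```

So K-means sees 4 samples with 3 features each; the pixel positions are no longer part of the data.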

Jan Kukacka
Mr.Sh4nnon

1 Answer


The code sample you show actually only clusters pixels according to the color information; the spatial information is not being used. You could add it to the model by appending the pixel coordinates to the color dimensions:

import numpy as np
from sklearn.cluster import KMeans

# 'ij' indexing keeps the coordinate arrays in the same row-major
# order as the reshape of the image below.
coords_x, coords_y = np.meshgrid(np.arange(image.shape[0]),
                                 np.arange(image.shape[1]),
                                 indexing='ij')
image_array = image.reshape([-1, 3]).astype(np.float32)
image_array = np.concatenate([image_array,
                              coords_x.reshape(-1, 1),
                              coords_y.reshape(-1, 1)], axis=-1)
kmeans = KMeans(n_clusters=2, random_state=0).fit(image_array)
labels_array = kmeans.labels_
labels = labels_array.reshape([image.shape[0], image.shape[1]])

However, then you run into the problem of how to weight/scale the individual dimensions, because distances in RGB space and distances in pixel coordinates are not directly comparable. See for example Standardizing some features in K-Means for details on that problem.
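(Editor's note: one common workaround is to standardize each feature column before clustering, so the color and coordinate dimensions contribute comparably. A minimal sketch using scikit-learn's StandardScaler on a random test image; the spatial weight `w` is a made-up tuning knob, not a recommended value:)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Random test image standing in for real data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30, 3)).astype(np.float32)

# Pixel coordinates, row-major ('ij') to match the reshape below.
rows, cols = np.meshgrid(np.arange(image.shape[0]),
                         np.arange(image.shape[1]), indexing='ij')
features = np.concatenate([image.reshape(-1, 3),
                           rows.reshape(-1, 1).astype(np.float32),
                           cols.reshape(-1, 1).astype(np.float32)], axis=-1)

# Standardize every column to zero mean / unit variance, then
# down-weight the two spatial columns relative to color.
features = StandardScaler().fit_transform(features)
w = 0.5  # hypothetical spatial weight; tune for your data
features[:, 3:] *= w

labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(features)
labels = labels.reshape(image.shape[:2])
```

With `w = 0` this reduces to the pure color clustering from the question; larger `w` makes the clusters more spatially compact.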

Jan Kukacka
  • Thank you for the answer. Did this code actually run? I get one error after the other when using an RGB image. I see your point about the color information. What I don't get is how the R, G, B channels can be distinguished. It's basically a vector with numbers 0-255. And then? Making n groups according to distance, like 0-127 and 128-255 (n=2)? That wouldn't group colors but intensities. My assumption is that the function reconstructs the RGB channels by dividing the vector into 3 parts. Is this correct? – Mr.Sh4nnon Apr 04 '19 at 07:57
  • 1
    The code should run now (I fixed two mistakes I made in a hurry). The RGB space is a 3D space like any other, each of the colors is represented by number between 0 and 255, so it's three vectors, not one. – Jan Kukacka Apr 04 '19 at 08:09
  • Oh, now I get it. It's a gap in my understanding of Python arrays. I assumed image_array = image.reshape([-1,3]).astype(np.float32) makes it a 1D array of all channels. The code works now. However, as you stated, the result is very questionable. Thanks a lot. – Mr.Sh4nnon Apr 04 '19 at 08:23