
Just got stuck working with K-means clustering.

I have looked up these Python/scikit-learn commands:

import numpy as np
from sklearn.cluster import KMeans

image_array = image.reshape([-1, 3]).astype(np.float32)
kmeans = KMeans(n_clusters=2, random_state=0).fit(image_array)
labels_array = kmeans.labels_
labels = labels_array.reshape([image.shape[0], image.shape[1]])

when I noticed that the RGB image has to be converted to one long array. How can K-means clustering know about the two spatial dimensions (and the third one, color) when I pass it such an array?

Or is my assumption wrong that spatial information is needed? At least the goal is to minimize the within-cluster sum of squares. The distance in the x, y, and color directions is therefore important, isn't it?
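(Editor's note: a quick sanity check of what the reshape actually produces, using a tiny made-up 2x2 image; the values are arbitrary:)

```python
import numpy as np

# A tiny 2x2 "RGB image": shape (height, width, 3).
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Flattening merges the two spatial axes; each row is one pixel's (R, G, B).
image_array = image.reshape([-1, 3]).astype(np.float32)
print(image_array.shape)  # (4, 3): 4 pixels, 3 color features each
print(image_array[0])     # [255.   0.   0.] -- the top-left pixel
```

So K-means sees 4 samples with 3 features each; the pixel positions are no longer part of the data.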

Jan Kukacka
Mr.Sh4nnon

1 Answer


The code sample you show actually only clusters pixels according to the color information; the spatial information is not being used. You could add it to the model by appending the pixel coordinates to the color dimensions:

import numpy as np
from sklearn.cluster import KMeans

# 'ij' indexing keeps the coordinate arrays in the same row-major
# order as the reshape of the image below.
coords_x, coords_y = np.meshgrid(np.arange(image.shape[0]),
                                 np.arange(image.shape[1]),
                                 indexing='ij')
image_array = image.reshape([-1, 3]).astype(np.float32)
image_array = np.concatenate([image_array,
                              coords_x.reshape(-1, 1),
                              coords_y.reshape(-1, 1)], axis=-1)
kmeans = KMeans(n_clusters=2, random_state=0).fit(image_array)
labels_array = kmeans.labels_
labels = labels_array.reshape([image.shape[0], image.shape[1]])

However, then you run into the problem of how to weight/scale the individual dimensions, because distances in RGB space and distances in pixel coordinates are not directly comparable. See for example Standardizing some features in K-Means for details on that problem.
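(Editor's note: one common workaround is to standardize each feature column before clustering, so the color and coordinate dimensions contribute comparably. A minimal sketch using scikit-learn's StandardScaler on a random test image; the spatial weight `w` is a made-up tuning knob, not a recommended value:)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Random test image standing in for real data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30, 3)).astype(np.float32)

# Pixel coordinates, row-major ('ij') to match the reshape below.
rows, cols = np.meshgrid(np.arange(image.shape[0]),
                         np.arange(image.shape[1]), indexing='ij')
features = np.concatenate([image.reshape(-1, 3),
                           rows.reshape(-1, 1).astype(np.float32),
                           cols.reshape(-1, 1).astype(np.float32)], axis=-1)

# Standardize every column to zero mean / unit variance, then
# down-weight the two spatial columns relative to color.
features = StandardScaler().fit_transform(features)
w = 0.5  # hypothetical spatial weight; tune for your data
features[:, 3:] *= w

labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(features)
labels = labels.reshape(image.shape[:2])
```

With `w = 0` this reduces to the pure color clustering from the question; larger `w` makes the clusters more spatially compact.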

Jan Kukacka
  • Thank you for the answer. Did this code actually run? I get one error after the other when using an RGB image. I see your point about the color information. What I don't get is how the R, G, B channels can be distinguished. It's basically a vector with numbers 0-255. And then? Making n groups according to distance, like 0-127 and 128-255 (n=2)? That wouldn't group colors but intensities. My assumption is that the function reconstructs the RGB channels by dividing the vector into 3 parts. Is this correct? – Mr.Sh4nnon Apr 04 '19 at 07:57
  • 1
    The code should run now (I fixed two mistakes I made in a hurry). The RGB space is a 3D space like any other, each of the colors is represented by number between 0 and 255, so it's three vectors, not one. – Jan Kukacka Apr 04 '19 at 08:09
  • Oh, now I get it. It's a gap in my understanding of Python arrays. I assumed image_array = image.reshape([-1,3]).astype(np.float32) makes it a 1D array of all channels. The code works now. However, as you stated, the result is very questionable. Thanks a lot. – Mr.Sh4nnon Apr 04 '19 at 08:23