Silhouette coefficients with random data

Question

I have point pattern data for the distribution of animals in fields. I'm hoping that the community can confirm (or soundly dismiss!) my newbie understanding of the silhouette method. If the animals were randomly distributed in the fields, would the silhouette coefficient hover around zero, no matter what k was? Many thanks in advance.

I assume you mean Kaufman & Rousseeuw's Average Silhouette Width (ASW). This is normally computed from a distance and a clustering, usually generated by a clustering method. You haven't specified any of these, but ASW values will depend on them. No answer to your question is possible without knowing these. (Actually, chances are there is no perfect answer even if you specify them, but I can at least say something then.) Also, don't assume that people know what you mean if you use undefined notation ("k"). — Christian Hennig, Nov 14 '21 at 18:04
This question is problematic because, to compute the Silhouette criterion, you must first do the clustering. Cluster results depend heavily on the method, the number of extracted clusters k, and the dimensionality; the clustering will also always capitalize on some _random_ cluster structure observed in random data. Why not run many such simulation probes yourself? I tentatively expect that, on average, the Silhouette will be a little greater than 0 (I didn't check it). — ttnphns, Nov 14 '21 at 19:04
[This](https://stats.stackexchange.com/q/222675/3277), albeit not an answer, could be of some interest to you — ttnphns, Nov 14 '21 at 19:05
@ttnphns I have seen ASW values *below* 0 in situations with not too small $k$. If the number of clusters is large, the *closest* cluster to an observation may be quite close. Although maybe this requires dimensionality larger than 2 for random data. — Christian Hennig, Nov 15 '21 at 00:45
Thank you very much, everybody. I'm sorry for being unclear. My head was so deep in the details that I couldn't see what I had left out! I mean the coefficient called s(i) as described in Rousseeuw 1987 (JCAM 20:53-65). We are implementing it with the R package "factoextra" and use k-means clustering. Our purpose is to get a silhouette plot with the scores for different numbers of clusters (in this context k = number of clusters). — sboyd, Nov 15 '21 at 18:37
Our actual data has clear clusters that we have verified with K functions. However, a reviewer has asked whether the silhouette method will "find" clusters even with random data. We have run this multiple times on simulated random data but get values around 0.4. Following Rousseeuw, I would think that with random data no clustering is more natural than any other, so s(i) should hover around zero. The scores of 0.4 are therefore giving me pause and reason to doubt my understanding. If it matters, we are using patterns with 25 points, as this matches our field data sample sizes. Many thanks. — sboyd, Nov 15 '21 at 18:38
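
The simulation check discussed in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the factoextra workflow: it is written in plain Python rather than R, "random" is taken to mean points drawn uniformly on the unit square, and the clustering is a basic Lloyd's k-means with a single random start (R's `kmeans` uses a different default algorithm). The function names `kmeans`, `silhouette_widths` and `mean_asw_on_random_data` are made up for this sketch. The silhouette width follows the usual definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster, with s(i) = 0 for a singleton cluster.

```python
import math
import random
from statistics import mean


def kmeans(points, k, rng, n_iter=100):
    """Basic Lloyd's k-means with one random start; returns one label per point."""
    centers = rng.sample(points, k)  # assumption: initial centers are k distinct data points
    labels = [-1] * len(points)
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster happens to empty
                centers[c] = tuple(mean(coord) for coord in zip(*members))
    return labels


def silhouette_widths(points, labels):
    """Silhouette width s(i) for every point, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:  # singleton cluster: s(i) is defined as 0
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(math.dist(p, points[j]) for j in members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return widths


def mean_asw_on_random_data(n_points=25, k=3, n_sims=100, seed=1):
    """Average silhouette width of k-means, averaged over simulated uniform patterns."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        pts = [(rng.random(), rng.random()) for _ in range(n_points)]
        labels = kmeans(pts, k, rng)
        results.append(mean(silhouette_widths(pts, labels)))
    return mean(results)


if __name__ == "__main__":
    for k in range(2, 7):
        print(k, round(mean_asw_on_random_data(k=k), 3))
```

Running this for several values of k gives a reference level for patterns with no real structure, against which the values from the field data could be compared. Because k-means chooses the partition to make clusters compact, the points are not assigned to clusters independently of their positions, so there is no particular reason for the resulting average to sit at zero.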

0 Answers