Clustering sets of vectors

Question

I have a set of $d$-dimensional vectors $\{v_1,v_2,\dots,v_n\}$, each of which has been assigned a label from a set $S=\{s_1,s_2,\dots,s_k\}$. I would like to find another set of labels $T=\{t_1,t_2,\dots,t_l\}$, where $l < k$, such that all vectors having the same $S$ label also have the same $T$ label. In other words, $T$ is a strictly coarser clustering than $S$. What is a good way to go about finding this $T$ clustering?

The obvious approach would be to take the mean of all of the vectors having a given $S$ label, and then cluster these new $s$ vectors. However, I feel like this throws away a lot of potentially useful information about the distribution of the vectors that went into computing those means. Is there another method for finding this $T$ clustering which makes better use of the $v$ vectors? Thanks in advance.

score 1 · Answer 1 · edited Apr 13 '17 at 12:44


Hierarchical agglomerative clustering might work for you. It typically starts with each data point in its own cluster, then iteratively merges pairs of clusters to form larger and larger clusters. Since you already have an initial clustering, you'd start from that instead of individual points. To determine your merging procedure, you'd need to decide on the distance metric and linkage criterion. The linkage criterion determines which pair of clusters to merge next. Many different criteria are discussed here.
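As a rough illustration of the idea, here is a minimal sketch that starts from the existing $S$ clusters and keeps merging the closest pair until only $l$ clusters are left. It assumes Euclidean distance and average linkage (the mean distance over all cross-cluster pairs of the original $v$ vectors), which is just one possible choice of metric and linkage criterion. The function name `coarsen_labels` and its parameters are hypothetical, made up for this example, and the brute-force pair search is only meant for small $k$.

```python
import numpy as np

def coarsen_labels(vectors, s_labels, n_target):
    """Merge the given S clusters until only n_target clusters remain.

    Assumption: average linkage with Euclidean distance, i.e. the distance
    between two clusters is the mean distance over all cross-cluster pairs
    of the original vectors. Returns a dict mapping S label -> T label.
    """
    vectors = np.asarray(vectors, dtype=float)
    s_labels = np.asarray(s_labels)
    # Each current cluster is a list of S labels; start from the S clustering.
    groups = [[s] for s in np.unique(s_labels).tolist()]

    def members(group):
        # All original vectors whose S label belongs to this cluster.
        return vectors[np.isin(s_labels, group)]

    def average_linkage(a, b):
        diffs = members(a)[:, None, :] - members(b)[None, :, :]
        return np.linalg.norm(diffs, axis=-1).mean()

    while len(groups) > n_target:
        # Find the closest pair of current clusters and merge it.
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = average_linkage(groups[i], groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]

    return {s: t for t, group in enumerate(groups) for s in group}

# Toy data (made up for illustration): four S clusters forming two tight pairs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.5, 10.0]])
s_labels = np.repeat(np.arange(4), 20)
vectors = centers[s_labels] + 0.1 * rng.standard_normal((80, 2))
t_of_s = coarsen_labels(vectors, s_labels, n_target=2)
```

Because the linkage is computed from the individual $v$ vectors rather than from per-cluster means, the spread of each $S$ cluster influences which pairs get merged. Swapping `average_linkage` for another criterion changes that behaviour without touching the rest of the loop.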


answered Jun 26 '16 at 14:41

user20160


Hmm, that might be a good way to approach the problem. We could collapse each subset of $v$ vectors to a single $s$ vector, but then encode the variance information through the linkage criterion that determines how the $s$ vectors are combined with one another. – jon_simon Jun 29 '16 at 13:00
