Difference between Ward hierarchical clustering and K-Means for classification

Question

I have a dataset of socio-demographic features of a population, expressed as percentages of each municipality's total population (e.g. 12% freelancers, 5% unemployed, etc.); each observation is a municipality of the city. My goal is to classify each municipality politically as left or right (roughly). I compared K-Means and hierarchical clustering with the Ward method, and found that the latter performs considerably better, misclassifying only 2% of the points, while K-Means does somewhat worse, with 6% of the points misclassified.

My question is: from a theoretical point of view, how do I interpret this result? Why should one method perform better than the other in such a situation?

Both share the same objective function, but the algorithms are very different. In the majority of cases k-means, being iterative, will minimize the objective (the within-cluster sum of squares, SSW) somewhat better than Ward. On the other hand, Ward is more apt to "uncover" clusters that are not as round, or not as similar in diameter, as those k-means typically tends toward. — ttnphns, Mar 11 '18 at 13:32
Clustering is not supervised classification. "Misclassification" in clustering isn't a direct test; it is one facet of external validation. https://stats.stackexchange.com/a/195481/3277 — ttnphns, Mar 11 '18 at 13:37
Thanks! I have to say that I already read that answer of yours, which is brilliant, and which is what actually made me wonder things. I surely am in case 4, where "how accurately your clustering method is able to uncover the real clusters is the measure of external validity". BUT: now that I know this, that the Ward clustering performs better on my data, is there a formal way to try explaining why it does? — sato, Mar 11 '18 at 15:11
Since it is data-dependent, and the results of Ward are even more data-dependent than those of k-means, I'm not sure it is easy to explain theoretically. But first try running k-means after Ward, using the centres of the Ward clusters as the initial centres for k-means, to see whether k-means can much improve the results (1) in terms of SSW, (2) in terms of external validation (i.e. misclassification). If it improves on (1) but worsens on (2), you may attribute the finding to the fact that, as I said initially, Ward is less demanding, more permissive in terms of cluster shape/size assumptions. — ttnphns, Mar 11 '18 at 15:26
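
The check suggested in the last comment could be sketched roughly as below. This is only an illustration under stated assumptions, not the asker's actual pipeline: it assumes NumPy and scikit-learn, uses synthetic two-cluster data from make_blobs as a stand-in for the municipality features and the left/right labels, and the helper names ssw and misclassification_rate are made up for this example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs


def ssw(X, labels):
    """Within-cluster sum of squares: squared distances to each cluster's own mean."""
    return sum(
        ((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
        for k in np.unique(labels)
    )


def misclassification_rate(labels, truth):
    """Error rate for two clusters, taking the better of the two label matchings."""
    err = np.mean(labels != truth)
    return min(err, 1.0 - err)


# ASSUMPTION: synthetic stand-in data with two groups of unequal spread.
X, truth = make_blobs(
    n_samples=300, centers=2, cluster_std=[1.0, 3.0], random_state=0
)

# Step 1: Ward hierarchical clustering, cut at two clusters.
ward_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X).labels_

# Step 2: k-means started from the centroids of the Ward clusters.
ward_centres = np.vstack([X[ward_labels == k].mean(axis=0) for k in range(2)])
km = KMeans(n_clusters=2, init=ward_centres, n_init=1).fit(X)
km_labels = km.labels_

# Step 3: compare (1) the objective and (2) the external validation.
print("SSW   Ward:", ssw(X, ward_labels), " k-means:", ssw(X, km_labels))
print(
    "error Ward:", misclassification_rate(ward_labels, truth),
    " k-means:", misclassification_rate(km_labels, truth),
)
```

With real data, X would be the matrix of percentages and truth the known left/right labels coded as 0/1. Because k-means starts from the Ward partition, its SSW should not come out higher than Ward's; the interesting question is whether the misclassification rate moves in the opposite direction.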

0 Answers