I'm using the Hierarchical Clustering Package in Mathematica to analyse a set of experimental data (each experimental point has about 10 parameters). The package offers many options for the distance function and the linkage. I used EuclideanDistance with Median linkage, and the results are satisfying and fit our expectations well. But how does one choose the distance function and linkage rationally? If different distance functions and linkages give different clusterings, does that mean the clustering is not reliable for our dataset? Or is it fine if it works only for EuclideanDistance? In which cases does the clustering not depend on the distance function and linkage? And what conclusion can be drawn if it does?

1 Answer

I don't have a complete answer, but would love to see a definitive one.

For the distance measure, I think you should pick one that is appropriate to your type of data (continuous, binary, or mixed). This is described well in the Stata manual here.

For linkage types, there's some good discussion on how to choose one here, in the Agglomerative Methods section.
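On the question of whether the result depends on these choices, one rough way to check is to cluster the same data under several distance/linkage combinations and measure how often the resulting partitions agree. The sketch below does this in Python with SciPy rather than with the Mathematica package, so none of it reflects that package's API. The synthetic data, the choice of 3 clusters, and the helper names `rand_index` and `cluster_labels` are assumptions made for illustration. Median linkage is left out because SciPy defines it correctly only for Euclidean distances.

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (1.0 = same partition)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


def cluster_labels(X, metric, method, k):
    """Agglomerative clustering cut into at most k flat clusters."""
    Z = linkage(pdist(X, metric=metric), method=method)
    return fcluster(Z, t=k, criterion="maxclust")


# Hypothetical stand-in for the experimental data:
# 3 well-separated groups of 20 points, 10 parameters per point.
rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[1, :] = 20.0
centers[2, :5] = 20.0
centers[2, 5:] = -20.0
X = np.vstack([c + rng.normal(size=(20, 10)) for c in centers])

metrics = ["euclidean", "cityblock", "chebyshev"]
methods = ["single", "complete", "average"]
settings = list(itertools.product(metrics, methods))

labels = {s: cluster_labels(X, s[0], s[1], k=3) for s in settings}
agreement = {
    (s, t): rand_index(labels[s], labels[t])
    for s, t in itertools.combinations(settings, 2)
}
min_agreement = min(agreement.values())
print("lowest pairwise Rand index:", min_agreement)
```

If the lowest pairwise agreement stays close to 1, the grouping is probably a feature of the data rather than of one particular setting. If it drops sharply for some combinations, I would treat the clustering as sensitive to those choices and look at which points move between clusters before drawing conclusions.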

dimitriy