When doing sequence analysis with a package such as TraMineR, one can compute a clustering based on Optimal Matching (OM) distances and then plot it as a tree. I use agnes to do it, roughly like this:
library(TraMineR)  # seqdef, seqsubm, seqdist
library(cluster)   # agnes
sequences.sts <- seqdef(sequences.sts)
ccost <- seqsubm(sequences.sts, method = "CONSTANT", cval = 2, with.missing = TRUE)
sequences.OM <- seqdist(sequences.sts, method = "OM", sm = ccost, with.missing = TRUE)
clusterward <- agnes(sequences.OM, diss = TRUE, method = "ward")
plot(clusterward, which.plots = 2)  # 2 = dendrogram only
This gives me a plot of the cluster dendrogram, and it also gives me an agglomerative coefficient. However, ?agnes.object notes that the agglomerative coefficient (ac) grows with the number of observations, and is therefore unsuitable for comparing datasets of different sizes.
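To make the size issue concrete, here is a minimal sketch of how one might check it. It is not my actual data: it assumes the mvad example dataset that ships with TraMineR (with the monthly states in columns 17 to 86), and the subsample sizes and the name ac.by.size are arbitrary choices for illustration. It computes the OM distances once and then reruns agnes on random subsamples of different sizes so the resulting ac values can be compared.

```r
library(TraMineR)
library(cluster)

data(mvad)                                # example data shipped with TraMineR (assumption: not my data)
mvad.seq <- seqdef(mvad, 17:86)           # monthly states are in columns 17 to 86
ccost <- seqsubm(mvad.seq, method = "CONSTANT", cval = 2)
mvad.OM <- seqdist(mvad.seq, method = "OM", sm = ccost)

set.seed(1)
sizes <- c(50, 200, 700)                  # hypothetical subsample sizes
ac.by.size <- sapply(sizes, function(n) {
  idx <- sample(nrow(mvad.OM), n)         # random subsample of n sequences
  # cluster only the subsample, reusing the precomputed distances
  agnes(as.dist(mvad.OM[idx, idx]), diss = TRUE, method = "ward")$ac
})
names(ac.by.size) <- sizes
print(ac.by.size)
```

If ac differs noticeably across subsample sizes drawn from the same data, that difference reflects sample size rather than any change in the underlying structure, which is exactly why I cannot use it to compare datasets.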
Is there any other way of comparing the overall "degree of clustering", or overall "degree of alignment" in a sequence dataset that allows us to reliably compare datasets of different sizes?