How do we perform cross-clustering analysis in R when most of the data is categorical? My variables are Marital Status (married, single, divorced), Education (tertiary, secondary, primary, etc.), and Credit, Mortgage, and Loan (yes, no). The only interval variable in my data set is Age.
Questions that are only about how to use software are generally off topic here. If you have a machine learning question about clustering categorical data, please edit to clarify. – gung - Reinstate Monica Sep 27 '17 at 14:33
I think this person is very early in learning. I don't think they are asking only about R. I think they are asking "how do I think about this?". It is exceptionally broad, but I think they are asking for "starting points" to enter the subject. I don't think they know the questions they could ask. – EngrStudent Sep 27 '17 at 14:38
1 Answer
Possible duplicate here, here and here. Distance-based clustering algorithms can handle categorical data, because they only need the dissimilarities between observations rather than the raw variables. So you can run the clustering on a dissimilarity matrix.
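To see why categorical variables are not a problem for a distance-based method, here is a minimal sketch of Gower's dissimilarity computed by hand for two rows and checked against `daisy()`. It assumes equal weights, no missing values, unordered factors and one numeric column; the toy data frame `toy` and the helper `gower_pair()` are hypothetical. Rows 1 and 2 differ on `Marital` and `Credit`, match on `Loan`, and their Age gap is 15 out of a range of 30, so the dissimilarity should be (1 + 1 + 0 + 0.5) / 4 = 0.625.

```r
library(cluster)

# Hypothetical toy data: three categorical variables and one numeric one
toy <- data.frame(
  Marital = factor(c("married", "single", "divorced", "married")),
  Credit  = factor(c("yes", "no", "no", "yes")),
  Loan    = factor(c("no", "no", "yes", "yes")),
  Age     = c(30, 45, 60, 50)
)

# Hypothetical helper: Gower dissimilarity between rows i and j.
# Factors contribute 0 (same level) or 1 (different level);
# numeric columns contribute |x_i - x_j| / range(x).
# The result is the plain average over all columns (equal weights, no NAs).
gower_pair <- function(df, i, j) {
  parts <- vapply(df, function(col) {
    if (is.numeric(col)) {
      abs(col[i] - col[j]) / diff(range(col))
    } else {
      as.numeric(col[i] != col[j])
    }
  }, numeric(1))
  mean(parts)
}

d_hand  <- gower_pair(toy, 1, 2)
d_daisy <- as.matrix(daisy(toy, metric = "gower"))[1, 2]
print(c(by_hand = d_hand, daisy = d_daisy))
```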
First, you have to compute all the pairwise dissimilarities (distances) between observations in the data set (with `daisy()`).
Then, you can run your clustering algorithm on them (with `agnes()`, `CrossClustering()`, ...).
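A sketch of the same two steps on data shaped like the question's, assuming the categorical columns are stored as factors, with Education as an ordered factor so that Gower's coefficient treats it as ordinal rather than nominal. The data frame `bank`, its randomly generated values, average linkage and the choice of k = 3 are illustrative assumptions, not part of the original answer. `cutree(as.hclust(...))` turns the `agnes` tree into group labels, and `silhouette()` gives a rough check of how well separated those groups are.

```r
library(cluster)

# Hypothetical data mimicking the question's variables (values are random)
set.seed(1)
n <- 40
yes_no <- function() factor(sample(c("yes", "no"), n, replace = TRUE))
bank <- data.frame(
  Marital   = factor(sample(c("married", "single", "divorced"), n, replace = TRUE)),
  Education = factor(sample(c("primary", "secondary", "tertiary"), n, replace = TRUE),
                     levels = c("primary", "secondary", "tertiary"), ordered = TRUE),
  Credit    = yes_no(),
  Mortgage  = yes_no(),
  Loan      = yes_no(),
  Age       = sample(20:70, n, replace = TRUE)
)

# Step 1: pairwise Gower dissimilarities (nominal, ordinal and numeric columns)
d <- daisy(bank, metric = "gower")

# Step 2: hierarchical agglomerative clustering on the dissimilarities
hac <- agnes(d, diss = TRUE, method = "average")

# Cut the tree into k groups (k = 3 is an arbitrary choice for illustration)
groups <- cutree(as.hclust(hac), k = 3)
print(table(groups))

# Average silhouette width: closer to 1 means better-separated groups
sil <- silhouette(groups, d)
avg_sil <- summary(sil)$avg.width
print(avg_sil)
```

Trying several values of k and comparing the average silhouette width is one common way to pick the number of groups.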
Here is an example using the `flower` data set from the cluster package.
```r
library(cluster)
data(flower)
str(flower)
flower <- flower[, 1:6] # keep only the categorical variables
# Dissimilarity matrix (Gower's coefficient handles factors)
distm <- daisy(flower, metric = "gower", stand = FALSE)
distm
# Hierarchical agglomerative clustering (HAC)
hac <- agnes(distm, diss = TRUE)
hac$order
plot(hac)
# Partial clustering algorithm with automatic estimation of the number of
# clusters and identification of outliers
library(CrossClustering)
cross.clust <- CrossClustering(distm, k.w.min = 2, k.w.max = 5,
                               k.c.max = 6, out = TRUE)
cross.clust$Cluster.list
```
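With recent installs, `CrossClustering()` may not be found. Our understanding, which is an assumption you should check against `?cc_crossclustering` in your installed version, is that later releases of the CrossClustering package renamed the function to `cc_crossclustering()` with snake_case arguments (`k_w_min`, `k_w_max`, and `k2_max` in place of `k.c.max`) and a result element named `Cluster_list`. This sketch calls whichever entry point the installed package actually exports.

```r
library(cluster)
library(CrossClustering)

data(flower)
distm <- daisy(flower[, 1:6], metric = "gower")

# Choose the entry point exported by the installed CrossClustering release
exported <- getNamespaceExports("CrossClustering")
if ("cc_crossclustering" %in% exported) {
  # Assumed newer API: snake_case arguments, k2_max replaces k.c.max
  cross.clust <- cc_crossclustering(distm, k_w_min = 2, k_w_max = 5,
                                    k2_max = 6, out = TRUE)
  print(cross.clust$Cluster_list)
} else {
  # Older API, as used in the answer
  cross.clust <- CrossClustering(distm, k.w.min = 2, k.w.max = 5,
                                 k.c.max = 6, out = TRUE)
  print(cross.clust$Cluster.list)
}
```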

nghauran
Thank you ANG. That was really helpful. And sorry if my question looks similar to others. – Lav Sep 28 '17 at 01:21
`Error in CrossClustering(distm, k.w.min = 2, k.w.max = 5, k.c.max = 6, : could not find function "CrossClustering"` I tried to correct last two lines. Is this correct code? `cross.clust – vasili111 Oct 18 '19 at 23:46