Clustering with booleans and continuous data; Gower's coefficient + PAM?

Question

I have a medical dataset with both boolean variables and continuous variables (e.g. age, BMI). I know that K-means won't work here because of the mixed data types. I read that I can use Gower's coefficient to turn the data into a distance matrix, and then feed this matrix to a clustering algorithm that accepts a precomputed distance matrix, such as PAM (partitioning around medoids). I have a few questions:

Should I use Gower's coefficient, or is there a better alternative? My data consists of 2 continuous features (age, BMI), one categorical feature for gender (M/F) and several boolean (binary) features.
I read that K-prototypes is also suitable for clustering mixed data types. Would this algorithm be preferable? And does that mean that I don't have to use Gower's coefficient, and can simply feed the data as is to K-prototypes?
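For reference, the Gower dissimilarity mentioned above is simple enough to compute directly. Below is a minimal pure-Python sketch, assuming equal weights for all variables: each continuous variable contributes |x - y| divided by that variable's range over the data set, each categorical or boolean variable contributes 0 on a match and 1 on a mismatch, and the distance is the average of these terms. The patient rows (age, BMI, sex and two hypothetical boolean flags) are made-up illustration data. Gower's original paper treats presence/absence ("dichotomous") variables asymmetrically, ignoring joint absences; treating every boolean as a symmetric two-level category, as done here, is an assumption worth revisiting for medical flags.

```python
from typing import List, Sequence


def gower_matrix(rows: Sequence[Sequence], numeric: Sequence[bool]) -> List[List[float]]:
    """Pairwise Gower distances; numeric[j] says whether column j is continuous."""
    n, p = len(rows), len(numeric)
    # Range of each continuous column over the whole data set (the scaling term).
    ranges = []
    for j in range(p):
        if numeric[j]:
            col = [r[j] for r in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)
    D = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                if numeric[j]:
                    # Range-normalised absolute difference, always in [0, 1].
                    r = ranges[j]
                    total += abs(rows[a][j] - rows[b][j]) / r if r > 0 else 0.0
                else:
                    # Simple matching for categorical/boolean columns (symmetric).
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
            # Equal weights: the Gower distance is the mean of the per-variable terms.
            D[a][b] = D[b][a] = total / p
    return D


# Hypothetical patients: (age, BMI, sex, diabetic, smoker)
patients = [
    (34, 22.5, "F", False, False),
    (36, 23.0, "F", False, True),
    (71, 31.2, "M", True, True),
    (68, 29.8, "M", True, False),
]
numeric = [True, True, False, False, False]
D = gower_matrix(patients, numeric)
```

In R, `cluster::daisy(x, metric = "gower")` builds the same kind of matrix and lets you declare asymmetric binary columns through its `type` argument.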

Thanks in advance for any information.
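On the K-prototypes question: in Huang's formulation, K-prototypes does not take a precomputed Gower matrix. It works on the raw records and measures the dissimilarity between a record and a cluster prototype as the squared Euclidean distance over the numeric variables plus a weight gamma times the number of categorical mismatches; prototypes are updated with means for numeric variables and modes for categorical ones. The sketch below shows only those building blocks (not the full assign/update loop), with gamma = 0.5 as an arbitrary, hypothetical choice and made-up patient rows. Because the numeric part uses raw squared differences, the numeric columns are z-scored first; without that step a variable's units would change its influence, so "feeding the data as is" is not quite enough.

```python
from collections import Counter
from statistics import mean, pstdev


def standardize(rows, numeric):
    """z-score the numeric columns; categorical columns are passed through."""
    cols = list(zip(*rows))
    stats = {}
    for j, is_num in enumerate(numeric):
        if is_num:
            sd = pstdev(cols[j])
            stats[j] = (mean(cols[j]), sd if sd > 0 else 1.0)
    return [tuple((v - stats[j][0]) / stats[j][1] if j in stats else v
                  for j, v in enumerate(r)) for r in rows]


def prototype(rows, numeric):
    """Cluster prototype: mean for numeric columns, mode for categorical ones."""
    proto = []
    for j, is_num in enumerate(numeric):
        col = [r[j] for r in rows]
        proto.append(mean(col) if is_num else Counter(col).most_common(1)[0][0])
    return tuple(proto)


def kproto_dissim(x, y, numeric, gamma):
    """Squared Euclidean on numeric columns + gamma * categorical mismatches."""
    num = sum((x[j] - y[j]) ** 2 for j in range(len(x)) if numeric[j])
    cat = sum(x[j] != y[j] for j in range(len(x)) if not numeric[j])
    return num + gamma * cat


# Hypothetical patients: (age, BMI, sex, diabetic, smoker)
patients = [
    (34, 22.5, "F", False, False),
    (36, 23.0, "F", False, True),
    (71, 31.2, "M", True, True),
    (68, 29.8, "M", True, False),
]
numeric = [True, True, False, False, False]
z = standardize(patients, numeric)
proto = prototype(z[:2], numeric)  # prototype of a hypothetical cluster {0, 1}
d_near = kproto_dissim(z[0], proto, numeric, gamma=0.5)
d_far = kproto_dissim(z[2], proto, numeric, gamma=0.5)
```

The choice of gamma plays the same role as variable weights in Gower: it decides how much the categorical block counts relative to the numeric block.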

I can't advise on K-prototypes. As for hierarchical clustering or PAM, yes, Gower's coefficient is a good way to go: you have a mixture of scale, nominal and binary features. I would remark, however, that clustering mixed data is in general not an excellent idea. When all variables are of the same type, you have many more options and much more flexibility in (1) choosing a distance measure, (2) weighting the variables sensibly (if necessary) and (3) selecting the most appropriate standardization (if necessary). — ttnphns, Feb 23 '21 at 19:44
Some local info on Gower: https://stats.stackexchange.com/a/15313/3277 — ttnphns, Feb 23 '21 at 19:45
Hi @ttnphns, thanks for your reply. I guess PAM + Gower is the way to go, then. I know that clustering on a mixed data set is typically not a good idea. However, I'm trying to replicate the results of another study, in which they found several sub-phenotypes (clusters) within their data set. — Sandertjuhh, Feb 23 '21 at 19:49
By the way @ttnphns, does using PAM + Gower mean that I don't have to do any normalizing/scaling of my variables? — Sandertjuhh, Feb 23 '21 at 19:53
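PAM only needs the distance matrix, which is why it pairs naturally with Gower. Below is a simplified pure-Python sketch, assuming a small precomputed matrix `D` with made-up values: a greedy BUILD step picks k initial medoids, then a SWAP step exchanges a medoid with a non-medoid whenever that lowers the total distance of all points to their nearest medoid. Unlike the original PAM, which applies the best swap found in each pass, this version accepts the first improving swap, and it recomputes the full cost for every candidate, so it is for illustration only.

```python
from typing import List, Sequence, Tuple


def total_cost(D: Sequence[Sequence[float]], medoids: List[int]) -> float:
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(D: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[int]]:
    n = len(D)
    # BUILD: add medoids one at a time, each time taking the point that lowers the cost most.
    medoids: List[int] = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(D, medoids + [c])))
    # SWAP: replace a medoid by a non-medoid whenever that lowers the total cost.
    cost = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                trial_cost = total_cost(D, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    # Assign each point to its nearest medoid (label = index into medoids).
    labels = [min(range(k), key=lambda j: D[p][medoids[j]]) for p in range(n)]
    return medoids, labels


# Hypothetical Gower-style distances: points 0-2 are close, points 3-4 are close.
D = [
    [0.00, 0.10, 0.15, 0.90, 0.85],
    [0.10, 0.00, 0.12, 0.88, 0.92],
    [0.15, 0.12, 0.00, 0.80, 0.86],
    [0.90, 0.88, 0.80, 0.00, 0.08],
    [0.85, 0.92, 0.86, 0.08, 0.00],
]
medoids, labels = pam(D, 2)  # medoids [1, 3], labels [0, 0, 0, 1, 1]
```

For real data, R's `cluster::pam` accepts a dissimilarity object such as the output of `daisy` directly.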
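As far as the Gower formula itself goes, no separate scaling step is needed for the continuous variables: each difference is divided by that variable's range, so every term lies between 0 and 1 whatever the units. The sketch below checks this on made-up rows by expressing age in months instead of years and comparing all pairwise distances (helper names are hypothetical). Two caveats remain: range scaling is sensitive to outliers (one extreme BMI value shrinks every other BMI difference), and the equal weighting of variables is itself a choice, which is the weighting issue ttnphns raises above.

```python
def column_ranges(rows, numeric):
    """Range of each numeric column; None for categorical columns."""
    return [max(c) - min(c) if is_num else None
            for c, is_num in zip(zip(*rows), numeric)]


def gower_distance(x, y, numeric, ranges):
    """Equal-weight Gower distance between two rows."""
    parts = []
    for j, is_num in enumerate(numeric):
        if is_num:
            # Dividing by the range removes the units of the variable.
            parts.append(abs(x[j] - y[j]) / ranges[j] if ranges[j] > 0 else 0.0)
        else:
            parts.append(0.0 if x[j] == y[j] else 1.0)
    return sum(parts) / len(parts)


# Hypothetical rows: (age in years, BMI, sex)
years = [(34, 22.5, "F"), (36, 23.0, "F"), (71, 31.2, "M"), (68, 29.8, "M")]
# Same patients with age expressed in months.
months = [(age * 12, bmi, sex) for age, bmi, sex in years]
numeric = [True, True, False]

pairs = [(i, j) for i in range(len(years)) for j in range(i + 1, len(years))]
r_years = column_ranges(years, numeric)
r_months = column_ranges(months, numeric)
d_years = [gower_distance(years[i], years[j], numeric, r_years) for i, j in pairs]
d_months = [gower_distance(months[i], months[j], numeric, r_months) for i, j in pairs]
```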
