I have a medical dataset with both boolean and continuous variables (e.g. age, BMI). I know that K-means is not appropriate here because of the mixed data types. I read that I can use Gower's coefficient to turn the data into a distance matrix and feed that matrix to a clustering algorithm that accepts a precomputed distance matrix, such as PAM (partitioning around medoids). I have a few questions:

  1. Should I use Gower's coefficient, or is there a better alternative? My data consists of two continuous features (age, BMI), one categorical feature for gender (M/F), and several boolean features.
  2. I read that K-prototypes is also suitable for clustering mixed data types. Would it be preferred here? And does that mean I can skip Gower's coefficient and simply feed the data as is to K-prototypes?

Thanks for any information in advance.

ttnphns
Sandertjuhh
  • I can't advise on K-prototypes. As for hierarchical clustering or PAM, yes, the Gower coefficient is a way to go. You have a mixture of scale, nominal and binary features. I would remark, however, that clustering mixed data is not an excellent idea in general. When all variables are of the same type, you have many more options and much more flexibility in (1) choosing a distance measure, (2) reasonably weighting the variables (if necessary), and (3) selecting the most appropriate standardization (if necessary). – ttnphns Feb 23 '21 at 19:44
  • Some local info on Gower: https://stats.stackexchange.com/a/15313/3277 – ttnphns Feb 23 '21 at 19:45
  • Hi @ttnphns, thanks for your reply. I guess PAM + Gower is the way to go then. I know that clustering a mixed dataset is typically not a good idea. However, I'm trying to replicate the results of another study, which found several sub-phenotypes (clusters) within its dataset. – Sandertjuhh Feb 23 '21 at 19:49
  • By the way, @ttnphns: does using PAM + Gower mean that I don't have to normalize or scale my variables? – Sandertjuhh Feb 23 '21 at 19:53
  • Gower has its particular way to normalize. Please read. – ttnphns Feb 23 '21 at 20:13
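
To illustrate the normalization point from the comments, here is a minimal sketch of a Gower-style dissimilarity in plain Python. It is not the asker's data or any library's implementation: the three patient records, the column names, and the helper `gower_distance` are hypothetical. It also assumes boolean and categorical features are scored by simple matching (0 if equal, 1 otherwise) with equal weights; Gower's coefficient can instead treat binary presence/absence features asymmetrically, which this sketch does not do.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity between two records given as dicts.

    Numeric features contribute |a - b| / range, so each lies in [0, 1]
    without any separate scaling step. All other features contribute
    0 when the values match and 1 when they differ (simple matching).
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            rng = numeric_ranges[key]
            # A constant column (range 0) carries no information.
            total += abs(a[key] - b[key]) / rng if rng > 0 else 0.0
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    # Equal weights: average the per-feature contributions.
    return total / len(a)


# Hypothetical toy records, for illustration only.
patients = [
    {"age": 30, "bmi": 20.0, "gender": "F", "smoker": False},
    {"age": 50, "bmi": 30.0, "gender": "M", "smoker": True},
    {"age": 40, "bmi": 25.0, "gender": "F", "smoker": True},
]

# The range of each numeric column is taken over the whole dataset.
numeric_ranges = {
    col: max(p[col] for p in patients) - min(p[col] for p in patients)
    for col in ("age", "bmi")
}

# Full pairwise dissimilarity matrix.
dist = [[gower_distance(p, q, numeric_ranges) for q in patients]
        for p in patients]

for row in dist:
    print(row)
```

Because every feature contributes a value between 0 and 1, age and BMI need no extra standardization before this step. A matrix like `dist` is the kind of input that a PAM implementation accepting precomputed dissimilarities would take.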
