I have a medical dataset with both boolean and continuous variables (e.g. age, BMI). I understand that K-means is not appropriate here, because it relies on Euclidean distances and cluster means, which are not meaningful for categorical data. I read that I can use Gower's coefficient to turn the data into a pairwise distance matrix and then pass that matrix to a clustering algorithm that accepts precomputed distances, such as PAM (partitioning around medoids). I have a few questions:
- Should I use Gower's coefficient, or is there a better alternative? My data consists of two continuous features (age, BMI), one categorical feature for gender (M/F), and several boolean features.
- I read that K-prototypes is also suitable for clustering mixed data types. Would it be preferable to the Gower's coefficient and PAM approach? And does that mean I can skip Gower's coefficient and simply feed the data as is to K-prototypes?
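
To make the first question concrete, here is a minimal sketch of how I understand the Gower distance to be computed: range-scaled absolute differences for the continuous features, simple mismatch (0 or 1) for the categorical and boolean ones, averaged with equal weights. The function name `gower_distance`, the 0/1 encoding of gender and the boolean flags, and the sample values are my own assumptions for illustration; they do not come from my real data or from any particular library.

```python
import numpy as np

def gower_distance(numeric, categorical):
    """Pairwise Gower distance for a small mixed-type dataset.

    numeric:     (n, p) array of continuous features, e.g. age and BMI
    categorical: (n, q) array of category codes, e.g. gender and boolean flags
    """
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical)

    # Continuous features: absolute difference scaled by the feature's range.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant columns
    num_part = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges

    # Categorical features: 0 if the two values match, 1 otherwise.
    cat_part = (categorical[:, None, :] != categorical[None, :, :]).astype(float)

    # Equal-weight average over all features.
    n_features = numeric.shape[1] + categorical.shape[1]
    return (num_part.sum(axis=2) + cat_part.sum(axis=2)) / n_features

# Hypothetical patients: columns are (age, BMI) and (gender, flag_a, flag_b).
numeric = [[20, 20.0], [60, 30.0], [40, 25.0]]
categorical = [[0, 1, 0], [1, 1, 0], [0, 0, 1]]

D = gower_distance(numeric, categorical)
print(D)
```

The resulting `D` is a symmetric matrix with zeros on the diagonal and values between 0 and 1, which is the form I would expect to hand to PAM or any other algorithm that takes a precomputed distance matrix.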
Thanks in advance for any information.