I am trying to do clustering on my data which consists of both categorical and continuous variables. I have some questions which I would like to ask:

  1. I am going to use the Gower distance to measure the similarities/dissimilarities between data points. Is that OK?

  2. Can I use K-means clustering on mixed variables? If not, I will use two-step clustering, but can two-step clustering be performed in R? If so, which hierarchical algorithm would I have to use?

Thanks

ttnphns
  • Please search "K-means Gower" on the site. The question was asked several times already – ttnphns Mar 01 '19 at 09:39
  • @ttnphns - I researched this before asking here, but there is no definite answer to my question: some answers say you can use K-means and some say you cannot, and I don't know what I should actually do – Annalise Azzopardi Mar 01 '19 at 10:39
  • Please see my latest addition here: https://stats.stackexchange.com/a/15313/3277 – ttnphns Mar 01 '19 at 12:02
  • Two-step clustering (found in SPSS) accepts interval or nominal variables, but not ordinal or binary ones: https://stats.stackexchange.com/a/116859/3277. For nominal variables, it uses the log-likelihood distance. – ttnphns Mar 01 '19 at 12:08
  • @ttnphns - I know about two-step clustering, but I am using R and there seems to be no package that implements it – Annalise Azzopardi Mar 01 '19 at 12:49

1 Answer

K-means can only be used on data for which you can compute an arithmetic mean, which rules out categorical variables.

Use hierarchical clustering instead. It works from a distance matrix, so it can use Gower distances.
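
As a minimal sketch of that suggestion in R: `daisy()` from the `cluster` package can compute Gower dissimilarities for a data frame that mixes numeric and factor columns, and `hclust()` accepts the result directly. The toy data frame, the average linkage, and the cut at two clusters are illustrative assumptions here, not anything taken from the question.

```r
library(cluster)  # provides daisy() for Gower dissimilarities on mixed data

# Hypothetical toy data: two continuous variables and one categorical variable
df <- data.frame(
  age    = c(23, 25, 31, 45, 47, 52),
  income = c(20, 22, 30, 60, 65, 70),
  group  = factor(c("a", "a", "a", "b", "b", "b"))
)

# Pairwise Gower dissimilarities (numeric columns are range-scaled,
# factor columns contribute 0 for a match and 1 for a mismatch)
d <- daisy(df, metric = "gower")

# Agglomerative hierarchical clustering on the dissimilarity matrix;
# average linkage is one choice that does not assume Euclidean distances
hc <- hclust(as.dist(d), method = "average")

# Cut the tree into an assumed number of clusters (k = 2 is arbitrary here)
clusters <- cutree(hc, k = 2)
print(clusters)
```

Linkages such as Ward or centroid are defined in terms of Euclidean distances, so average, complete, or single linkage are the safer choices with Gower dissimilarities. `plot(hc)` draws the dendrogram, which can help when deciding where to cut.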

Has QUIT--Anony-Mousse