I have only categorical variables in my database.

Which distance or similarity measure should I use?

I'm using the function simil() from the proxy package (library(proxy)) in R.

1 Answer

You could try converting your categorical variables into sets of dummy variables and then use the Jaccard index as the similarity measure (or one minus the Jaccard index if you need a distance).
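As a rough illustration of the dummy-variable approach, here is a minimal sketch in base R. The data frame `df` and the helper `to_dummies()` are hypothetical names made up for this example. It computes the Jaccard similarity by hand and, for comparison, the Dice similarity; on one-hot dummies where every row has one level per variable, Dice works out to the proportion of attributes on which two rows agree. The final call to `proxy::simil()` assumes the proxy package is installed and that its method registry includes "Jaccard" and "Dice"; it should reproduce the hand-computed values, but check against your installed version.

```r
# Toy data: hypothetical, purely for illustration.
df <- data.frame(
  colour = c("red", "red", "blue"),
  shape  = c("round", "square", "round"),
  size   = c("small", "small", "large"),
  stringsAsFactors = TRUE
)

# One-hot encode every variable: one logical column per level.
# (to_dummies is a helper written for this sketch, not a library function.)
to_dummies <- function(data) {
  blocks <- lapply(names(data), function(v) {
    f <- factor(data[[v]])
    m <- outer(as.character(f), levels(f), "==")
    colnames(m) <- paste(v, levels(f), sep = "=")
    m
  })
  do.call(cbind, blocks)
}

X <- to_dummies(df)

# a[i, j] = number of dummy columns that are 1 in both row i and row j.
a    <- tcrossprod(X * 1)
ones <- rowSums(X)                  # number of 1s in each row
tot  <- outer(ones, ones, "+")      # ones[i] + ones[j]

jaccard <- a / (tot - a)            # shared 1s / 1s in either row
dice    <- 2 * a / tot              # 2 * shared 1s / total 1s

print(round(jaccard, 3))
print(round(dice, 3))

# Optional cross-check with proxy, if it is available.
if (requireNamespace("proxy", quietly = TRUE)) {
  print(proxy::simil(X, method = "Jaccard"))
  print(proxy::simil(X, method = "Dice"))
}
```

If a distance is needed rather than a similarity, `1 - jaccard` or `1 - dice` can be passed to `as.dist()`.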

There is a more detailed explanation here: What is the optimal distance function for individuals when attributes are nominal?

Paul McGettigan
    Thank you for linking to my answer. However, I should point out that it recommended Dice rather than Jaccard for categorical (and hence dummy) attributes. – ttnphns May 14 '15 at 14:56