
I have 17 numeric and 5 binary (0-1) variables, with 73 samples in my dataset. I need to run a cluster analysis. I know that the Gower distance is a good metric for datasets with mixed variables. However, I couldn't understand how the Gower distance calculates the difference between binary variables. It seems to me that it is not different from Euclidean distance.

ttnphns
Emrah Bilgiç
  • Your question isn't quite clear. Are you simply asking 'how does the Gower distance calculate the difference between binary variables'? What does "there is no difference than Euclidean" mean? – gung - Reinstate Monica Oct 21 '14 at 15:42
  • Thank you. Sorry, I am asking how Gower calculates the difference between binary variables. I mean, I couldn't understand the difference between Euclidean and Gower for a binary variable. – Emrah Bilgiç Oct 21 '14 at 15:58
  • Have you searched this site for `Gower`? http://stats.stackexchange.com/a/15313/3277 – ttnphns Oct 21 '14 at 16:48
  • Yes, I did. Euclidean distance is 0 if both samples have the same value and 1 if not. What about Gower? – Emrah Bilgiç Oct 21 '14 at 16:57
  • @EmrahBilgiç, the Gower metric is a similarity, _not_ a distance. It becomes a "distance" when it is subtracted from 1. Read under the link above how it processes binary data. – ttnphns Oct 21 '14 at 17:31
  • I read the details of the `daisy` function: "The contribution d(ij,k) of a nominal or binary variable to the total dissimilarity is 0 if both values are equal, 1 otherwise." This is no different from what Euclidean does for a binary variable. – Emrah Bilgiç Oct 21 '14 at 20:13

2 Answers


How about binary attributes that have the values "m" and "f", for "male" and "female"?

You do realize that for a dichotomous variable all you can get out is "same" or "different"? The key difference between distances is not whether the value is 1 or 0, but how multiple variables are combined.
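A small sketch to make this concrete for 0/1 data only. Per variable, both measures give 0 or 1; the difference is in how the variables are combined. Euclidean takes the square root of the number of mismatches, while Gower (assuming every variable is treated as symmetric binary, i.e. simple matching, with equal weights) takes the fraction of mismatches. The function names and sample vectors are hypothetical.

```python
import math

def euclidean(x, y):
    # Square root of the sum of squared per-variable differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def gower_binary(x, y):
    # Each variable contributes 0 if the values match and 1 otherwise
    # (simple matching); Gower averages these contributions.
    return sum(a != b for a, b in zip(x, y)) / len(x)

x = [1, 0, 1, 1, 0]
y = [0, 0, 1, 0, 1]  # differs from x in 3 of the 5 variables

print(euclidean(x, y))     # sqrt(3), about 1.732
print(gower_binary(x, y))  # 3 / 5 = 0.6
```

Because both sqrt(k) and k/p grow with the number of mismatches k, the two measures rank pairs identically on purely binary data. They diverge once numeric variables are added: Gower rescales each numeric difference to [0, 1] by the variable's range before averaging, whereas Euclidean lets variables with large scales dominate.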

Has QUIT--Anony-Mousse

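Gower distance uses a range-normalized Manhattan distance (the absolute difference divided by the variable's range) for continuous variables and a matching rule for categorical ones: 0 if the values are equal and 1 otherwise (simple matching), or, for asymmetric binary variables, the same rule with both-zero pairs left out (a Jaccard-type coefficient). The per-variable contributions are then averaged.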
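A minimal sketch of this rule for one pair of rows, assuming the numeric ranges are computed over the whole dataset, every non-numeric column is compared by simple matching (asymmetric binary handling is not included), and there are no missing values or variable weights. The function name `gower_distance` and the toy data are hypothetical; for real work a library implementation such as R's `cluster::daisy` is the safer choice.

```python
def gower_distance(row_i, row_j, numeric_ranges):
    """Gower distance between two rows with no missing values.

    numeric_ranges maps the index of each numeric column to its range
    (max - min over the whole dataset); every other column is treated as
    categorical/binary and compared by simple matching.
    """
    total = 0.0
    for k, (a, b) in enumerate(zip(row_i, row_j)):
        if k in numeric_ranges:
            r = numeric_ranges[k]
            # Range-normalized absolute (Manhattan) difference, in [0, 1].
            total += abs(a - b) / r if r > 0 else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if a == b else 1.0
    # Unweighted average of the per-variable contributions.
    return total / len(row_i)

data = [
    [170.0, 65.0, "m", 1],
    [160.0, 80.0, "f", 1],
    [180.0, 50.0, "f", 0],
]
# Columns 0 and 1 are numeric; columns 2 and 3 are categorical/binary.
ranges = {k: max(r[k] for r in data) - min(r[k] for r in data) for k in (0, 1)}

print(gower_distance(data[0], data[1], ranges))  # 0.5
```

Here the binary column contributes exactly as in the `daisy` rule quoted in the comments (0 if equal, 1 otherwise); what differs from Euclidean is the range scaling of the numeric columns and the averaging. The corresponding Gower similarity is 1 minus this value.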

Sanjeet