
I am trying to understand Gower's (dis)similarity measure, and I am having trouble understanding how numeric variables are scaled.

Are numeric values simply scaled to the interval [0, 1] with the following formula, or is the scaling performed differently?

$$x'={\frac {x-{\text{min}}(x)}{{\text{max}}(x)-{\text{min}}(x)}}$$

Here is some information that I found in the documentation of the R package StatMatch, but I am not sure whether I understand it correctly: http://finzi.psych.upenn.edu/library/StatMatch/html/gower.dist.html

the range of a numeric variable is estimated by jointly considering the values for the variable in data.x and those in data.y. Therefore, assuming rngs=NULL, if a variable "X1" is considered:

rngs["X1"] <- max(data.x[,"X1"], data.y[,"X1"]) - min(data.x[,"X1"], data.y[,"X1"])

In the documentation of the R package cluster I also couldn't find a clear answer: https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html

  • The original version of Gower similarity scales numeric (scale) variables by their range (http://stats.stackexchange.com/a/15313/3277). The Manhattan distance is computed first, then divided by the range, then converted into a similarity by subtracting it from one. However, various other variants exist. – ttnphns Oct 06 '16 at 14:29
  • Thanks for your answer, I got it now. Also thanks for all the responses in other threads about the topic, they helped a lot! – Joachim Schork Oct 06 '16 at 14:59
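
To make the point from the comment concrete, here is a minimal sketch in Python (not the code of StatMatch or cluster) for a single numeric variable. It divides the absolute difference between two values by the variable's range, and checks that this gives the same number as first applying the min-max formula from the question and then taking the absolute difference. The function names, the example values, and the handling of a zero range are assumptions made for illustration only.

```python
def gower_numeric_dissimilarity(xi, xj, rng):
    """Absolute difference between two values, divided by the variable's range."""
    if rng == 0:
        # Assumption for this sketch: a constant variable contributes 0.
        return 0.0
    return abs(xi - xj) / rng


def min_max_scale(values):
    """Min-max scaling x' = (x - min) / (max - min), as in the question."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Same assumption as above: map a constant variable to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values of one numeric variable.
x = [2.0, 5.0, 10.0]
rng = max(x) - min(x)

# Route 1: divide the absolute difference by the range.
d = gower_numeric_dissimilarity(x[0], x[1], rng)

# Route 2: min-max scale first, then take the absolute difference.
scaled = min_max_scale(x)
d_scaled = abs(scaled[0] - scaled[1])

# Similarity for this variable is one minus the dissimilarity.
s = 1.0 - d

print(d, d_scaled, s)
```

Both routes agree because subtracting the minimum cancels in the difference, so only the division by the range matters. With several numeric variables, averaging these per-variable terms would give the range-scaled Manhattan distance described in the comment; other variants may weight or scale differently.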
