
I have a matrix where each row corresponds to an observation with binary attributes, and I am interested in performing multidimensional scaling on these data using cmdscale. I am looking into binary distance measures, but I am having trouble defining correctly the distance matrix that cmdscale needs as input:

  1. From what I understand, a similarity measure needs to satisfy three properties (boundary conditions, symmetry, identity/indiscernibility). If the pairwise similarity matrix is PSD then the similarity is also a metric. A dissimilarity must satisfy non-negativity, symmetry and identity/indiscernibility; if it also satisfies the triangle inequality, it is a distance measure (and a metric).

    • How (and under what conditions) can I transform a similarity measure into a distance measure? Would it be correct to use $d = 1 - s$ if the similarity measure is also a metric?
  2. I am interested in analyzing the symmetry/asymmetry properties of several binary similarity measures, i.e. seeing how the MDS output behaves with measures that take both positive and negative matches into account versus measures that use only positive matches.

    • Are the symmetry/asymmetry properties of a similarity measure preserved if I convert it to a dissimilarity (or distance)?
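A minimal sketch in Python/NumPy can make both questions concrete (cmdscale itself is an R function and is not called here; the helper names `binary_similarities`, `satisfies_triangle` and `is_euclidean` are hypothetical). It computes the simple matching coefficient (SMC), which counts both (1,1) and (0,0) matches, and the Jaccard coefficient, which counts only (1,1) matches, converts each with $d = 1 - s$, and then checks the triangle inequality and whether the result embeds exactly in Euclidean space, which is what classical MDS implicitly assumes. Giving two all-zero rows a Jaccard similarity of 1 is an assumed convention, not something the question specifies.

```python
import numpy as np

def binary_similarities(X):
    """SMC and Jaccard similarity matrices for a 0/1 matrix X (rows = observations)."""
    X = np.asarray(X, dtype=int)
    p = X.shape[1]
    a = X @ X.T               # (1,1) matches: positive
    d = (1 - X) @ (1 - X).T   # (0,0) matches: negative
    smc = (a + d) / p         # SMC counts both kinds of match
    union = p - d             # attributes equal to 1 in at least one of the two rows
    with np.errstate(divide="ignore", invalid="ignore"):
        # Jaccard ignores (0,0) matches; assumed convention: two all-zero rows are identical
        jac = np.where(union > 0, a / union, 1.0)
    return smc, jac

def satisfies_triangle(D, tol=1e-12):
    """True if D[i, j] <= D[i, k] + D[k, j] for all i, j, k."""
    for k in range(D.shape[0]):
        if np.any(D > D[:, [k]] + D[[k], :] + tol):
            return False
    return True

def is_euclidean(D, tol=1e-10):
    """True if -1/2 * J D^2 J (the matrix classical MDS eigendecomposes) is PSD."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    return bool(np.linalg.eigvalsh(B).min() >= -tol)

X = np.array([[1, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 0, 0]])
smc, jac = binary_similarities(X)
print(smc[1, 3], jac[1, 3])            # 0.8 0.0: rows 1 and 3 share only absences
print(satisfies_triangle(1 - smc), satisfies_triangle(1 - jac))   # True True

# Four corners of a square in {0,1}^2: 1 - SMC is a metric here but not Euclidean
square = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
s_sq, _ = binary_similarities(square)
print(is_euclidean(1 - s_sq), is_euclidean(np.sqrt(1 - s_sq)))    # False True
```

Here $1 - \text{SMC}$ is the Hamming distance divided by the number of attributes $p$, and $1 - \text{Jaccard}$ is the Jaccard distance; both satisfy the triangle inequality, but the square example shows that a metric need not be Euclidean, so cmdscale can return negative eigenvalues for it. Taking $\sqrt{1 - \text{SMC}}$ instead gives the ordinary Euclidean distance divided by $\sqrt{p}$. Because $d = 1 - s$ is computed from the same match counts as $s$, a measure that ignores (0,0) matches still ignores them after conversion, as rows 1 and 3 illustrate.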
drgxfs
  • The simplest thing you can do is use a proper distance to begin with, e.g. simply the Euclidean distance in the attribute space. In this case, metric MDS turns out to be the same as simply performing a PCA on the original data. The reason is that if your data are already living in a space, you don't have to use MDS to embed them into a space. The question is of course whether Euclidean distance makes sense with respect to your binary attributes. – A. Donda Jul 22 '15 at 13:57
  • There exist many ways to convert a similarity into a distance. Actually, you may use _any_ way or invent your own if it will really make mathematical and substantial sense for you. – ttnphns Jul 22 '15 at 14:35
  • `If the pairwise similarity matrix is PSD then the similarity is also a metric` As far as I know, the word "metric" is reserved for distances (dissimilarities). If a specific similarity measure always yields a PSD matrix, the similarity could then be called "Euclidean", because it spans a Euclidean space and can be [converted by](http://stats.stackexchange.com/a/36158/3277) the cosine law into the corresponding Euclidean distance. As for metric distances, not every metric distance is a Euclidean distance. – ttnphns Jul 22 '15 at 14:39
