6

I am having trouble reconciling the terminology used in several MDS references. According to [1], Section 14.8, classical MDS takes similarities as input. In [2], also cited in Wikipedia, classical MDS takes dissimilarities as input.

What is the agreed upon terminology?

[1] Hastie, T., R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. New York: Springer, 2003.

[2] Borg, I., and P. J. F. Groenen. Modern Multidimensional Scaling: Theory and Applications. 2nd edition. New York: Springer, 2005.

JohnRos
  • I suppose you may be mistaken. The term "classic MDS" or "simplest MDS" is usually understood as unweighted Euclidean MDS, that is, not the INDSCAL model. Classic MDS works with one matrix, which can contain either similarities or dissimilarities. Usually a program first converts the former into the latter and then proceeds, because the algorithm itself is typically written for dissimilarities. – ttnphns Apr 24 '15 at 14:30
  • Plugging similarities or dissimilarities into the squared stress function leads to different solutions (see the discussion in [1] between Eq. (14.100) and Eq. (14.101)). It is thus important to define which goes into the stress function. – JohnRos Apr 24 '15 at 14:33

1 Answer

5

These two books are in full agreement.

Classical multidimensional scaling (where by "classical MDS" I understand Torgerson's MDS, following both Hastie et al. and Borg & Groenen) finds points $z_i$ such that their scalar products $\langle z_i, z_j \rangle$ approximate a given similarity matrix as well as possible. However, any dissimilarity matrix can be converted into a similarity matrix: dissimilarities are assumed to be Euclidean distances, from which centered scalar products can be computed and taken as similarities.
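To make the conversion concrete, the standard double-centering step (the notation below is mine, not either book's) turns squared dissimilarities $d_{ij}^2$ into centered scalar products $b_{ij}$ that serve as similarities: $$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{n}\sum_k d_{ik}^2 - \frac{1}{n}\sum_k d_{kj}^2 + \frac{1}{n^2}\sum_{k,l} d_{kl}^2\right),$$ or in matrix form $B = -\frac{1}{2} J D^{(2)} J$, where $D^{(2)}$ holds the squared dissimilarities and $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the centering matrix.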

So the algorithm of classical/Torgerson MDS is as follows: $$\text{Euclidean distances}\to\text{Centered scalar products}\to\text{Optimal mapping},$$ i.e. $$\text{Dissimilarities}\to\text{Similarities}\to\text{Optimal mapping}.$$ What you consider an "input" here does not really matter.
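Here is a minimal sketch of that pipeline in Python, assuming the input is a symmetric matrix of pairwise Euclidean distances; the function name and details are illustrative, not taken from either book:

```python
import numpy as np

def classical_mds(D, k=2):
    """Torgerson/classical MDS sketch: distances -> centered scalar products -> coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double centering: similarities (scalar products)
    vals, vecs = np.linalg.eigh(B)                # eigendecomposition of the similarity matrix
    idx = np.argsort(vals)[::-1][:k]              # keep the k largest eigenvalues
    scale = np.sqrt(np.clip(vals[idx], 0, None))  # clip tiny negatives due to round-off
    return vecs[:, idx] * scale                   # rows are the configuration points z_i
```

The dissimilarities enter the procedure only through the double-centered matrix $B$, which is exactly the similarity matrix that the scalar products $\langle z_i, z_j \rangle$ approximate.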

This is exactly what is written in Hastie et al.:

In classical scaling, we instead [as opposed to metric scaling in general] start with similarities [...]. This is attractive because there is an explicit solution in terms of eigenvectors [...]. If we have distances rather than inner-products, we can convert them to centered inner-products if the distances are Euclidean [...]. If the similarities are in fact centered inner-products, classical scaling is exactly equivalent to principal components [...]. Classical scaling is not equivalent to least squares scaling [that minimizes reconstruction of dissimilarities].
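As a quick, purely illustrative check of that equivalence (assuming the dissimilarities are Euclidean distances of centered data; none of this code is from either book):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X = X - X.mean(axis=0)                         # center the data

# Classical (Torgerson) scaling of the Euclidean distance matrix
D = squareform(pdist(X))
n = X.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J                    # centered scalar products (similarities)
vals, vecs = np.linalg.eigh(B)
Z = vecs[:, -2:][:, ::-1] * np.sqrt(vals[-2:][::-1])   # top-2 MDS coordinates

# PCA scores from the SVD of the centered data
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :2] * s[:2]

print(np.allclose(np.abs(Z), np.abs(scores)))  # True: identical up to column signs
```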

See my answer to "What's the difference between principal components analysis and multidimensional scaling?" for mathematical details.

amoeba
  • Thank you. What you are saying is that Borg & Groenen's definition is wrong: the stress is not a function of dissimilarities, but rather of similarities. Put differently, Borg & Groenen's "classical MDS" is Hastie's Least Squares Scaling. Would you agree? – JohnRos Apr 24 '15 at 21:53
  • No, definitely not. Borg & Groenen seem to be in full agreement with Hastie et al., and use the term "classical scaling" (Chapter 12) to refer to Torgerson scaling, which is what Hastie et al. also do. – amoeba Apr 24 '15 at 22:04
  • @amoeba, I've just looked through some texts on MDS and found confusion about the definition of the term "Classical MDS". Some sources define it as you did, as Torgerson's MDS aka PCoA. But more sources define it as "single-matrix Euclidean-model MDS" (I myself thought this way and continue to follow that custom). If one accepts that definition, then Classical MDS (aka Identity MDS) _is_ whatever is not Replicated MDS or Weighted/Generalized MDS (the INDSCAL versions). – ttnphns Apr 24 '15 at 23:04
  • (cont.) It - classical MDS - can be metric or nonmetric, can be based on different stress measures, and can be algorithmically based on an eigendecomposition, ALSCAL, PREFSCAL, etc. Torgerson's PCoA is just one (and the simplest) kind of it. Note that [in my answer](http://stats.stackexchange.com/a/14017/3277) I implied that definition of classical MDS. – ttnphns Apr 24 '15 at 23:06
  • See also http://stats.stackexchange.com/a/31291/3277, where I also implied that definition of classical MDS. – ttnphns Apr 24 '15 at 23:12
  • @ttnphns, Thanks, it is good to know that there is this terminological ambiguity. I was not aware of that. However, just to be clear: both books that OP is citing use the term "classical MDS" in the sense "Torgerson's MDS". I will explicitly mention this in my answer now, to remove potential confusion. – amoeba Apr 24 '15 at 23:16
  • Ah, ambiguities are everywhere... It might be nice to abandon the term "classical MDS" altogether. Identity (= single-matrix unweighted) MDS. Torgerson's PCoA. Two nice terms. – ttnphns Apr 24 '15 at 23:20
  • @ttnphns: Agree. But I don't like the term "PCoA" and would rather avoid it; I am happy to use the term "Torgerson['s] MDS" though. – amoeba Apr 24 '15 at 23:22
  • I don't like the acronym much either, because it is, mathematically, just _double centering_ then PCA. But the term is relatively well established. – ttnphns Apr 24 '15 at 23:25
  • @John, perhaps you could point me to a specific bit in Borg & Groenen that you think is in disagreement with Hastie et al.? Then I could try to clarify it further. At the moment I am not sure where the confusion is. – amoeba Apr 25 '15 at 15:34
  • Eq. (12.4) in Borg & Groenen does indeed agree with Eq. (14.100) in Hastie. I guess my problem is just with the [Wikipedia](http://en.wikipedia.org/wiki/Multidimensional_scaling) statement: "... takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain", which implies that the loss is defined w.r.t. dissimilarities, while it is actually not. – JohnRos Apr 25 '15 at 16:48
  • @JohnRos, But you can easily define the loss function ("strain") via dissimilarities! The scalar products must approximate the similarity matrix, yes, but the similarity matrix is given by a transformation of the dissimilarity matrix (see my linked answer or either of these textbooks). So you can plug this transformation into the loss function and obtain a loss function explicitly defined via the dissimilarity matrix! – amoeba Apr 25 '15 at 21:53
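To spell the last comment out: writing the double centering of the squared dissimilarities as $B(D) = -\frac{1}{2} J D^{(2)} J$ (notation as above, not either book's), the strain loss of classical scaling can, up to normalization, be written directly in terms of the dissimilarity matrix: $$\text{Strain}(Z) = \Big\| -\tfrac{1}{2} J D^{(2)} J - Z Z^\top \Big\|_F^2,$$ where the rows of $Z$ are the configuration points $z_i$ and the scalar products $Z Z^\top$ approximate the similarities derived from $D$.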