Questions tagged [multidimensional-scaling]

A technique that renders observed or computed (dis)similarities among objects into distances in a low-dimensional space (usually Euclidean). It thus constructs dimensions for the data; the objects can be plotted and conceptualized in those dimensions.

The goal of multidimensional scaling (MDS) is, given pairwise dissimilarities (i.e. a matrix $D = (d_{ij})$ of distances between $n$ objects), to find coordinates $x_1, \dots, x_n$ in a low-dimensional space such that, for every pair $i, j$:

$$d_{ij} \approx \lVert x_i - x_j \rVert_2$$

That is, the pairwise distances are preserved as closely as possible.
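As a rough illustration of what "preserving distances" means, the sketch below measures how far a candidate configuration is from reproducing a given dissimilarity matrix. The function name `stress` and the particular normalization (squared misfit divided by the sum of squared dissimilarities) are assumptions chosen for this example; MDS software may define its stress criterion differently.

```python
import math

def stress(D, X):
    """Normalized misfit between dissimilarities D and configuration X.

    Computes sqrt( sum_{i<j} (d_ij - ||x_i - x_j||_2)^2 / sum_{i<j} d_ij^2 ).
    A value of 0 means every pairwise distance is reproduced exactly.
    This normalization is an illustrative choice, not a library's definition.
    """
    n = len(D)
    misfit = 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            fitted = math.dist(X[i], X[j])  # Euclidean distance in the embedding
            misfit += (D[i][j] - fitted) ** 2
            total += D[i][j] ** 2
    return math.sqrt(misfit / total)

# Dissimilarities of a 3-4-5 right triangle.
D = [[0, 3, 4],
     [3, 0, 5],
     [4, 5, 0]]

exact_2d = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # reproduces D exactly
flat_1d = [(0.0,), (3.0,), (4.0,)]               # forces the points onto a line

print(stress(D, exact_2d))  # 0.0
print(stress(D, flat_1d))   # positive: the 1D layout cannot keep d_23 = 5
```

An MDS algorithm searches for the coordinates that make a criterion of this kind as small as possible for a chosen number of dimensions.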

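A dissimilarity is defined for every pair of objects. A distance is a dissimilarity that is also a metric in the mathematical sense: it is non-negative, zero between an object and itself, symmetric ($d_{ij} = d_{ji}$), and satisfies the triangle inequality $d_{ik} \le d_{ij} + d_{jk}$.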
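The properties that make a dissimilarity a metric (non-negativity, zero self-distance, symmetry, and the triangle inequality) can be checked directly on a matrix. The helper below is a minimal sketch; the name `is_metric` and the tolerance `tol` are assumptions for this example. It does not test whether distinct objects have strictly positive distance, since duplicate objects are common in real data.

```python
def is_metric(D, tol=1e-12):
    """Return True if the square matrix D satisfies the metric properties.

    Checks zero diagonal, non-negativity, symmetry and the triangle
    inequality d_ik <= d_ij + d_jk, each up to the tolerance tol.
    """
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:                # zero self-distance
            return False
        for j in range(n):
            if D[i][j] < -tol:                # non-negativity
                return False
            if abs(D[i][j] - D[j][i]) > tol:  # symmetry
                return False
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if D[i][k] > D[i][j] + D[j][k] + tol:  # triangle inequality
                    return False
    return True

triangle = [[0, 3, 4],
            [3, 0, 5],
            [4, 5, 0]]
shortcut = [[0, 1, 5],
            [1, 0, 1],
            [5, 1, 0]]  # 5 > 1 + 1 violates the triangle inequality

print(is_metric(triangle))  # True
print(is_metric(shortcut))  # False
```

A matrix that fails this check is a dissimilarity but not a distance in the strict sense.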

202 questions
159
votes
5 answers

What's the difference between principal component analysis and multidimensional scaling?

How are PCA and classical MDS different? How about MDS versus nonmetric MDS? Is there a time when you would prefer one over the other? How do the interpretations differ?
Stephen Turner
  • 4,183
  • 8
  • 27
  • 33
26
votes
1 answer

t-SNE versus MDS

Been reading some questions about t-SNE (t-Distributed Stochastic Neighbor Embedding) lately, and also visited some questions about MDS (Multidimensional Scaling). They are often used analogously, so it seemed like a good idea to make this question…
21
votes
5 answers

Are there any versions of t-SNE for streaming data?

My understanding of t-SNE and the Barnes-Hut approximation is that all data points are required so that all force interactions can be calculated at the same time and each point can be adjusted in the 2d (or lower dimensional) map. Are there any…
19
votes
3 answers

What is the role of MDS in modern statistics?

I recently came across multidimensional scaling. I am trying to understand this tool better and its role in modern statistics. So here are a few guiding questions: Which questions does it answer? Which researchers are often interested in using…
Tal Galili
  • 19,935
  • 32
  • 133
  • 195
17
votes
4 answers

Performing PCA with only a distance matrix

I want to cluster a massive dataset for which I have only the pairwise distances. I implemented a k-medoids algorithm, but it's taking too long to run so I would like to start by reducing the dimension of my problem by applying PCA. However, the…
bigTree
  • 739
  • 1
  • 9
  • 21
15
votes
1 answer

RandomForest - MDS plot interpretation

I used randomForest to classify 6 animal behaviours (eg. Standing, Walking, Swimming etc.) based on 8 variables (different body postures and movement). The MDSplot in the randomForest package gives me this output and I have problems in interpreting…
Pat
  • 351
  • 1
  • 2
  • 6
11
votes
3 answers

How to project high dimensional space into a two-dimensional plane?

I have a set of data points in an N-dimensional space. In addition, I also have a centroid in this same N-dimensional space. Are there any approaches that can allow me to project these data points into a two-dimensional space while keeping their…
bit-question
  • 2,637
  • 6
  • 25
  • 26
11
votes
2 answers

Visualizing multi-dimensional data (LSI) in 2D

I'm using latent semantic indexing to find similarities between documents (thanks, JMS!) After dimension reduction, I've tried k-means clustering to group the documents into clusters, which works very well. But I'd like to go a bit further, and…
Jeff
  • 3,525
  • 5
  • 27
  • 38
9
votes
3 answers

If a data set appears to be normal after some transformation is applied, is it really normal?

Suppose you have a data set that doesn't appear to be normal when its distribution is first plotted (e.g., its qqplot is curved). If after some kind of transformation is applied (e.g., log, square root, etc.) it seems to follow normality (e.g.,…
9
votes
2 answers

Project new point into MDS space

I am trying to project a new point A(x, y, z) into a predefined MDS space in R. This is what I have so far: set.seed(1) x <- matrix(rnorm(3*10), ncol = 3) DM <- dist(x) MDS <- cmdscale(DM) # New data point to be projected A <- c(1, 2, 3) I am not…
mat
  • 639
  • 1
  • 4
  • 19
9
votes
3 answers

MDS on large dataset (R or Python)

I have a large 400000 $\times$ 400000 dataset (dissimilarity matrix) and I want to do multi-dimensional scaling on it. However, after looking at the generic cmdscale() function in R, it only takes a matrix of at most 46340 $\times$ 46340 as input. Is…
Percy
  • 93
  • 1
  • 5
9
votes
2 answers

Scalable dimension reduction

Considering the number of features constant, Barnes-Hut t-SNE has a complexity of $O(n\log n)$, random projections and PCA have a complexity of $O(n)$ making them "affordable" for very large data sets. On the other hand, methods relying on…
8
votes
1 answer

Scaling/Normalization not needed for tree-based models

I could not find a good answer/reference that can explain why rf/decision trees/gbm are not susceptible to the scale of values of numerical variables. My sense is that since boosting methods penalize more if the error is large so they should…
8
votes
2 answers

How to calculate the R-squared value and assess the model fit in multidimensional scaling?

I would like to do Multidimensional Scaling (MDS) using cmdscale() in R. I have read that it is useful to try out how many dimensions are suitable for the data by trying different values of k, and then seeing what proportion of variance is accounted…
user32840
  • 111
  • 3
  • 6
7
votes
1 answer

MDS and PCA eigenvalues and eigenvectors

I understand that Multidimensional scaling (MDS) is the same as doing Principal Components analysis (PCA) if Euclidean distance is used; this is known as Metric MDS. But I came across this in a book that "it has been shown (Chatfield and Collins 1980)…
user76170
  • 639
  • 2
  • 8
  • 9