In the setup of classical MultiDimensional Scaling (MDS), assume that $D:=[d_{ij}]$ be an $n \times n$ distance matrix, i.e. $d(i,i)=0, d(i,j)=d(j,i) > 0 \forall i, j = 1 \dots n.$ Assume that: the original data $X:=[x_1 \dots x_n] \in \mathbb{R}^{d \times n}, x_i \in \mathbb{R}^{d \times 1}$ generating the distance matrix $D$ was Euclidean and of dimension $p$. So mathematically, assume: $||x_i - x_j||^2 = d_{ij}^2$.
Assume now that, we're performing classical MDS on the data to embed it into $p$ dimensions again. I'm aware that we won't know $p$, but the point here is not the knowledge of $p$: the point is that (somehow!) the embedding dimension is exactly equal to the original dimension of the data. That is, we're performing the following algorithm:
- Define the centering matrix $H_n:= I_n - \frac{1}{n}1_n1_n^{T}.$
- Perform the SVD of $-\frac{1}{2}H_nD^2H_n = (XH_n)^{T}(XH_n)$, where $D^2$ is the entrywise square of $D$ (so, not the square of $D$ as in matrix multiplication/squaring). Assume $-\frac{1}{2}H_nD^2H_n= U\Lambda U^{T}= \sum_{i=1}^{n} \lambda_i u_i u_i^{T} $ after SVD.
- From the diagonal matrix $\Lambda$ above that contains the eigenvalues $\{\lambda_1 \geq\dots \geq \lambda_n \geq 0\}$ of $-\frac{1}{2}H_nD^2H_n,$ extract the square roots of the $p$ largest eigenvalues, and the corresponding eigenvectors. Call the corresponding part of $\Lambda$ to be $\Lambda_p$, and that of $U^{T}$ to be $U_p^{T}$
- Finally, the $p$ -dimensional embedding is given by the $n$ columns of the $p \times n$ matrix $Y_p:=\sqrt\Lambda_p U_p^{T}$.
Assume now that the original, unknown data was centered (we don't have to assume this, but doing this anyway to get rid of the pesky $H_n$ appearing on both sides due to double centering, this'll simplify notations): i.e. assume that $XH_n = X$. It's easy to check that: the embedding $Y_p$ generated by MDS is also centered, i.e. $Y_p H_n = Y_p$, and that $Y_p^{T}Y_p = X^{T}X= \sum_{i=1}^{p} \lambda_i u_i u_i^{T}$, since the rank of $X^{T}X \le d$, so $\lambda_{d+1}= \dots \lambda_n=0$.
My questions are: (Please supply proofs or references to look at)
(1) Is $Y_p = R.X$ for some $R \in O(p)$ in this case?
(2) If not, is there a useful estimate of the distance between the original and MDS constructed data: $||X - Y_p||$, perhaps an upper and a lower bound? Thank you in advance!