Derivation of Procrustes transformation

Question

I'm curious about the derivation for the Procrustes transformation. I'm following ESLII: see figure 14.25 and problem 14.8 (Procrustes distance with scaling).

Given matrices $\mathbf{X}_1$ and $\mathbf{X}_2$, both $n$-by-$p$, the problem is \begin{equation*} \min_{\beta, \mathbf{R}} \, \lVert \mathbf{X}_2 - \beta\, \mathbf{X}_1 \mathbf{R} \rVert^2_F \end{equation*} where $\mathbf{R}$ is an orthogonal matrix, i.e. $\mathbf{R}^\intercal \, \mathbf{R} = \mathbf{I}$.

The solution is: \begin{equation*} \hat{\mathbf{R}} = \mathbf{U} \, \mathbf{V}^\intercal \end{equation*} \begin{equation*} \hat{\beta} = \frac{\mathrm{trace}\left(\mathbf{D}\right)}{\lVert \mathbf{X}_1\rVert^2_F} \end{equation*} where $\mathbf{X}_1^\intercal \mathbf{X}_2 = \mathbf{U} \mathbf{D} \mathbf{V}^\intercal$ is a singular value decomposition (i.e. $\mathbf{D}$ is diagonal, and $\mathbf{V}$ and $\mathbf{U}$ are orthonormal).

How does one derive these solutions? I tried differentiating the objective function in the minimization problem with respect to $\mathbf{R}$ and got \begin{equation*} \mathbf{X}_2 \mathbf{X}_1^\intercal = \beta \, \mathbf{X}_1 \,\mathbf{R}\, \mathbf{X}_1^\intercal. \end{equation*} Is that first order condition correct? If so, how does one get from there to the solution?

amoeba · Accepted Answer · 2018-08-13T15:46:16.590

Start with solving the Procrustes problem without scaling, i.e. without $\beta$.

We have $$\newcommand{\Y}{\mathbf Y} \newcommand{\X}{\mathbf X} \newcommand{\R}{\mathbf R} \newcommand{\U}{\mathbf U} \newcommand{\D}{\mathbf D} \newcommand{\V}{\mathbf V} \newcommand{\P}{\mathbf P} \DeclareMathOperator{trace}{tr} \|\Y-\X\R\|^2 = \|\Y\|^2+\|\X\R\|^2-2\trace(\Y^\top\X\R),$$ where the first two terms are constant. So the problem reduces to minimizing $$\trace(\Y^\top\X\R)=\trace(\U\D\V^\top\R)=\trace(\V^\top\R\U\cdot\D),$$ where I introduced the same SVD as in your question. The above operations are very standard, but now comes the crucial step: inside the trace we have a product of three orthogonal matrices which is itself an orthogonal matrix (denote it as $\P$), times diagonal $\D$. The trace is then equal to $$\sum P_{ii} D_{i}\le\sum D_{i}$$ because diagonal elements of an orthogonal $\P$ can not be larger than 1. To maximize this expression one should take $\P$ equal to the identity matrix. This corresponds to $\hat\R=\V\U^\top$.

To solve the Procrustes problem with scaling, rewrite the Frobenius norm as I did above and note that the optimal $\R$ does not depend on $\beta$. So one can solve for $\R$ first, as above. After plugging in $\hat\R$, one immediately obtains $\hat\beta$.

Derivation of Procrustes transformation

1 Answers1