
What is the derivative of $F = \|X^T-S^TAX^T\|_F^2$ w.r.t $A$, where $X \in\mathbb R^{d \times N}$, $S \in\mathbb R^{k \times N}$, and $A \in \mathbb{R}^{k \times N}$?

I tried it myself and obtained the following:

$$\frac{\partial{F}}{\partial A} = -(X^T-S^TAX^T)SX$$

but I think this result is incorrect, because the dimensions do not match. Any help is greatly appreciated.

Elyor
  • You appear to be squaring the matrix $X^T - S^TAX^T$, but you shouldn't. Just take its derivative, note that for $F = \|B\|_F^2$, $dF/db_{ij} = 2b_{ij}$, and the path to the result should be clear. – jbowman Feb 21 '17 at 21:57
  • Because your function is very nearly the same as the one at http://stats.stackexchange.com/questions/257579 (assuming "$_F$" refers to the Frobenius norm), surely the methods described in my answer there will readily apply. – whuber Feb 21 '17 at 22:11

2 Answers

3

From section 2.5.2 of The Matrix Cookbook:

$$ \frac{\partial}{\partial \mathrm X}\text{Tr}\left[(\mathrm A \mathrm X \mathrm B + \mathrm C)( \mathrm A \mathrm X \mathrm B + \mathrm C)^\top\right]=2 \mathrm A^\top( \mathrm A \mathrm X \mathrm B + \mathrm C) \mathrm B^\top $$

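Hence, using $\| \mathrm M \|_{\rm F}^2 = \text{Tr}\left[\mathrm M \mathrm M^\top\right]$ and substituting $\mathrm A \to \mathrm S^{\top}$, $\mathrm X \to \mathrm A$, $\mathrm B \to \mathrm X^{\top}$, $\mathrm C \to -\mathrm X^{\top}$ in the formula above,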

$$\nabla_{\mathrm A} \| \mathrm X^{\top} - \mathrm S^{\top} \mathrm A \mathrm X^{\top} \|_{\rm F}^2 = \nabla_{\mathrm A} \| \mathrm S^{\top} \mathrm A \mathrm X^{\top} - \mathrm X^{\top} \|_{\rm F}^2 = 2 \, \mathrm S \left( \mathrm S^{\top} \mathrm A \mathrm X^{\top} - \mathrm X^{\top} \right) \mathrm X$$
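The closed form above can be sanity-checked numerically. The following is only a minimal sketch using NumPy and central finite differences; the sizes `d`, `N`, `k`, the random data, and the helper names `objective`, `gradient`, and `numerical_gradient` are assumptions made for illustration and are not part of the original problem.

```python
import numpy as np

def objective(A, X, S):
    """F(A) = ||X^T - S^T A X^T||_F^2."""
    R = X.T - S.T @ A @ X.T
    return np.sum(R ** 2)

def gradient(A, X, S):
    """Closed form 2 S (S^T A X^T - X^T) X, which has the shape of A."""
    return 2.0 * S @ (S.T @ A @ X.T - X.T) @ X

def numerical_gradient(A, X, S, eps=1e-6):
    """Central finite differences, perturbing one entry of A at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (objective(A + E, X, S) - objective(A - E, X, S)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
d, N, k = 3, 5, 2  # arbitrary small sizes, chosen only for this check
X = rng.standard_normal((d, N))
S = rng.standard_normal((k, N))
A = rng.standard_normal((k, N))

G_closed = gradient(A, X, S)
G_numeric = numerical_gradient(A, X, S)

print(G_closed.shape)                        # same shape as A, i.e. (k, N)
print(np.max(np.abs(G_closed - G_numeric)))  # should be tiny (rounding error only)
```

Because $F$ is quadratic in $\mathrm A$, central differences are exact up to floating-point rounding, so any visible discrepancy would point to an error in the formula rather than in the step size.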

2

We have

\begin{equation} \begin{split}
\|X^T - S^TAX^T \|_F^2 & = \text{Tr}\left((X^T - S^TAX^T)^T(X^T - S^TAX^T)\right) \\
& = \text{Tr}\left((X - (S^TAX^T)^T)(X^T - S^TAX^T)\right) \\
& = \text{Tr}\left(XX^T - XS^TAX^T -(S^TAX^T)^TX^T + (S^TAX^T)^T(S^TAX^T)\right)\\
& = \text{Tr}(XX^T) - 2\,\text{Tr}(XS^TAX^T) + \text{Tr}(XA^TSS^TAX^T)
\end{split} \end{equation}

The first term does not depend on $A$, so

$$ \frac{\partial \text{Tr}(XX^T)}{\partial A} = 0. $$

For the second term we have:

\begin{equation} \begin{split}
\frac{\partial (2\,\text{Tr}(XS^TAX^T))}{\partial A} & = \frac{\partial (2\,\text{Tr}(X^TXS^TA))}{\partial A} \\
& = 2(X^TXS^T)^T \\
& = 2SX^TX.
\end{split} \end{equation}

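Here, we used the cyclic property of the trace and formula 100 of The Matrix Cookbook, $\frac{\partial \text{Tr}(AX)}{\partial X} = A^T$ (in this formula, $A$ and $X$ are generic matrices, not the ones from the question).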

For the last term we have (formula 116 of The Matrix Cookbook):

\begin{equation} \begin{split}
\frac{\partial\, \text{Tr}(XA^TSS^TAX^T)}{\partial A} & = SS^TAX^TX + SS^TAX^TX \\
& = 2SS^TAX^TX
\end{split} \end{equation}
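The two non-constant terms can also be checked separately. This is a rough sketch using NumPy and central finite differences; the sizes, the random data, and the helper names `linear_term`, `quadratic_term`, and `finite_difference` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, k = 3, 5, 2  # arbitrary small sizes, chosen only for this check
X = rng.standard_normal((d, N))
S = rng.standard_normal((k, N))
A = rng.standard_normal((k, N))

def linear_term(A):
    """2 Tr(X S^T A X^T), the term that is linear in A."""
    return 2.0 * np.trace(X @ S.T @ A @ X.T)

def quadratic_term(A):
    """Tr(X A^T S S^T A X^T), the term that is quadratic in A."""
    return np.trace(X @ A.T @ S @ S.T @ A @ X.T)

def finite_difference(f, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

# Closed forms derived in the text
grad_linear = 2.0 * S @ X.T @ X
grad_quadratic = 2.0 * S @ S.T @ A @ X.T @ X

err_linear = np.max(np.abs(grad_linear - finite_difference(linear_term, A)))
err_quadratic = np.max(np.abs(grad_quadratic - finite_difference(quadratic_term, A)))
print(err_linear, err_quadratic)  # both should be close to zero
```

Checking the terms one at a time makes it easier to locate a sign or transpose mistake than checking only the final sum.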

Putting it all together (the second term enters with a minus sign), we obtain:

\begin{equation} \begin{split}
\frac{\partial \|X^T - S^TAX^T \|_F^2 }{\partial A} & = 2SS^TAX^TX - 2SX^TX\\
& = 2S(S^TAX^T - X^T)X
\end{split} \end{equation}
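As a cross-check, the same gradient can be reached without expanding the trace, by working with differentials. This is only a sketch; the symbol $R$ is shorthand introduced here and does not appear in the question.

$$ R = S^TAX^T - X^T, \qquad \|X^T - S^TAX^T\|_F^2 = \text{Tr}(R^TR) $$

Since $dR = S^T\,dA\,X^T$, the differential of the objective is

$$ d\,\text{Tr}(R^TR) = 2\,\text{Tr}(R^T\,dR) = 2\,\text{Tr}(R^TS^T\,dA\,X^T) = 2\,\text{Tr}(X^TR^TS^T\,dA) = 2\,\text{Tr}\left((SRX)^T\,dA\right) $$

Reading off the gradient from $dF = \text{Tr}(G^T\,dA)$ gives $G = 2SRX = 2S(S^TAX^T - X^T)X$, which has the same $k \times N$ shape as $A$ and agrees with the result obtained term by term.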

MathLearner