I was able to use the following equation to find derivatives of matrix functions:
$f(X+h) = f(X) + Ah + o(|h|) \quad \cdots (1)$
where $h$ is a small displacement and $A$ is the Jacobian matrix. I found a couple more equations for the derivatives of matrix functions:
$D_Yf(X) = \lim_{t\to 0} \frac{f(X+tY) - f(X)}{t} \quad \cdots (2)$
$D_Yf(X) = \lim_{t\to 0} \frac{f(X+tY) - f(X)}{t} = \operatorname{tr}(Y^TU) \quad \cdots (3)$
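To see how (1) and (2) fit together when $h = tY$, here is a small numerical sketch in plain Python (no libraries). The function $f(X) = X^2$ and the matrices `X`, `Y` are arbitrary illustrative choices, not taken from any source. For this $f$, $f(X+tY) - f(X) = t(XY + YX) + t^2Y^2$, so the difference quotient in (2) tends to $XY + YX$, and the leftover $tY^2$ is the piece that the $o(|h|)$ term in (1) absorbs.

```python
# Minimal sketch: matrices are stored as lists of rows.

def matmul(A, B):
    """Product of an (m x k) and a (k x n) matrix."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def axpy(X, Y, t):
    """Entrywise X + t * Y."""
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def f(X):
    """Illustrative matrix function f(X) = X X."""
    return matmul(X, X)

def difference_quotient(X, Y, t):
    """(f(X + tY) - f(X)) / t, the expression inside the limit of equation (2)."""
    diff = axpy(f(axpy(X, Y, t)), f(X), -1.0)
    return [[d / t for d in row] for row in diff]

X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[0.0, 1.0], [-1.0, 2.0]]
linear_term = axpy(matmul(X, Y), matmul(Y, X), 1.0)   # XY + YX

# Largest entrywise gap between the quotient and XY + YX for shrinking t
errors = []
for t in (1e-1, 1e-2, 1e-3):
    q = difference_quotient(X, Y, t)
    errors.append(max(abs(a - b) for ra, rb in zip(q, linear_term) for a, b in zip(ra, rb)))

print(linear_term)  # [[1.0, 9.0], [1.0, 17.0]]
print(errors)       # shrinks roughly 10x per step, since the gap is t * Y^2
```

The gap is proportional to $t$, which is why it disappears in the limit of (2) and why the corresponding term counts as $o(|h|)$ in (1).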
It seems like the first and second equations are identical, with (2) being more precise in terms of the definition of a derivative (equation (2) also shows why the $|h|^2$ term is ignored). Furthermore, both (1) and (2) apply to functions $\mathbb{R}^n \to \mathbb{R}^m$. Equation (3) seems to be a special case of (2). Equation (3) was used as follows:
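Take $f(X) = \operatorname{tr}(AX)$. Then
$$
\begin{aligned}
D_Yf(X) &= \lim_{t\to 0} \frac{f(X+tY)-f(X)}{t} \\
&= \lim_{t\to 0} \frac{\operatorname{tr}(A(X+tY)) - \operatorname{tr}(AX)}{t} \\
&= \lim_{t\to 0} \frac{\operatorname{tr}(AX+tAY) - \operatorname{tr}(AX)}{t} \\
&= \lim_{t\to 0} \frac{\operatorname{tr}(tAY)}{t} \\
&= \lim_{t\to 0} \operatorname{tr}(AY) \\
&= \operatorname{tr}(AY) \\
&= \operatorname{tr}\big((AY)^T\big) \\
&= \operatorname{tr}(Y^TA^T)
\end{aligned}
$$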
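Comparing with (3) gives $U=A^T$. Therefore the derivative (gradient) of $f$ with respect to $X$ is $A^T$, while the directional derivative itself is the scalar $D_Yf(X) = \operatorname{tr}(Y^TA^T)$.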
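A quick numerical check of this example, as a plain-Python sketch with arbitrary illustrative values for `A`, `X` and `Y` (a non-square `A` is used so the shapes are visible). The difference quotient should match $\operatorname{tr}(Y^TA^T)$, and perturbing one entry of $X$ at a time (direction $E_{ij}$, a hypothetical helper `unit` below) should reproduce $A^T$, which has the same shape as $X$.

```python
def matmul(A, B):
    """Product of an (m x k) and a (k x n) matrix, stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def axpy(X, Y, t):
    """Entrywise X + t * Y."""
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def unit(i, j, rows, cols):
    """E_ij: a 1 in position (i, j), zeros elsewhere."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(cols)] for r in range(rows)]

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # 2 x 3 (illustrative values)
X = [[0.5, -1.0], [2.0, 0.0], [1.0, 3.0]]  # 3 x 2, so A X is 2 x 2
Y = [[1.0, 0.0], [-2.0, 1.0], [0.5, 2.0]]  # direction, same shape as X

def f(M):
    """f(M) = tr(A M)."""
    return trace(matmul(A, M))

t = 1e-6
fd = (f(axpy(X, Y, t)) - f(X)) / t                       # difference quotient of (2)
closed_form = trace(matmul(transpose(Y), transpose(A)))  # tr(Y^T A^T)

# Directional derivative along each E_ij estimates entry (i, j) of U
U = [[(f(axpy(X, unit(i, j, 3, 2), t)) - f(X)) / t for j in range(2)] for i in range(3)]

print(closed_form)  # 15.5
print(fd)           # approximately 15.5
print(U)            # approximately A^T = [[1, 4], [2, 5], [3, 6]]
```

Because $f$ is linear in $X$, the finite differences agree with the closed form up to floating-point rounding, regardless of which `X` is chosen.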
A couple of questions about these three equations:
Is $h$ in equation (1) the same as $tY$ in equation (2)? (Something seems to be missing from equation (1) to me.)
I don't quite understand what $\operatorname{tr}(Y^TU)$ means in equation (3). $Y$ seems to be a direction matrix and $U$ seems to be the Jacobian, but what exactly does that expression mean? And how does writing the result in the form $\operatorname{tr}(Y^TU)$ give us the derivative?
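One standard way to read $\operatorname{tr}(Y^TU)$ (a general identity, not something taken from the linked slides) is as the entrywise inner product $\sum_{i,j} Y_{ij}U_{ij}$ of two matrices of the same shape. Choosing $Y = E_{ij}$, the matrix with a single 1 in position $(i,j)$, isolates $U_{ij}$; so if $D_Yf(X) = \operatorname{tr}(Y^TU)$ holds for every direction $Y$, then $U_{ij}$ is the rate of change of $f$ when only $X_{ij}$ moves. The plain-Python sketch below checks both facts with an arbitrary `U` and `Y`; the helper names are illustrative.

```python
def matmul(A, B):
    """Product of an (m x k) and a (k x n) matrix, stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def entrywise_inner(Y, U):
    """sum_ij Y_ij * U_ij for two matrices of the same shape."""
    return sum(y * u for ry, ru in zip(Y, U) for y, u in zip(ry, ru))

def unit(i, j, rows, cols):
    """E_ij: a 1 in position (i, j), zeros elsewhere."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(cols)] for r in range(rows)]

U = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]   # any 3 x 2 matrix
Y = [[1.0, 0.0], [-2.0, 1.0], [0.5, 2.0]]  # any direction of the same shape

print(trace(matmul(transpose(Y), U)))  # 15.5
print(entrywise_inner(Y, U))           # 15.5

# Probing with every E_ij rebuilds U one entry at a time
recovered = [[trace(matmul(transpose(unit(i, j, 3, 2)), U)) for j in range(2)]
             for i in range(3)]
print(recovered == U)  # True
```

Under this reading, writing a directional derivative as $\operatorname{tr}(Y^TU)$ is the matrix analogue of writing it as $\nabla f \cdot y$ for vectors, with $U$ playing the role of the gradient.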
EDIT: The source where I found equation (3) uses column vectors for the gradient (http://www.tc.umn.edu/~nydic001/docs/unpubs/Schonemann_Trace_Derivatives_Presentation.pdf).