Gradient descent applying chain rule in state space setup

Question

I am trying to perform system identification on the following state-space model $$ \begin{bmatrix} x_{1}(n)\\ x_{2}(n) \\ x_{3}(n)\end{bmatrix}=\begin{bmatrix} a_{11} && a_{12} && a_{13} \\ a_{21} && a_{22} && a_{23} \\ a_{31} && a_{32} && a_{33} \end{bmatrix} \begin{bmatrix} x_{1}(n-1)\\ x_{2}(n-1) \\ x_{3}(n-1)\end{bmatrix} +\begin{bmatrix} b_{11} && b_{12} \\ b_{21} && b_{22} \\ b_{31} && b_{32}\end{bmatrix} \begin{bmatrix} u(n) \\ u(n-1) \end{bmatrix} $$

$$ y(n) = \begin{bmatrix} c_{1} && c_{2} && c_{3}\end{bmatrix}\begin{bmatrix} x_{1}(n)\\ x_{2}(n) \\ x_{3}(n)\end{bmatrix} $$
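To make the recursion concrete, here is a minimal sketch of how the model above could be simulated. It assumes a zero initial state and $u(0) = 0$ before the first sample, and the function name `simulate` is my own choice for illustration.

```python
def simulate(A, B, c, u):
    """Run x(n) = A x(n-1) + B [u(n), u(n-1)]^T and y(n) = c x(n).

    Assumes a zero initial state and a zero input before the first sample.
    A is 3x3, B is 3x2, c has length 3, u is a sequence of input samples.
    """
    x = [0.0, 0.0, 0.0]
    u_prev = 0.0
    y = []
    for u_n in u:
        # State update: each new state mixes the old states and both inputs.
        x = [
            sum(A[i][j] * x[j] for j in range(3)) + B[i][0] * u_n + B[i][1] * u_prev
            for i in range(3)
        ]
        # Output equation: weighted sum of the current states.
        y.append(sum(c[i] * x[i] for i in range(3)))
        u_prev = u_n
    return y
```

Calling `simulate(A, B, c, u)` returns the list of outputs $y(n)$, one per input sample.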

$u$ is the input sequence and $y$ is the output sequence. If $p$ is any parameter in the model, the following 'LMS rule' is used for estimating the parameter.
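A typical LMS-style rule of this kind, assuming a measured output $d(n)$, an error $e(n) = d(n) - y(n)$ and a step size $\mu$ (these symbols are assumptions for illustration, not taken from the post), reads:

$$ p(n+1) = p(n) + \mu \, e(n) \, \frac{\partial y(n)}{\partial p} $$

This follows from taking a gradient step on $\frac{1}{2} e(n)^2$, whose derivative with respect to $p$ is $-e(n) \frac{\partial y(n)}{\partial p}$.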

It is seen that the gradient of the output with respect to the parameter $p$ is needed. These gradients can be computed with known expressions, and I have verified experimentally that this works. So far so good.

All the parameters in the model I'm considering are computed from 6 underlying parameters $\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6$ which are the parameters that really need to be estimated/identified.

In a synthetic setup I'm keeping $\theta_3, \theta_4, \theta_5, \theta_6$ constant (pretending they are known) and only trying to estimate $\theta_1, \theta_2$. Both $a_{11}$ and $a_{12}$ depend on $\theta_1, \theta_2$. So I can write $a_{11}(\theta_1, \theta_2)$ and $a_{12}(\theta_1, \theta_2)$. Other parameters in the model also depend on $\theta_1, \theta_2$. My question is why I can't compute the gradients $\frac{\partial y(n)}{\partial \theta_1}$ and $\frac{\partial y(n)}{\partial \theta_2}$ by

$$ \begin{bmatrix} \frac{\partial y(n)}{\partial \theta_1}\\ \frac{\partial y(n)}{\partial \theta_2} \end{bmatrix} = \begin{bmatrix} \frac{\partial a_{11}(\theta_1, \theta_2)}{\partial \theta_1} && \frac{\partial a_{12}(\theta_1, \theta_2)}{\partial \theta_1} \\ \frac{\partial a_{11}(\theta_1, \theta_2)}{\partial \theta_2} && \frac{\partial a_{12}(\theta_1, \theta_2)}{\partial \theta_2} \end{bmatrix} \begin{bmatrix} \frac{\partial y(n)}{\partial a_{11}}\\ \frac{\partial y(n)}{\partial a_{12}} \end{bmatrix} $$

It may be a stupid bug, but after serious debugging I'm starting to wonder if the above approach is fundamentally flawed. I just don't quite understand why it shouldn't work.

I think this would be appropriate for [SCICOMP.SE](http://scicomp.stackexchange.com). — Gilles, Jan 03 '17 at 16:54
I am also wondering why that is not working. We don't know the relation between the $\theta_k$ and the $a_{ij}$ parameters, so we are even more in the dark when trying to answer this. By the way, do you know there are standard algorithms for estimating state-space models? — Brethlosze, Jan 16 '17 at 05:29
Thanks for chiming in hypfco. I came to the conclusion that the computed gradients were wrong because I need to take into account contributions to the gradients from the other parameters. What I mean by this is that in the multidimensional chain rule there is a Jacobian matrix mapping the gradients from one domain to another. I'm still working on this problem and haven't figured out how to do it yet. In my problem the Jacobian matrix is quite tedious to derive and there are still some things that I'm unsure of. — niaren, Jan 17 '17 at 09:16
I'm not sure what you mean by standard algorithms. I have implemented a Bayesian approach as well (particle filtering). It also has some problems with convergence which I have to look into when I'm done with the gradient approach. I'm not sure 'standard algorithms' work for my particular problem because it is not the state-space coefficients that are my target. My target is a lower-dimensional set of parameters that are mapped into a state-space formulation. — niaren, Jan 17 '17 at 09:21
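The Jacobian point raised in the comments can be checked numerically. If $p_m$ ranges over every state-space entry that depends on $\theta_1, \theta_2$, and not just $a_{11}$ and $a_{12}$, the multivariable chain rule gives

$$ \frac{\partial y(n)}{\partial \theta_k} = \sum_{m} \frac{\partial p_m(\theta_1, \theta_2)}{\partial \theta_k} \frac{\partial y(n)}{\partial p_m} $$

The sketch below is a toy check, not the actual problem: the mapping from $\theta$ to $(a_{11}, a_{12}, b_{11})$, the fixed matrix entries and the input sequence are all invented for illustration, and the gradients are approximated with central finite differences rather than analytic expressions. It compares the truncated sum over $a_{11}, a_{12}$ only with the full sum that also includes $b_{11}$.

```python
def y_last(p, u=(1.0, 0.5, 0.25, 1.0, 0.75)):
    """Final output y(N) of a 3-state model where only a11, a12, b11 vary.

    All other matrix entries and the input u are made-up constants.
    """
    a11, a12, b11 = p
    A = [[a11, a12, 0.0], [0.1, 0.2, 0.0], [0.0, 0.1, 0.3]]
    B = [[b11, 0.0], [0.5, 0.0], [0.0, 0.2]]
    c = [1.0, 0.5, 0.25]
    x, u_prev = [0.0, 0.0, 0.0], 0.0
    for u_n in u:
        x = [sum(A[i][j] * x[j] for j in range(3)) + B[i][0] * u_n + B[i][1] * u_prev
             for i in range(3)]
        u_prev = u_n
    return sum(ci * xi for ci, xi in zip(c, x))


def mapping(theta):
    """Hypothetical theta -> (a11, a12, b11); not the real mapping."""
    t1, t2 = theta
    return [t1 * t2, t1 + t2, t1 ** 2]


def jacobian(theta):
    """Analytic d(a11, a12, b11)/d(theta) for the mapping above, one row per theta_k."""
    t1, t2 = theta
    return [[t2, 1.0, 2.0 * t1],
            [t1, 1.0, 0.0]]


def grad_fd(f, p, h=1e-6):
    """Central finite-difference gradient of a scalar function f at point p."""
    g = []
    for k in range(len(p)):
        hi, lo = list(p), list(p)
        hi[k] += h
        lo[k] -= h
        g.append((f(hi) - f(lo)) / (2.0 * h))
    return g


theta = [0.3, 0.2]
g = grad_fd(y_last, mapping(theta))   # dy/da11, dy/da12, dy/db11
J = jacobian(theta)

# Reference: differentiate the composed function theta -> y directly.
truth = grad_fd(lambda th: y_last(mapping(th)), theta)
# Truncated chain rule: only the a11 and a12 terms, as in the question.
partial = [J[k][0] * g[0] + J[k][1] * g[1] for k in range(2)]
# Full chain rule: every parameter that depends on theta.
full = [sum(J[k][m] * g[m] for m in range(3)) for k in range(2)]

print("truth  ", truth)
print("partial", partial)
print("full   ", full)
```

In this toy setup the truncated sum misses the $b_{11}$ contribution to $\partial y / \partial \theta_1$, while the full sum agrees with the direct derivative. The same reasoning suggests that every model entry depending on $\theta_1, \theta_2$ needs a column in the Jacobian.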
