In autoencoders with tied weights, it was mentioned that the gradient with respect to $W$ is the sum of two gradients. I don't understand this; can someone elucidate it?
It is mentioned here, at the end of slide 4.
Following the notation in the slides, a one-layer autoencoder with tied weights is given by $$o(\hat{a}(x))=o(c+W^Th(x))=o(c+W^T\sigma(b+Wx))$$
Because the same weight $W_{ij}$ appears twice, in the encoder pre-activation $a_i=b_i+\sum_j W_{ij}x_j$ and in the decoder pre-activation $\hat{a}_j=c_j+\sum_i W_{ij}h_i$, the chain rule picks up both contributions: $$\frac{\partial l}{\partial W_{ij}}=\sum_{j'}\frac{\partial l}{\partial \hat{a}_{j'}}\frac{\partial \hat{a}_{j'}}{\partial W_{ij}}=\frac{\partial l}{\partial \hat{a}_j}h_i+\sum_{j'}\frac{\partial l}{\partial \hat{a}_{j'}}W_{ij'}\sigma'(a_i)x_j=\frac{\partial l}{\partial \hat{a}_j}h_i+\frac{\partial l}{\partial a_i}x_j,$$ using $\frac{\partial \hat{a}_{j'}}{\partial W_{ij}}=\delta_{jj'}h_i+W_{ij'}\frac{\partial h_i}{\partial W_{ij}}$ and $\frac{\partial h_i}{\partial W_{ij}}=\sigma'(a_i)x_j$. The first term is the ordinary backpropagation gradient of the decoder layer and the second is that of the encoder layer, so the tied-weight gradient is simply the two per-layer gradients added together.
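If it helps, here is a minimal numerical sanity check of that statement (my own sketch, not from the slides): it assumes a sigmoid encoder, a linear decoder $o(\hat{a})=\hat{a}$, and squared-error loss, with variable names mirroring the notation above, and verifies that a finite-difference gradient of the tied-weight loss matches the encoder-layer gradient plus the decoder-layer gradient.

```python
# Numerical check: with tied weights, dl/dW = decoder-layer grad + encoder-layer grad.
# Assumptions (not from the slides): sigmoid encoder, linear decoder, squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 5, 3
x = rng.normal(size=n_in)
W = rng.normal(size=(n_hid, n_in))   # encoder uses W, decoder uses W^T
b = rng.normal(size=n_hid)           # encoder bias
c = rng.normal(size=n_in)            # decoder bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tied_loss(W_):
    """0.5 * ||x_hat - x||^2 with x_hat = c + W_^T sigmoid(b + W_ x)."""
    h = sigmoid(b + W_ @ x)
    x_hat = c + W_.T @ h
    return 0.5 * np.sum((x_hat - x) ** 2)

# Forward pass, keeping the intermediate quantities used in the derivation
a = b + W @ x                 # encoder pre-activation a
h = sigmoid(a)                # hidden units h = sigma(a)
a_hat = c + W.T @ h           # decoder pre-activation (= output, since o is linear)

# Backpropagation gradients of each "copy" of W, as if they were untied
dl_da_hat = a_hat - x                       # dl/d a_hat for squared error
grad_decoder = np.outer(h, dl_da_hat)       # [i,j] = dl/da_hat_j * h_i
dl_da = (W @ dl_da_hat) * h * (1.0 - h)     # dl/da_i, with sigma'(a) = h(1-h)
grad_encoder = np.outer(dl_da, x)           # [i,j] = dl/da_i * x_j

# Finite-difference gradient of the tied-weight loss
eps = 1e-6
grad_fd = np.zeros_like(W)
for k in range(W.size):
    Wp, Wm = W.copy(), W.copy()
    Wp.flat[k] += eps
    Wm.flat[k] -= eps
    grad_fd.flat[k] = (tied_loss(Wp) - tied_loss(Wm)) / (2 * eps)

print(np.allclose(grad_fd, grad_decoder + grad_encoder, atol=1e-6))  # True
```

The same check should work for any differentiable output nonlinearity and loss; only the `dl_da_hat` line changes.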