
In autoencoders with tied weights, it was mentioned that the gradient with respect to $W$ is the sum of two gradients. I don't understand this; can someone elucidate?

It is mentioned here, at the end of slide 4.


1 Answer


Following the notation in the slides, a one-layer autoencoder with tied weights is given by $$o(\hat{a}(x))=o(c+W^Th(x))=o\left(c+W^T\sigma(b+Wx)\right),$$ where $a(x)=b+Wx$ is the encoder pre-activation, $h(x)=\sigma(a(x))$ is the hidden representation, and $\hat{a}(x)=c+W^Th(x)$ is the decoder pre-activation.

The gradient with respect to $W$ has two contributions, because each entry $W_{ij}$ appears twice: once in the decoder term $W_{ij}h_i$ of $\hat{a}_j$, and once in the encoder pre-activation $a_i$ inside $h_i$. Applying the chain rule to both occurrences, $$\frac{\partial l}{\partial W_{ij}}=\sum_k \frac{\partial l}{\partial \hat{a}_k}\frac{\partial \hat{a}_k}{\partial W_{ij}}=\frac{\partial l}{\partial \hat{a}_j}h_i+\left(\sum_k \frac{\partial l}{\partial \hat{a}_k}W_{ik}\right)\sigma'(a_i)\,x_j=\frac{\partial l}{\partial \hat{a}_j}h_i+\frac{\partial l}{\partial a_i}x_j,$$ which is exactly the sum of the backpropagation gradients each layer would receive if the two copies of $W$ were untied.
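To make this concrete, here is a small numerical sketch (my own illustration, not part of the original answer): it builds the one-layer tied-weight autoencoder above with sigmoid activations and a squared-error loss, computes the decoder and encoder gradient terms separately, and checks that their sum matches a finite-difference gradient. The variable names and the choice of loss are assumptions for illustration only.

```python
import numpy as np

# Illustrative setup (loss and shapes are assumptions, not from the slides).
rng = np.random.default_rng(0)
n_in, n_hid = 5, 3
x = rng.normal(size=n_in)
W = rng.normal(size=(n_hid, n_in))   # encoder weights; decoder uses W.T (tied)
b = rng.normal(size=n_hid)
c = rng.normal(size=n_in)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss(W):
    a = b + W @ x            # encoder pre-activation a(x)
    h = sigmoid(a)           # hidden representation h(x)
    a_hat = c + W.T @ h      # decoder pre-activation with tied weights
    x_hat = sigmoid(a_hat)   # reconstruction o(a_hat)
    return 0.5 * np.sum((x_hat - x) ** 2)

# Analytic gradient: sum of the two per-layer backprop terms.
a = b + W @ x
h = sigmoid(a)
a_hat = c + W.T @ h
x_hat = sigmoid(a_hat)
dl_da_hat = (x_hat - x) * x_hat * (1 - x_hat)   # dl/d a_hat
dl_da = (W @ dl_da_hat) * h * (1 - h)           # dl/d a, backpropagated through W^T
grad_decoder = np.outer(h, dl_da_hat)           # entry (i,j): h_i * dl/d a_hat_j
grad_encoder = np.outer(dl_da, x)               # entry (i,j): dl/d a_i * x_j
grad_analytic = grad_decoder + grad_encoder

# Central-difference gradient for comparison.
eps = 1e-6
grad_num = np.zeros_like(W)
for i in range(n_hid):
    for j in range(n_in):
        E = np.zeros_like(W)
        E[i, j] = eps
        grad_num[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

print(np.allclose(grad_analytic, grad_num, atol=1e-6))  # True
```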
