In autoencoders with tied weights, it was mentioned that the gradient with respect to $W$ is the sum of two gradients. I don't understand this; can someone elucidate it?
It is mentioned here, at the end of slide 4.
Following the notation in the slides, a one-layer autoencoder with tied weights is given by $$o(\hat{a}(x))=o(c+W^Th(x))=o(c+W^T\sigma(b+Wx))$$
Because the same weight $W_{ij}$ appears twice, in the encoder pre-activation $a_i=b_i+\sum_j W_{ij}x_j$ and in the decoder pre-activation $\hat{a}_j=c_j+\sum_i W_{ij}h_i$, the chain rule picks up both contributions: $$\frac{\partial l}{\partial W_{ij}}=\sum_{j'}\frac{\partial l}{\partial \hat{a}_{j'}}\frac{\partial \hat{a}_{j'}}{\partial W_{ij}}=\frac{\partial l}{\partial \hat{a}_j}h_i+\sum_{j'}\frac{\partial l}{\partial \hat{a}_{j'}}W_{ij'}\sigma'(a_i)x_j=\frac{\partial l}{\partial \hat{a}_j}h_i+\frac{\partial l}{\partial a_i}x_j,$$ using $\frac{\partial \hat{a}_{j'}}{\partial W_{ij}}=\delta_{jj'}h_i+W_{ij'}\frac{\partial h_i}{\partial W_{ij}}$ and $\frac{\partial h_i}{\partial W_{ij}}=\sigma'(a_i)x_j$. The first term is the ordinary backpropagation gradient of the decoder layer and the second is that of the encoder layer, so the tied-weight gradient is simply the two per-layer gradients added together.
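If it helps, here is a minimal numerical sanity check of that statement (my own sketch, not from the slides): it assumes a sigmoid encoder, a linear decoder $o(\hat{a})=\hat{a}$, and squared-error loss, with variable names mirroring the notation above, and verifies that a finite-difference gradient of the tied-weight loss matches the encoder-layer gradient plus the decoder-layer gradient.

```python
# Numerical check: with tied weights, dl/dW = decoder-layer grad + encoder-layer grad.
# Assumptions (not from the slides): sigmoid encoder, linear decoder, squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 5, 3
x = rng.normal(size=n_in)
W = rng.normal(size=(n_hid, n_in))   # encoder uses W, decoder uses W^T
b = rng.normal(size=n_hid)           # encoder bias
c = rng.normal(size=n_in)            # decoder bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tied_loss(W_):
    """0.5 * ||x_hat - x||^2 with x_hat = c + W_^T sigmoid(b + W_ x)."""
    h = sigmoid(b + W_ @ x)
    x_hat = c + W_.T @ h
    return 0.5 * np.sum((x_hat - x) ** 2)

# Forward pass, keeping the intermediate quantities used in the derivation
a = b + W @ x                 # encoder pre-activation a
h = sigmoid(a)                # hidden units h = sigma(a)
a_hat = c + W.T @ h           # decoder pre-activation (= output, since o is linear)

# Backpropagation gradients of each "copy" of W, as if they were untied
dl_da_hat = a_hat - x                       # dl/d a_hat for squared error
grad_decoder = np.outer(h, dl_da_hat)       # [i,j] = dl/da_hat_j * h_i
dl_da = (W @ dl_da_hat) * h * (1.0 - h)     # dl/da_i, with sigma'(a) = h(1-h)
grad_encoder = np.outer(dl_da, x)           # [i,j] = dl/da_i * x_j

# Finite-difference gradient of the tied-weight loss
eps = 1e-6
grad_fd = np.zeros_like(W)
for k in range(W.size):
    Wp, Wm = W.copy(), W.copy()
    Wp.flat[k] += eps
    Wm.flat[k] -= eps
    grad_fd.flat[k] = (tied_loss(Wp) - tied_loss(Wm)) / (2 * eps)

print(np.allclose(grad_fd, grad_decoder + grad_encoder, atol=1e-6))  # True
```

The same check should work for any differentiable output nonlinearity and loss; only the `dl_da_hat` line changes.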