
Assume $p$ is the keep probability for dropout. In the forward propagation (inverted dropout), we scale the kept activations as $A_r = A/p$. In the backpropagation, as others have said (dropout: forward prop VS back prop in machine learning Neural Network), we should obtain the gradient as $dA = dA_r/p$. However, in my view, it should be $dA = dA_r \cdot p$. Which is correct?
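For concreteness, here is a small sketch one could run to compare the two candidates. It assumes NumPy, a toy loss $L = \sum A_r \, dA_r$ (so that the upstream gradient with respect to $A_r$ is exactly $dA_r$), and an explicit dropout mask; the variable names are illustrative, not from any particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.8                             # keep probability
A = rng.standard_normal((3, 4))
mask = rng.random(A.shape) < p      # keep each unit with probability p

# Forward pass (inverted dropout): drop units, then rescale by 1/p
A_r = (A * mask) / p

# Upstream gradient with respect to A_r
dA_r = rng.standard_normal(A.shape)

# Candidate 1: dA = dA_r / p (with the same mask applied)
dA_div = (dA_r * mask) / p
# Candidate 2: dA = dA_r * p
dA_mul = dA_r * mask * p

# Numerical check against the toy loss L = sum(A_r * dA_r)
eps = 1e-6
dA_num = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        Ap = A.copy(); Ap[i, j] += eps
        Am = A.copy(); Am[i, j] -= eps
        Lp = np.sum(((Ap * mask) / p) * dA_r)
        Lm = np.sum(((Am * mask) / p) * dA_r)
        dA_num[i, j] = (Lp - Lm) / (2 * eps)

print(np.allclose(dA_num, dA_div, atol=1e-5))
print(np.allclose(dA_num, dA_mul, atol=1e-5))
```

Since the forward pass multiplies the kept activations by $1/p$, the chain rule multiplies the incoming gradient by the same factor $1/p$, which is what this check confirms.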

Harry

0 Answers