When initialising a deep network before training, what statistical property of gradients and of activations is desirable?
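A common way to frame this, for example in the Glorot/Xavier and He/Kaiming initialisation analyses, is that with zero-mean random weights the variance of the activations in the forward pass and of the gradients in the backward pass should stay roughly constant from layer to layer at initialisation, so that neither signal explodes nor vanishes as depth grows. The following is a minimal, stdlib-only experiment to probe that property, not a reference implementation. It assumes a bias-free, fully connected ReLU network with square layers, i.i.d. Gaussian weights of variance `gain / width`, one standard-normal input, and a standard-normal gradient arriving at the output. The names `layer_stats`, `matvec`, `matvec_t`, `mean_square` and the `gain` parameter are hypothetical, chosen for illustration.

```python
import math
import random

# Hypothetical helpers; assumes square layers (fan_in == fan_out == width).


def matvec(W, x):
    """Compute W @ x for W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def matvec_t(W, g):
    """Compute W.T @ g for W stored as a list of rows."""
    out = [0.0] * len(W[0])
    for row, gi in zip(W, g):
        for j, w in enumerate(row):
            out[j] += w * gi
    return out


def mean_square(v):
    """Empirical second moment of a vector."""
    return sum(x * x for x in v) / len(v)


def layer_stats(depth=16, width=128, gain=2.0, seed=0):
    """Per-layer mean squares of pre-activations (forward) and of the gradients
    with respect to them (backward) for a bias-free ReLU MLP whose weights are
    drawn i.i.d. from N(0, gain / width). gain=2.0 is He/Kaiming init for ReLU."""
    rng = random.Random(seed)
    std = math.sqrt(gain / width)
    weights = [[[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
               for _ in range(depth)]

    # Forward pass on one standard-normal input, keeping pre-activations z_l.
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    pre = []
    for W in weights:
        z = matvec(W, h)
        pre.append(z)
        h = [max(0.0, zi) for zi in z]  # ReLU

    # Backward pass from an assumed standard-normal gradient on the output.
    g_h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    grads = [None] * depth
    for l in reversed(range(depth)):
        g_z = [gi if zi > 0.0 else 0.0 for gi, zi in zip(g_h, pre[l])]  # times ReLU'
        grads[l] = g_z
        g_h = matvec_t(weights[l], g_z)  # gradient w.r.t. the previous layer's output

    return [mean_square(z) for z in pre], [mean_square(g) for g in grads]


for gain in (2.0, 1.0):
    fwd, bwd = layer_stats(gain=gain)
    print(f"gain={gain}: forward last/first = {fwd[-1] / fwd[0]:.3g}, "
          f"backward first/last = {bwd[0] / bwd[-1]:.3g}")
```

With `gain=2.0` both ratios should stay of order one. With `gain=1.0` each ReLU layer multiplies the expected mean square by `gain / 2 = 0.5`, so in expectation both ratios shrink by about `(gain / 2) ** (depth - 1)`, roughly 3e-5 for 16 layers. Individual runs fluctuate around these expectations because the width is finite.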