As far as I understand, one of the main claimed problems with initializing e.g. a feed-forward neural network (with several $\text{tanh}$ or $\text{ReLU}$ layers) with $W=0$ is that it fails to break "network symmetry": backpropagation would propagate the same error to every such unit (i.e. "nudge all weights in the same direction"). This is, I presume, undesirable because different units along different paths of the network would then not learn "different" computations.
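To make sure I understand the symmetry part, here is a toy check I put together (assuming PyTorch; the layer sizes, the constant $0.5$ init, and the MSE loss are arbitrary choices of mine): with a constant *nonzero* init, every hidden unit computes the same function, and backprop produces identical gradients for each unit's incoming weights.

```python
import torch
import torch.nn as nn

# Toy check of the symmetry claim (my own example): a constant nonzero init
# makes every hidden unit identical, and backprop nudges all of their
# incoming weights in exactly the same direction.
torch.manual_seed(0)

net = nn.Sequential(
    nn.Linear(4, 8, bias=False),
    nn.Tanh(),
    nn.Linear(8, 1, bias=False),
)
for p in net.parameters():
    nn.init.constant_(p, 0.5)   # same nonzero value everywhere

x = torch.randn(16, 4)          # random inputs
t = torch.randn(16, 1)          # random targets
nn.functional.mse_loss(net(x), t).backward()

g = net[0].weight.grad          # first-layer gradient, shape (8, 4)
print(torch.allclose(g, g[0].expand_as(g)))   # True: every row is identical
```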
However, I'm confused about why that even matters in this case, given that if $W$ is ever $0$, we will effectively be propagating no gradients at all through the network: the errors coming back from the output get multiplied by $W=0$, which prevents any learning.
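Here is the same kind of check with $W=0$ (again assuming PyTorch, with biases omitted so that only $W$ plays a role): every weight gradient comes out exactly zero, which is the behaviour I'm describing above.

```python
import torch
import torch.nn as nn

# Same toy setup, but with W = 0 everywhere and no biases: the hidden
# activations are tanh(0) = 0, so the output-layer weight gradient
# (error x activation) vanishes, and the error propagated back through
# W2 = 0 is zero as well.
torch.manual_seed(0)

net = nn.Sequential(
    nn.Linear(4, 8, bias=False),
    nn.Tanh(),
    nn.Linear(8, 1, bias=False),
)
for p in net.parameters():
    nn.init.zeros_(p)           # W = 0 everywhere

x = torch.randn(16, 4)
t = torch.randn(16, 1)
nn.functional.mse_loss(net(x), t).backward()

for name, p in net.named_parameters():
    print(name, p.grad.abs().max().item())   # both print 0.0
```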
Put another way: even granting that $W=0$ fails to break network symmetry (and so wastes calculations and paths in the network), is it correct to say that $W=0$ (e.g. from initialization) effectively kills the gradients in the neural network, and thus that no learning can take place?