I read the answer to "Why are non zero-centered activation functions a problem in backpropagation?" I understand that for a strictly positive activation function, the gradient components with respect to a neuron's weights all have the same sign, but I still have a question:
With zero-centered data, it seems the gradient components can never all have the same sign, so there would still be some directions in which we cannot move. Is that right?
For instance, the answer says:
Say there are two parameters $w_1$ and $w_2$. If the gradients of the two dimensions are always of the same sign, it means we can only move roughly in the direction of northeast or southwest in the parameter space. This may lead to a zig-zag path if the optimal direction is northwest or southeast.
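To check that I read this part correctly, here is a minimal sketch of the same-sign claim. It assumes a single neuron $z = w_1 x_1 + w_2 x_2 + b$, so that by the chain rule $\partial L / \partial w_i = (\partial L / \partial z)\, x_i$. The names `weight_gradient`, `same_sign` and `upstream` are my own and purely illustrative, and the sampled inputs are made up.

```python
import random


def weight_gradient(x, upstream):
    """Gradient of the loss w.r.t. the weights of one neuron z = w1*x1 + w2*x2 + b.

    By the chain rule dL/dw_i = (dL/dz) * x_i; `upstream` stands for dL/dz.
    (Hypothetical helper, for illustration only.)
    """
    return [upstream * xi for xi in x]


def same_sign(values):
    """True if all components are strictly positive or all are strictly negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)


rng = random.Random(0)  # fixed seed, chosen arbitrarily

# Assumption: the inputs are strictly positive, e.g. sigmoid outputs in (0, 1).
all_same = True
for _ in range(1000):
    x = [rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)]
    # The upstream gradient dL/dz can be positive or negative, but never zero here.
    upstream = rng.choice([-1.0, 1.0]) * rng.uniform(0.1, 2.0)
    all_same = all_same and same_sign(weight_gradient(x, upstream))

print(all_same)  # True: both components always share the sign of `upstream`
```

With strictly positive inputs, every sampled gradient has both components sharing the sign of `upstream`, which matches the "northeast or southwest only" picture in the quote.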
But if the data is zero-centered, the gradients of the two dimensions always have different signs. Then we can only move toward the northwest or southeast, which still leads to a zig-zag path if the optimal direction is northeast or southwest.
So in both cases there are some directions in which we cannot move, which leads to a zig-zag path.
What's wrong with my understanding?