I have read in the literature and in various other questions (linked at the end of this post) that normalising the inputs to have a mean of 0 and a standard deviation of 1 helps the gradient descent optimiser by making the error surface steeper.
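For concreteness, this is the kind of per-feature standardisation I mean (a minimal NumPy sketch; the matrix `X` is a made-up example):

```python
import numpy as np

# Toy design matrix: 5 samples, 3 features on very different scales
X = np.array([[1.0, 200.0, 0.001],
              [2.0, 180.0, 0.003],
              [3.0, 220.0, 0.002],
              [4.0, 210.0, 0.004],
              [5.0, 190.0, 0.005]])

# Standardise each feature (column) to mean 0, standard deviation 1
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma

print(X_std.mean(axis=0))  # approximately 0 for every feature
print(X_std.std(axis=0))   # approximately 1 for every feature
```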
However, should the inputs also be normalised with the activation function of the hidden layers in mind? For example, when using sigmoid, should the inputs be in [0, 1]? For tanh, should they be in [-1, 1]? For ReLU, is it OK to have negative inputs?
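In other words, would the alternative for sigmoid/tanh look something like the min-max rescaling below? (The `minmax_scale` helper here is hypothetical, written for illustration rather than taken from any particular library.)

```python
import numpy as np

def minmax_scale(X, lo=0.0, hi=1.0):
    """Rescale each feature (column) of X linearly into [lo, hi]."""
    X_min = X.min(axis=0)
    X_max = X.max(axis=0)
    return lo + (X - X_min) * (hi - lo) / (X_max - X_min)

X = np.array([[1.0, 200.0],
              [3.0, 180.0],
              [5.0, 220.0]])

X_01 = minmax_scale(X, 0.0, 1.0)    # candidate input range for sigmoid
X_11 = minmax_scale(X, -1.0, 1.0)   # candidate input range for tanh
```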
Related questions:

- Input Normalisation for ReLU neurons
- Rescaling input features for neural networks regression
- What characteristics should the input data have for a neural network?
- Why normalise the standard deviation of neural network input?