In neural networks for regression, we rescale the continuous labels to match the output activation function, i.e. normalize them to [0, 1] if the logistic sigmoid is used, or to [-1, 1] if tanh is used. At the end we can restore the original range by rescaling the network's outputs back.
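To make concrete what I mean by rescaling the labels, here is a rough sketch (the min/max scaling and the `fit_scaler` helper are just my own illustration, not from any particular library):

```python
import numpy as np

def fit_scaler(y, lo, hi):
    """Return (forward, inverse) functions mapping y linearly into [lo, hi]."""
    y_min, y_max = y.min(), y.max()
    span = y_max - y_min

    def forward(v):
        return lo + (hi - lo) * (v - y_min) / span

    def inverse(v):
        return y_min + span * (v - lo) / (hi - lo)

    return forward, inverse

y = np.array([3.2, 7.5, 1.1, 9.8])  # continuous regression targets

# Logistic-sigmoid output: scale targets into [0, 1]
to01, from01 = fit_scaler(y, 0.0, 1.0)

# Tanh output: scale targets into [-1, 1]
to11, from11 = fit_scaler(y, -1.0, 1.0)

y_scaled = to01(y)            # train the network on these
y_restored = from01(y_scaled) # map network outputs back to the original range
```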
Should we also normalize the input features, and if so, how? In particular, what if the hidden activation differs from the output activation? For example, if the hidden activation is tanh and the output activation is the logistic sigmoid, should the input features be normalized to lie in [0, 1] or in [-1, 1]?