
In neural nets for regression problems, we rescale the continuous labels consistently with the output activation function, i.e. normalize them to [0,1] if the logistic sigmoid is used, or to [-1,1] if tanh is used. At the end we can restore the original range by renormalizing the output neurons back.

Should we also normalize the input features? And how? For example, what if the hidden activation differs from the output activation? E.g. if the hidden activation is tanh and the output activation is logistic, should the input features be normalized to lie in the [0,1] or the [-1,1] interval? A minimal sketch of the label rescaling described above follows (plain NumPy; the helper names are just for illustration).
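
```python
import numpy as np

# Sketch of the label rescaling described above; function names are placeholders.
def rescale_labels(y, activation="logistic"):
    """Map continuous labels into the output activation's range."""
    y_min, y_max = y.min(), y.max()
    if activation == "logistic":          # logistic sigmoid outputs lie in (0, 1)
        y_scaled = (y - y_min) / (y_max - y_min)
    elif activation == "tanh":            # tanh outputs lie in (-1, 1)
        y_scaled = 2 * (y - y_min) / (y_max - y_min) - 1
    else:
        raise ValueError("unknown activation")
    return y_scaled, (y_min, y_max)

def restore_labels(y_scaled, bounds, activation="logistic"):
    """Invert the rescaling to recover the original label range."""
    y_min, y_max = bounds
    if activation == "logistic":
        return y_scaled * (y_max - y_min) + y_min
    return (y_scaled + 1) / 2 * (y_max - y_min) + y_min
```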

Oleg Shirokikh
  • I suggest you read LeCun et al., Efficient BackProp, 1998 (http://scholar.google.it/scholar?cluster=15983004533596008350&hl=en&as_sdt=0,5), where the author proposes and discusses some tricks and tuning tips for NNs. I always normalize input features between -1 and 1. – Matteo De Felice Oct 13 '13 at 08:25

2 Answers

1

The output of tanh is already between -1 and 1. So if you normalise the input, be sure to normalise it consistently with the hidden activation functions. In theory normalisation is not required, because tanh(1000) is mathematically different from tanh(10000). But in practice both saturate to the same floating-point value, so you should indeed normalise the input in most applications. A quick numerical check of that saturation point (plain NumPy; the scaling choice is only illustrative):
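
```python
import numpy as np

# tanh saturates: mathematically tanh(1000) != tanh(10000), but in float64
# both round to exactly 1.0, so the network cannot distinguish them.
print(np.tanh(1000), np.tanh(10000))     # 1.0 1.0
print(np.tanh(1000) == np.tanh(10000))   # True

# After scaling the inputs to a small range, tanh stays in its responsive region.
x = np.array([1000.0, 10000.0])
x_scaled = x / np.abs(x).max()           # e.g. scale into [-1, 1]
print(np.tanh(x_scaled))                 # distinct values
```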

1

For regression tasks you should be using a linear neuron in the output layer. Your logistic output likely looks sigmoidal when plotted against the response variable, and the loss function you're using probably makes little sense in this context.

Input features should always be de-meaned and divided by their standard deviation. That has nothing to do with the unit types and everything to do with training by backprop: if you don't normalize properly, the error surface becomes a thin ellipsoid and gradient descent zig-zags slowly around the minimum. A minimal sketch of that preprocessing (NumPy; the toy data and names are placeholders, and the key point is to reuse the training-set statistics at test time):
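
```python
import numpy as np

# De-mean each feature and divide by its standard deviation,
# using statistics computed on the training set only.
X_train = np.random.randn(100, 5) * 10 + 3   # toy data at an arbitrary scale
X_test = np.random.randn(20, 5) * 10 + 3

mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma           # reuse the training statistics
```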

Finally, consider rectified linear units in the hidden layers, because they train much faster than logistic or tanh units. Putting the pieces together, here is a hedged sketch of this setup, assuming PyTorch and arbitrary layer sizes: standardized inputs, ReLU hidden units, a linear output neuron, and a squared-error loss.
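
```python
import torch
from torch import nn

# Illustrative regression net: ReLU hidden units, linear output, MSE loss.
# Layer sizes, optimizer, and data are arbitrary placeholder choices.
model = nn.Sequential(
    nn.Linear(5, 32),   # 5 standardized input features
    nn.ReLU(),          # rectified linear hidden units
    nn.Linear(32, 1),   # linear output neuron for regression
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

X = torch.randn(100, 5)   # pretend these are standardized features
y = torch.randn(100, 1)   # continuous targets, no rescaling needed

for _ in range(100):      # plain gradient-descent training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```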

Jessica Collins