
I've seen several ways to normalize data (features or even images) before using it as input to a NN or CNN.

The most common ones I've seen are the two below (a short code sketch follows the list):

  • [0, 1]: (data - min(data)) / (max(data) - min(data))
  • z-score: (data - mean(data)) / std.dev(data)
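
For concreteness, here is a minimal NumPy sketch of both options (the helper names are my own, and with a train/test split the min/max or mean/std statistics would be computed on the training data only):

    import numpy as np

    def min_max_scale(data):
        # Linearly rescale to [0, 1]; assumes max(data) > min(data).
        return (data - data.min()) / (data.max() - data.min())

    def z_score(data):
        # Center to mean 0 and rescale to standard deviation 1.
        return (data - data.mean()) / data.std()

    features = np.array([2.0, 5.0, 7.0, 10.0])  # hypothetical toy data
    print(min_max_scale(features))  # [0.    0.375 0.625 1.   ]
    print(z_score(features))        # mean 0, std 1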

Which would be the best or recommended one? Does the chosen method really affect the training of the model?

I'm really lost with so many opinions on this topic; it would be good if you could provide a reference such as a paper or book.

Helder

2 Answers


There is no best way. If your data were uniformly distributed, you'd probably be better off with scaling by range; for a bell-shaped distribution, standard-deviation-based normalization may work better. In the end it rarely matters.
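
To illustrate the two cases (a sketch of my own using simulated data, not part of the original answer):

    import numpy as np

    rng = np.random.default_rng(seed=0)

    # Uniformly distributed feature: range scaling spreads it evenly over [0, 1].
    u = rng.uniform(low=0.0, high=10.0, size=10_000)
    u_scaled = (u - u.min()) / (u.max() - u.min())

    # Bell-shaped (Gaussian) feature: z-scoring gives mean 0 and std 1,
    # with roughly 95% of values landing in [-2, 2].
    g = rng.normal(loc=5.0, scale=2.0, size=10_000)
    g_standardized = (g - g.mean()) / g.std()

    print(u_scaled.min(), u_scaled.max())               # 0.0 1.0
    print(g_standardized.mean(), g_standardized.std())  # ~0.0 ~1.0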

Aksakal

Deep Learning with Python by Francois Chollet (creator of Keras) says to use z-score normalization.

nababs
  • I thought I'd add this link here to supplement the answer because I didn't know this: https://stats.stackexchange.com/questions/318170/min-max-scaling-on-z-score-standardizd-data. Doing min-max scaling is equivalent to doing z-score normalization and then min-max! As with all things deep learning, it is helpful to have your data in the same range, so the two methods don't make much difference, but I stick to min-max now because training sometimes converges faster. – nababs Aug 30 '18 at 18:10
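
A quick way to see why the two coincide (a minimal sketch of my own, not from the linked post): min-max scaling is unchanged by any positive affine transform of the data, and z-scoring is exactly such a transform, so it cancels inside the min-max ratio.

    import numpy as np

    x = np.array([1.0, 4.0, 6.0, 9.0])  # hypothetical toy feature

    # Plain min-max scaling to [0, 1].
    min_max = (x - x.min()) / (x.max() - x.min())

    # Z-score first, then min-max on the standardized values.
    z = (x - x.mean()) / x.std()
    z_then_min_max = (z - z.min()) / (z.max() - z.min())

    # The shift and the positive scale factor cancel, so both results agree.
    print(np.allclose(min_max, z_then_min_max))  # True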