I'm a bit new to this topic. Does Batch Normalization replace feature scaling?
As far as I understand, during training batch normalization normalizes each mini-batch using that batch's own $\mu$ and $\sigma$, while maintaining an exponential moving average of these statistics on the fly.
After training ends, the estimated (running) values of $\mu$ and $\sigma$ are used to normalize the test inputs.
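Here is a toy numpy sketch of how I picture it (the momentum value, the class name, and the absence of the learnable $\gamma$/$\beta$ parameters are just my simplifications):

```python
import numpy as np

class BatchNorm1D:
    """Toy batch norm (no learnable gamma/beta) to illustrate the running estimates."""
    def __init__(self, num_features, momentum=0.9, eps=1e-5):  # momentum value is an assumption
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps

    def forward_train(self, x):
        # During training: normalize with the batch's own statistics...
        mu, var = x.mean(axis=0), x.var(axis=0)
        # ...and update the exponential moving averages on the fly.
        self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mu
        self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        return (x - mu) / np.sqrt(var + self.eps)

    def forward_test(self, x):
        # After training: use the estimated (running) statistics instead.
        return (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
```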
So, if we use Batch Normalization as the first layer of a Neural Network (something like the sketch below), do we still need to scale the inputs manually?
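For example, a rough Keras sketch of what I mean (the layer sizes and feature count are just placeholders):

```python
import tensorflow as tf

# BatchNormalization applied directly to the raw inputs,
# instead of scaling the features beforehand.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),           # 20 raw, unscaled features
    tf.keras.layers.BatchNormalization(),         # would this replace manual scaling?
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```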