I'm a bit new to this topic. Does Batch Normalization replace feature scaling?
As far as I understand, during training batch normalization normalizes each mini-batch using that batch's own $\mu$ and $\sigma$, while maintaining an exponential moving average of these statistics on the fly.
After training ends, the estimated (running) values of $\mu$ and $\sigma$ are used to normalize the test inputs.
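Here is a toy numpy sketch of how I picture it (the momentum value, the class name, and the absence of the learnable $\gamma$/$\beta$ parameters are just my simplifications):

```python
import numpy as np

class BatchNorm1D:
    """Toy batch norm (no learnable gamma/beta) to illustrate the running estimates."""
    def __init__(self, num_features, momentum=0.9, eps=1e-5):  # momentum value is an assumption
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps

    def forward_train(self, x):
        # During training: normalize with the batch's own statistics...
        mu, var = x.mean(axis=0), x.var(axis=0)
        # ...and update the exponential moving averages on the fly.
        self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mu
        self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        return (x - mu) / np.sqrt(var + self.eps)

    def forward_test(self, x):
        # After training: use the estimated (running) statistics instead.
        return (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
```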
So, if we use Batch Normalization as the first layer of a Neural Network (something like the sketch below), do we still need to scale the inputs manually?
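For example, a rough Keras sketch of what I mean (the layer sizes and feature count are just placeholders):

```python
import tensorflow as tf

# BatchNormalization applied directly to the raw inputs,
# instead of scaling the features beforehand.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),           # 20 raw, unscaled features
    tf.keras.layers.BatchNormalization(),         # would this replace manual scaling?
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```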