I am following deeplearning.ai's videos on Coursera. I have a couple of questions about feature scaling using the formula:
$$ (x - \mu)/ \sigma $$
Edit: There are similar questions on this topic, but none of them answers these questions in particular. I have highlighted the questions in bold for emphasis.
1.) What is the use of subtracting the mean? My understanding is that dividing by the SD scales the features and subtracting the mean centres the data around zero. But why is centring the data around zero useful?
2.) I understand that the values of the mean and SD should be consistent across the training and test sets. Are $ \mu $ and $ \sigma $ calculated on the entire dataset (train and test together)? Or are they calculated on the train set only and then applied to the test set?
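To make the two options in question 2 concrete, here is a rough sketch of what I mean. The numbers are made up, the helper names `fit_scaler` and `standardize` are just mine, and I am assuming the population SD (`statistics.pstdev`) is the intended $ \sigma $; none of this comes from the course material.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Estimate (mu, sigma) from the given values (population SD, my assumption)."""
    return fmean(values), pstdev(values)


def standardize(values, mu, sigma):
    """Apply (x - mu) / sigma to every value."""
    return [(x - mu) / sigma for x in values]


# Made-up single feature, already split into train and test.
train = [2.0, 4.0, 6.0, 8.0]
test = [10.0, 12.0]

# Option A: mu and sigma come from the train set only and are reused on the test set.
mu_a, sigma_a = fit_scaler(train)
train_a = standardize(train, mu_a, sigma_a)
test_a = standardize(test, mu_a, sigma_a)

# Option B: mu and sigma come from train and test pooled together.
mu_b, sigma_b = fit_scaler(train + test)
train_b = standardize(train, mu_b, sigma_b)
test_b = standardize(test, mu_b, sigma_b)

print("Option A:", mu_a, sigma_a, train_a, test_a)
print("Option B:", mu_b, sigma_b, train_b, test_b)
```

In option A the scaled train set has mean zero and unit SD by construction, while the test values can land outside that range. In option B the pooled data is centred, so the train set on its own is not. My question is which of the two is the right one to use.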
Thanks in advance.