When standardizing data before training a neural network (say, by subtracting the mean and dividing by the standard deviation of each variable), there are several ways to compute the statistics, and I am not sure which one is correct or best:
- Compute the mean and sd of the training, validation and test sets separately and apply each to its own set
- Compute the mean and sd of the combined train and val sets and apply them to both the train and val sets. Standardize the test set separately with its own mean and sd
- Compute the mean and sd of the combined train and val sets and apply them to all three sets
Clearly, one cannot compute the mean and sd from all three sets combined, because no information from the test set should leak into the training procedure. What is the right way to go about this, and why?
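To make the three options concrete, here is a minimal NumPy sketch of what I mean. The arrays `X_train`, `X_val`, `X_test` and the helpers `stats` and `standardize` are made-up names for illustration (random data, arbitrary split sizes), not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 3 variables, arbitrary split sizes.
X_train = rng.normal(5.0, 2.0, size=(60, 3))
X_val = rng.normal(5.0, 2.0, size=(20, 3))
X_test = rng.normal(5.0, 2.0, size=(20, 3))


def stats(X):
    """Per-variable mean and standard deviation."""
    return X.mean(axis=0), X.std(axis=0)


def standardize(X, mean, sd):
    """Subtract the given mean and divide by the given sd, per variable."""
    return (X - mean) / sd


# Option 1: every set is standardized with its own statistics.
opt1 = tuple(standardize(X, *stats(X)) for X in (X_train, X_val, X_test))

# Option 2: train and val share statistics; test uses its own.
m_tv, sd_tv = stats(np.vstack([X_train, X_val]))
opt2 = (
    standardize(X_train, m_tv, sd_tv),
    standardize(X_val, m_tv, sd_tv),
    standardize(X_test, *stats(X_test)),
)

# Option 3: the train+val statistics are applied to all three sets.
opt3 = tuple(standardize(X, m_tv, sd_tv) for X in (X_train, X_val, X_test))
```

Note that under option 3 the test set will generally not have exactly zero mean and unit sd, since its own statistics are never used.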
Also, in a regression problem, should the targets be standardized too?
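By standardizing the targets I mean something like the sketch below, where predictions would have to be mapped back to the original scale afterwards. This assumes the statistics come from the training targets only (which is itself part of what I am asking); `y_train` is hypothetical random data and `pred_std` is just a stand-in for a model's output.

```python
import numpy as np

rng = np.random.default_rng(0)
y_train = rng.normal(100.0, 15.0, size=60)  # hypothetical regression targets

# Standardize the targets with their own mean and sd.
y_mean, y_sd = y_train.mean(), y_train.std()
y_train_std = (y_train - y_mean) / y_sd

# A model trained on y_train_std would predict on the standardized scale.
# Stand-in for such predictions (assumption: no real model is fitted here).
pred_std = y_train_std[:5]

# Invert the transform to get predictions in the original units.
pred = pred_std * y_sd + y_mean
```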