I'm following a MOOC at the moment, and they seem to suggest that when scaling features it's best to fit the scaler on the training data only, and then use the training set's parameters (e.g. its mean and standard deviation) to scale the test set.
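For concreteness, here is a minimal sketch of what I understand them to mean. I'm assuming scikit-learn's `StandardScaler` here, since the course doesn't use that exact tooling:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data just for illustration
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training set only
X_test_scaled = scaler.transform(X_test)        # reuse the training set's parameters
```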
I see no reason for this. Why not just scale the full dataset from the start? Wouldn't that give us the most complete information about the shape of the data?
I suppose it doesn't make much of a difference as long as the sample size isn't too small, but they use this strategy on a small dataset, so I'm confused and would like to know whether there is a consensus on this.