I'm following a MOOC at the moment, and they seem to suggest that when scaling features it's best to fit the scaler on the training data only, and then use the training set's parameters (e.g. its mean and standard deviation) to scale the test set.
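For concreteness, here is a minimal sketch of what I understand them to mean. I'm assuming scikit-learn's `StandardScaler` here, since the course doesn't use that exact tooling:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data just for illustration
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training set only
X_test_scaled = scaler.transform(X_test)        # reuse the training set's parameters
```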
I see no reason for this. Why not just scale the full dataset from the start? Wouldn't that give us the most complete information about the shape of the data?
I suppose it doesn't make much of a difference as long as the sample size isn't too small, but they use this strategy on a small dataset, so I'm confused and would like to know whether there is a consensus on this.