I'm going through a solution to the bike sharing demand problem, and one point about scaling the data is unclear to me. Concretely, why do we fit the scaler only on the training data instead of on the whole dataset? The relevant code is:
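from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# fit computes the per-feature mean and standard deviation from the training data only
# (StandardScaler accepts the labels argument but ignores it)
scaler.fit(train_data, train_labels)
# the same training statistics are then applied to both sets
scaled_train_data = scaler.transform(train_data)
scaled_test_data = scaler.transform(test_data)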
I think it has something to do with the fact that we have a time feature. Because of that feature, we split the dataset so that every example in the training set happened earlier than every example in the test set. I assumed we scale the two sets differently for the same reason, but I don't have a good intuition for why.
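To make the setup concrete, here is a minimal sketch of what I understand the solution to be doing. The DataFrame, its column names (`datetime`, `temp`, `humidity`) and the 80/20 split are my own stand-ins, not the actual bike sharing data or the exact split used in the solution.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "datetime": pd.date_range("2011-01-01", periods=100, freq="D"),
    "temp": rng.normal(20, 5, size=100),
    "humidity": rng.normal(60, 10, size=100),
})

# Chronological split: the first 80% of rows go to training, the rest to test
# (the 80/20 ratio is an assumption for illustration)
df = df.sort_values("datetime")
split = int(len(df) * 0.8)
features = ["temp", "humidity"]
train_data = df.iloc[:split][features]
test_data = df.iloc[split:][features]

# The scaler learns mean and standard deviation from the training rows only
scaler = StandardScaler()
scaler.fit(train_data)

# Both sets are transformed with those training statistics
scaled_train_data = scaler.transform(train_data)
scaled_test_data = scaler.transform(test_data)
```

With this setup the scaled training columns have mean 0 and standard deviation 1 by construction, while the scaled test columns generally do not, since they are shifted and scaled by the training statistics.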
So, why do we do it this way?