When setting up an ML pipeline for binary classification, do we need to standardize our training and test sets separately?
This answer recommends standardizing them separately (although it never states the target variable's type): https://stackoverflow.com/questions/58939568/standardize-data-with-k-fold-cross-validation
If we are not standardizing the target variable, isn't it better to standardize our numeric predictor variables with respect to the full dataset?
For example, say we are trying to predict whether a transaction is fraudulent ($Y \in \{0,1\}$) from Amount ($X_1$) and Customer Age ($X_2$) in a dataset of 1000 transactions.
If we randomly split 80/20 and then standardize each set separately, the test set's standardized values of $X_1$ and $X_2$ rest on only $n=200$ observations, whereas using all $n=1000$ would give estimated means closer to each variable's true mean.
And in a real-world setting where we want to score a single new transaction, we cannot standardize it at all without the prior data.
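For concreteness, here is a minimal sketch of the two approaches I am comparing, using scikit-learn's `StandardScaler` on made-up placeholder data (the distributions and the single new transaction are just illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data: 1000 transactions with Amount (X1) and Customer Age (X2)
rng = np.random.default_rng(0)
X = np.column_stack([rng.exponential(100, 1000),    # Amount
                     rng.integers(18, 80, 1000)])   # Customer Age
y = rng.integers(0, 2, 1000)                        # fraud label (0/1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Approach A (as in the linked answer): fit the scaler on the
# training set only, then apply those means/SDs to the test set.
scaler_a = StandardScaler().fit(X_train)
X_train_a = scaler_a.transform(X_train)
X_test_a = scaler_a.transform(X_test)

# Approach B (what I'm asking about): fit the scaler on all 1000
# rows, so the estimated means/SDs come from the full sample.
scaler_b = StandardScaler().fit(X)
X_train_b = scaler_b.transform(X_train)
X_test_b = scaler_b.transform(X_test)

# Scoring one brand-new transaction: we must reuse stored statistics,
# since a single row cannot be standardized on its own.
x_new = np.array([[250.0, 35]])   # hypothetical Amount = 250, Age = 35
x_new_scaled = scaler_a.transform(x_new)
```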
Is this correct, or am I missing something?
Thanks.