There are already some good answers on when feature scaling is desirable and on when to center vs. when to scale. However, they don't explain which scaling method to use in which situation. For context, assume you have numerical data that you've already decided needs scaling, and that both predictive power and model interpretability matter. Some reasons for scaling could be:
- Preparing the data to create interaction variables
- Ensuring that regularization affects all variables equally
- Speeding up gradient descent
- Making the coefficients easier to interpret
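To make the question concrete, these are the main candidates I'm comparing (a minimal scikit-learn sketch; the lognormal toy data is my own, not from any of the posts):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

rng = np.random.default_rng(0)
X = rng.lognormal(size=(100, 2))  # skewed toy data, made up for illustration

# Standardization: subtract the mean, divide by the standard deviation
standardized = StandardScaler().fit_transform(X)
# Min-max scaling: rescale each feature to the [0, 1] range
minmaxed = MinMaxScaler().fit_transform(X)
# Robust scaling: subtract the median, divide by the interquartile range
robust = RobustScaler().fit_transform(X)
```

Each of these would satisfy the goals above (comparable scales for regularization, gradient descent, and interaction terms), which is exactly why I'm unsure how to choose between them.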
Jeff Hale has a sensible-looking blog post on different scaling methods. Here are his conclusions:
Do his conclusions make sense, and if not, what is the correct methodology for choosing a scaling method? Ideally, the answer would be backed by scientific evidence as well.
I'll also add Sebastian Raschka's recommendation to use standardization for PCA, because "we are interested in the components that maximize the variance".
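To illustrate that recommendation, here is a small sketch of why scale matters for PCA (the data is hypothetical: two correlated features on very different scales):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
z = rng.normal(size=500)
# Two correlated features on very different scales (hypothetical data)
X = np.column_stack([100 * z + rng.normal(0, 10, 500),
                     z + rng.normal(0, 0.5, 500)])

# Without standardization, the first principal component is dominated
# by whichever feature happens to have the largest raw variance
pc1_raw = PCA(n_components=1).fit(X).components_[0]

# After standardization, both features contribute comparably
pc1_std = PCA(n_components=1).fit(
    StandardScaler().fit_transform(X)).components_[0]
```

Without standardization, PCA effectively ranks features by their units, which is presumably not what "the components that maximize the variance" is meant to capture.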