4

In a related question (What algorithms need feature scaling, beside from SVM?), every answer stated that XGBoost doesn't require any standardization, but someone wrote in a comment that:

+1. Just note that XGBoost actually implements a second algorithm too, based on linear boosting. Scaling will make a difference there.

I wonder what that second algorithm is and why it is sensitive to scaling. And, finally, should I standardize my data while using XGBoost?

Alexander Golys

1 Answer

5

The second algorithm referred to is the linear booster. In that case, the base learner is an elastic net regression (i.e. a linear model with $L_1$ and $L_2$ regularisation). Regularised regression methods are sensitive to feature scaling: they need the features to be on a similar scale. Otherwise, if the features are on different scales, we risk regularising a particular feature $x_1$ far more (or less) than another feature $x_2$ for the same pair of regularisation values $(\lambda_1^*, \lambda_2^*)$.
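A minimal sketch of the effect, assuming the Python `xgboost` and `scikit-learn` packages are available (the data here are synthetic and purely illustrative): with the `gblinear` booster, `alpha` and `lambda` are the $L_1$ and $L_2$ penalties on the raw coefficients, so the scale of each feature determines how strongly it is shrunk.

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

# Synthetic data; blow up the scale of one feature to mimic mixed units.
X, y = make_regression(n_samples=500, n_features=2, noise=10.0, random_state=0)
X[:, 1] *= 1000

# Linear booster = elastic-net-style base learner: alpha is the L1 penalty,
# lambda the L2 penalty. Both act on the raw coefficients, so feature scale matters.
params = {"booster": "gblinear", "alpha": 1.0, "lambda": 1.0}

# Unscaled: the small-scale feature needs a large coefficient, which the
# penalties shrink disproportionately compared with the large-scale feature.
model_raw = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=100)

# Standardised: both coefficients are penalised on a comparable footing.
X_std = StandardScaler().fit_transform(X)
model_std = xgb.train(params, xgb.DMatrix(X_std, label=y), num_boost_round=100)
```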

Strictly speaking, tree-based methods do not require explicit data standardisation, so XGBoost with a tree base learner would not require this kind of preprocessing. That said, it will probably help numerically if the data values themselves are not too large or too small. At the end of the day, certain gradient calculations are done, and having values closer to unit scale instead of billions (or nanos) is more convenient; numerically, the whole system is more stable.
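For contrast, a rough check (continuing the snippet above, same assumptions) that the tree booster is essentially invariant to monotone rescaling, since splits depend only on the ordering of feature values:

```python
import numpy as np

tree_params = {"booster": "gbtree", "max_depth": 3, "eta": 0.1}

m_raw = xgb.train(tree_params, xgb.DMatrix(X, label=y), num_boost_round=50)
m_std = xgb.train(tree_params, xgb.DMatrix(X_std, label=y), num_boost_round=50)

# Split thresholds differ in value but partition the data identically,
# so predictions should agree up to floating-point effects.
print(np.allclose(m_raw.predict(xgb.DMatrix(X)),
                  m_std.predict(xgb.DMatrix(X_std)), atol=1e-3))
```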

usεr11852
  • there is no non-linear transformation? How is it that many weak learners are aggregated to obtain a non-linear function? – carlo Sep 02 '20 at 12:49
  • 1
    Sorry, carlo, I do not refer to non-linear transformations in this answer. Maybe you want to post a separate question? – usεr11852 Sep 02 '20 at 13:01
  • I did it here: https://stats.stackexchange.com/questions/485717/linear-weak-learners-for-xgboost – carlo Sep 02 '20 at 17:58