I could not find a good answer or reference explaining why random forests, decision trees, and GBMs are not sensitive to the scale of numerical features.
My sense is that, since boosting methods penalize large errors more heavily, they should certainly be susceptible to the scale of the feature variables.
I have a dataset where most features lie in the 0-100 range, but some values are an order of magnitude larger, in the thousands. Should I scale them?
Based on your experience, does it help to scale features in tree-based algorithms?
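
To make the question concrete, here is a minimal sketch of the kind of experiment I have in mind (assuming scikit-learn; the data is synthetic and just mimics my feature ranges, and `GradientBoostingRegressor` / `MinMaxScaler` are illustrative choices, not my actual pipeline):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)

# Synthetic stand-in for my data: one feature in [0, 100],
# one in the thousands.
X = np.column_stack([
    rng.uniform(0, 100, 500),
    rng.uniform(1000, 10000, 500),
])
y = 0.5 * X[:, 0] + 0.001 * X[:, 1] + rng.normal(0, 1, 500)

# Min-max scaling is monotonic, so the ordering of each
# feature's values is unchanged.
X_scaled = MinMaxScaler().fit_transform(X)

# Same hyperparameters and random_state for both fits.
gbm_raw = GradientBoostingRegressor(random_state=0).fit(X, y)
gbm_scaled = GradientBoostingRegressor(random_state=0).fit(X_scaled, y)

# If splits depend only on the ordering of feature values,
# predictions should agree up to floating-point noise.
print(np.allclose(gbm_raw.predict(X), gbm_scaled.predict(X_scaled)))
```

On my understanding, this should print `True`, but I am not sure whether that generalizes, or whether there are cases where scaling does matter for tree-based models.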