In XGBoost, are weights estimated for each sample and then averaged?

Question

The weights in XGBoost are determined by gradient boosting. So each sample gets a weight, and since each leaf contains multiple samples, each leaf initially has multiple weights. But since a single weight is needed for each leaf (based on the thread below; please correct me if my understanding is wrong), are the multiple sample weights in a leaf averaged to get a single weight?

How does gradient boosting calculate probability estimates?

score 4 · Accepted Answer · answered Oct 16 '20 at 21:01

Nearly.

Vanilla GBMs work pretty much like this. Each tree is built to approximate the negative gradient of the loss function (the pseudo-residuals), but the tree construction itself is just like that of any ordinary regression tree: split using some impurity criterion, and assign the average value at the leaves.
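A minimal sketch of one vanilla boosting step, assuming squared-error loss $L = \tfrac12 (y - f)^2$ (so the negative gradient is just the residual), a single numeric feature, and a depth-1 tree. The data and the `fit_stump` helper are hypothetical. The split is chosen by minimizing squared error, and each leaf gets the plain average of the pseudo-residuals that fall in it.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree to targets r using squared-error impurity.

    Returns (threshold, left_value, right_value); each leaf value is the
    plain average of r over the rows that land in that leaf.
    """
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds that keep both sides non-empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return t, left_value, right_value

# Hypothetical data for one boosting step with L = 0.5 * (y - f)^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.2, 0.8, 3.0, 3.1, 2.9])
f0 = np.full_like(y, y.mean())  # initial constant prediction
residual = y - f0               # negative gradient -dL/df = y - f
t, left_value, right_value = fit_stump(x, residual)
```

Here the stump splits at `x <= 3` and the two leaves get the mean residuals of their rows (about -1.0 and 1.0).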

One of XGBoost's additions to the algorithm is the use of the second derivative. The exact answer to your question is equation 5 of the paper (https://arxiv.org/pdf/1603.02754.pdf):

$$ w^*_j = -\frac{ \sum_{i\in I_j} g_i }{ \sum_{i\in I_j} h_i + \lambda } $$

where $g_i$ and $h_i$ are the first and second derivatives of the loss for row $i$ with respect to the previous round's prediction, and $I_j$ is the set of rows in leaf $j$. To compare this to the above, think about the case of regression with MSE loss, where $h_i$ is constant, and without regularization, so $\lambda=0$.
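A small sketch of eq. 5 on hypothetical numbers for the rows of one leaf. With squared-error loss $L = \tfrac12 (y - f)^2$ we get $g_i = f_i - y_i$ and $h_i = 1$, so with $\lambda = 0$ the leaf weight is exactly the mean residual, the same value a vanilla GBM leaf would get; a positive $\lambda$ shrinks it toward zero. `xgb_leaf_weight` is an illustrative helper, not part of the XGBoost API.

```python
import numpy as np

def xgb_leaf_weight(g, h, lam=0.0):
    """Optimal leaf weight from eq. 5: w* = -sum(g) / (sum(h) + lambda).

    g, h: first and second derivatives of the loss w.r.t. the current
    prediction, for the rows I_j that fall in this leaf.
    """
    return -np.sum(g) / (np.sum(h) + lam)

# Squared-error loss L = 0.5 * (y - f)^2 gives g = f - y and h = 1 for every row.
y = np.array([1.0, 1.2, 0.8])
f = np.full_like(y, 2.0)  # current predictions for the rows in this leaf (made up)
g = f - y
h = np.ones_like(y)

w_unreg = xgb_leaf_weight(g, h, lam=0.0)  # equals the mean residual (y - f).mean()
w_reg = xgb_leaf_weight(g, h, lam=1.0)    # same numerator, larger denominator
```

Here `w_unreg` is -1.0 (the mean residual) and `w_reg` is -0.75.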

answered by Ben Reiniger

Thank you. So based on the equation, (in the case of XGBoost) the weight of the leaf is computed from the $g$ and $h$ values of all the samples in it, and is therefore not an average of the weights of each sample. Is my understanding correct? – tjt Oct 16 '20 at 21:10

Yes, though I guess it depends on what you mean by the "weight" of a sample, and how flexible you are in defining "average". In the special case of MSE without regularization, the denominator is just a constant times the number of rows in the leaf, so you can interpret the entire thing as an average of the (negated) $g$'s. And more generally, if you treat the "weights" as $-g_i/h_i$, then with $\lambda=0$ this definition is a sort of funny average of those: an average weighted by the $h_i$. (And note that $I_j$ is the set of rows in the leaf node.) – Ben Reiniger Oct 16 '20 at 21:14
I am referring to the score $w$ in eq. 5 above as the weight of the leaf (please correct me if that's wrong). By average I mean a simple average, as in the case of plain GBM, but I understand from your comment that in a sense this can be termed an average as well. Very clear, thank you. $I_j$ is basically the set of samples in the leaf, right? – tjt Oct 16 '20 at 21:20
@BenReiniger beat me to it! +1 – Sycorax Oct 16 '20 at 21:29
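The "funny average" from the comment above can be checked numerically. This is a hypothetical illustration, assuming logistic loss (where $g_i = p_i - y_i$ and $h_i = p_i(1-p_i)$ vary from row to row), $\lambda = 0$, and made-up scores for the rows of one leaf. It shows that eq. 5 equals the $h$-weighted average of the per-row ratios $-g_i/h_i$, but not their plain average.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic loss: with p = sigmoid(f), g = p - y and h = p * (1 - p).
y = np.array([1.0, 0.0, 1.0, 1.0])
f = np.array([0.5, -1.0, 2.0, -0.5])  # made-up raw scores (log-odds) for rows in one leaf
p = sigmoid(f)
g = p - y
h = p * (1.0 - p)

leaf_weight = -g.sum() / h.sum()               # eq. 5 with lambda = 0
per_row = -g / h                               # per-row ratios -g_i / h_i
weighted_avg = np.average(per_row, weights=h)  # h-weighted average of the ratios
simple_avg = per_row.mean()                    # plain average, for comparison
```

`leaf_weight` and `weighted_avg` agree (about 1.10 here), while `simple_avg` (about 1.01) does not, because rows with larger $h_i$ count for more.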
