3

I am following the paper on xgboost and got stuck at equation 6 ($3^{rd}$ page), where the authors say that for a given tree structure, "we can compute the optimal weight $w_j^*$ of leaf $j$ by" \begin{split}w_j^\ast = -\frac{G_j}{H_j+\lambda}\end{split} Here, $G_j$ and $H_j$ are the sums of the first and second derivatives of the loss function $l$ with respect to the previous ($t-1$) prediction, taken over the set of examples ($I_j$) ending in the given leaf $j$: \begin{split}G_j &= \sum_{i \in I_j}\partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})\\ H_j &= \sum_{i \in I_j}\partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)}) \end{split}

My question really is, what makes the weight defined in such a way optimal? Does it follow in some obvious way (since it is not explained in the paper) from the general definition of the loss function? \begin{split} \mathcal{L}^{(t)} = \sum_{j=1}^T [G_jw_j + \frac{1}{2}(H_j + \lambda)w_j^2] + \gamma T \end{split} Particularly, I am struggling with the $H_j$ in the denominator. The general gradient boosting algorithm is supposed to fit new functions to the gradient of the loss function, so the second derivative confuses me.
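To make the quantities concrete, here is a small sketch of $G_j$, $H_j$, and $w_j^*$ for squared-error loss $l(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$, where $g_i = \hat{y}_i - y_i$ and $h_i = 1$. The function and variable names (`leaf_weight`, `y`, `y_hat`, `lam`) are illustrative, not from the paper:

```python
# Leaf aggregates G_j and H_j for squared-error loss
# l(y, yhat) = 0.5 * (y - yhat)^2, so g_i = yhat_i - y_i and h_i = 1.

def leaf_weight(y, y_hat, lam):
    g = [p - t for t, p in zip(y, y_hat)]  # first derivatives g_i
    h = [1.0 for _ in y]                   # second derivatives h_i
    G, H = sum(g), sum(h)
    return -G / (H + lam)                  # w_j* = -G_j / (H_j + lambda)

# With lam = 0 and all predictions 0, w* is just the mean residual:
print(leaf_weight([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 0.0))  # → 2.0
```

For squared error the optimal weight is the mean residual of the leaf's examples, shrunk toward zero by $\lambda$, which at least shows the formula produces something sensible.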

SheepPerplexed

1 Answer

2

Ok, this is embarrassing. The expression comes from minimizing the loss function analytically: take the derivative of leaf $j$'s term with respect to $w_j$ and set it to zero: \begin{split} \frac{\partial}{\partial w_j}(G_jw_j + \frac{1}{2}(H_j + \lambda)w_j^2) &= 0 \\ G_j + (H_j + \lambda)w_j &= 0 \\ w_j &= -\frac{G_j}{H_j + \lambda} \end{split}
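A quick numerical sanity check of this derivation: nudging $w$ away from $w^* = -G/(H+\lambda)$ in either direction should increase the per-leaf objective $Gw + \frac{1}{2}(H+\lambda)w^2$. The values of `G`, `H`, and `lam` below are arbitrary:

```python
# Per-leaf objective from the paper: obj(w) = G*w + 0.5*(H + lam)*w^2
def obj(w, G, H, lam):
    return G * w + 0.5 * (H + lam) * w ** 2

G, H, lam = 4.0, 2.0, 1.0
w_star = -G / (H + lam)

# Any small step away from w_star makes the objective worse:
assert obj(w_star, G, H, lam) < obj(w_star + 1e-3, G, H, lam)
assert obj(w_star, G, H, lam) < obj(w_star - 1e-3, G, H, lam)
```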

I still don't have good intuition for the denominator (an explanation would be appreciated), but at least now I can see where it came from.
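One way to see where the denominator comes from (my own reading, not spelled out in this answer): the objective above is the second-order Taylor expansion of the loss around the previous prediction, plus the $\frac{\lambda}{2}w^2$ regularization term. For a single example,

\begin{split}
l(y_i, \hat{y}_i^{(t-1)} + w) \approx l(y_i, \hat{y}_i^{(t-1)}) + g_i w + \frac{1}{2} h_i w^2
\end{split}

so minimizing over $w$ gives $w = -g_i/(h_i + \lambda)$, i.e. a regularized Newton step: the gradient scaled by the (inverse) curvature. That is why the second derivative, rather than just the gradient, shows up in the denominator.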

SheepPerplexed
  • As Matthew Drury commented, the second derivative is similar to [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method) which is an application of the [Taylor expansion](https://en.wikipedia.org/wiki/Taylor%27s_theorem) of the function to optimize. While not the same question as yours, you might like [this question](https://stats.stackexchange.com/questions/202858) for more details. – Winks May 03 '17 at 09:28