I am learning XGBoost
from the documentation, but I have a few questions about the derivation.
In the Additive Training
part of Tree Boosting
, they take the Taylor expansion of the loss function up to the second order for the general case, but I don't follow the step from:
$\text{obj}^{(t)} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + \mathrm{constant}$
to:
$\text{obj}^{(t)} = \sum_{i=1}^n [l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t) + \mathrm{constant}$
where the $g_i$ and $h_i$ are defined as
$\begin{split}g_i &= \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})\\ h_i &= \partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)})\end{split}$
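For example (if I differentiate correctly), with squared error loss $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$ these come out as $g_i = 2(\hat{y}_i^{(t-1)} - y_i)$ and $h_i = 2$, which I can at least compute by hand.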
I mean I know how to make the Taylor series expanded to second order:
$f(x) = f(x_k) + (x - x_k)f'(x_k) + \frac{1}{2!}(x-x_k)^2 f''(x_k) + o\left((x-x_k)^2\right)$
And I assume $f(x) = l(y_i, x)$, $x = \hat{y}_i^{(t-1)} + f_t(x_i)$ and $x_k = \hat{y}_i^{(t-1)}$, so the derivatives in the Taylor series are taken with respect to $\hat{y}_i^{(t-1)}$, which gives the result mentioned above.
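Writing my substitution out explicitly (so it is clear what I mean), with $x - x_k = f_t(x_i)$:
$\begin{split} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) &\approx l(y_i, \hat{y}_i^{(t-1)}) + f_t(x_i)\,\partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)}) + \frac{1}{2} f_t^2(x_i)\,\partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)})\\ &= l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \end{split}$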
But I don't know whether this derivation is correct, and even if it is, I still find it hard to understand why they choose to expand the objective in this way.
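To convince myself that the approximation is at least numerically sensible, I tried a small check of my own (just a sketch, assuming logistic loss on the raw score for a single example; none of this comes from the docs):

```python
import numpy as np

# Compare the exact logistic loss l(y, y_hat + f_t) with its second-order
# Taylor expansion around y_hat, where y_hat is the raw score (log-odds).

def logistic_loss(y, score):
    # l(y, s) = log(1 + e^s) - y * s  (negative log-likelihood in raw-score form)
    return np.log(1.0 + np.exp(score)) - y * score

y = 1.0       # label
y_hat = 0.3   # current prediction \hat{y}^{(t-1)} (raw score)
f_t = 0.2     # the new tree's output f_t(x_i)

p = 1.0 / (1.0 + np.exp(-y_hat))
g = p - y            # first derivative of the loss w.r.t. y_hat
h = p * (1.0 - p)    # second derivative of the loss w.r.t. y_hat

exact = logistic_loss(y, y_hat + f_t)
approx = logistic_loss(y, y_hat) + g * f_t + 0.5 * h * f_t ** 2

print(exact, approx)  # the two values are close for small f_t
```

The two numbers agree to a few decimal places, so the expansion itself seems fine; my question is more about why this is the chosen form.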
I would appreciate it if anyone could help me.