I am learning XGBoost
from the documentation, but I have a few questions about the derivation.
In the Additive Training
part of Tree Boosting
, they take the Taylor expansion of the loss function up to the second order for the general case, but I don't follow the step from:
$\text{obj}^{(t)} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + \mathrm{constant}$
to:
$\text{obj}^{(t)} = \sum_{i=1}^n [l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t) + \mathrm{constant}$
where the $g_i$ and $h_i$ are defined as
$\begin{split}g_i &= \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})\\ h_i &= \partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)})\end{split}$
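For example (if I differentiate correctly), with squared error loss $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$ these come out as $g_i = 2(\hat{y}_i^{(t-1)} - y_i)$ and $h_i = 2$, which I can at least compute by hand.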
I mean I know how to make the Taylor series expanded to second order:
$f(x) = f(x_k) + (x - x_k)f'(x_k) + \frac{1}{2!}(x-x_k)^2 f''(x_k) + o\left((x-x_k)^2\right)$
And I assume $f(x) = l(y_i, x)$, $x = \hat{y}_i^{(t-1)} + f_t(x_i)$ and $x_k = \hat{y}_i^{(t-1)}$, so the derivatives in the Taylor series are taken with respect to $\hat{y}_i^{(t-1)}$, which gives the result mentioned above.
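Writing my substitution out explicitly (so it is clear what I mean), with $x - x_k = f_t(x_i)$:
$\begin{split} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) &\approx l(y_i, \hat{y}_i^{(t-1)}) + f_t(x_i)\,\partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)}) + \frac{1}{2} f_t^2(x_i)\,\partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)})\\ &= l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \end{split}$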
But I don't know whether this derivation is correct, and even if it is, I still find it hard to understand why they choose to expand the objective in this way.
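To convince myself that the approximation is at least numerically sensible, I tried a small check of my own (just a sketch, assuming logistic loss on the raw score for a single example; none of this comes from the docs):

```python
import numpy as np

# Compare the exact logistic loss l(y, y_hat + f_t) with its second-order
# Taylor expansion around y_hat, where y_hat is the raw score (log-odds).

def logistic_loss(y, score):
    # l(y, s) = log(1 + e^s) - y * s  (negative log-likelihood in raw-score form)
    return np.log(1.0 + np.exp(score)) - y * score

y = 1.0       # label
y_hat = 0.3   # current prediction \hat{y}^{(t-1)} (raw score)
f_t = 0.2     # the new tree's output f_t(x_i)

p = 1.0 / (1.0 + np.exp(-y_hat))
g = p - y            # first derivative of the loss w.r.t. y_hat
h = p * (1.0 - p)    # second derivative of the loss w.r.t. y_hat

exact = logistic_loss(y, y_hat + f_t)
approx = logistic_loss(y, y_hat) + g * f_t + 0.5 * h * f_t ** 2

print(exact, approx)  # the two values are close for small f_t
```

The two numbers agree to a few decimal places, so the expansion itself seems fine; my question is more about why this is the chosen form.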
I would appreciate it if anyone could help me.