I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.

I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the script.

Here is the simplified example:

import numpy as np


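def loglikelihoodloss(y_hat, y_true):
    # y_hat holds raw scores (log-odds); the sigmoid maps them to probabilities.
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    # First derivative returned to the booster.
    grad = prob - y_true
    # Second derivative returned to the booster.
    hess = prob * (1.0 - prob)
    return grad, hess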

y_hat = np.array([1.80087972, -1.82414818, -1.82414818,  1.80087972, -2.08465433,
                  -1.82414818, -1.82414818,  1.80087972, -1.82414818, -1.82414818])
y_true = np.array([1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.])

loglikelihoodloss(y_hat, y_true)

The log loss function is the sum of $y\ln\left(p\right)+\left(1-y\right)\ln\left(1-p\right)$ where $p = \dfrac{1}{(1 + e^{-x})}$.

The gradient (with respect to $p$) is then $\dfrac{p-y}{\left(p-1\right)p}$; however, in the code it is $p - y$.

Likewise, the second derivative (with respect to $p$) is $\dfrac{\left(y-p\right)p+y\left(p-1\right)}{\left(p-1\right)^2p^2}$; however, in the code it is $p(1-p)$.

How are the equations equal?

Greg

1 Answer

The derivatives are with respect to $x$ (or y_hat in the code) instead of $p$.

As you've already derived (Edit: as Simon.H pointed out, the actual loss is the negative log-likelihood, so I've flipped the sign of your result), $$\frac{\partial f}{\partial p}=\frac{p-y}{\left(1-p\right)p},$$ and the derivative of the sigmoid is $$\frac{\partial p}{\partial x}=p(1-p),$$ so by the chain rule $$\frac{\partial f}{\partial x}=\frac{\partial f}{\partial p}\frac{\partial p}{\partial x}=p-y,$$ and the second-order derivative is $$\frac{\partial^2 f}{\partial x^2}=\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right)=\frac{\partial}{\partial x}(p-y)=\frac{\partial p}{\partial x}=p(1-p).$$
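
A quick way to convince yourself of this is to compare the closed-form expressions with finite differences taken with respect to $x$. The following is only a minimal sketch: the helper names (`sigmoid`, `logloss`, `analytic_grad_hess`, `numeric_grad_hess`), the step size `eps`, and the tolerances are my own choices for illustration, not anything taken from xgboost, and the inputs are a few of the values from the question.

```python
import numpy as np


def sigmoid(x):
    # Map a raw score x to a probability p.
    return 1.0 / (1.0 + np.exp(-x))


def logloss(x, y):
    # Per-sample negative log-likelihood, written as a function of the raw score x.
    p = sigmoid(x)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def analytic_grad_hess(x, y):
    # Closed-form derivatives with respect to x, as derived above.
    p = sigmoid(x)
    return p - y, p * (1.0 - p)


def numeric_grad_hess(x, y, eps=1e-4):
    # Central finite differences in x (eps is an illustrative choice).
    f_plus = logloss(x + eps, y)
    f_zero = logloss(x, y)
    f_minus = logloss(x - eps, y)
    grad = (f_plus - f_minus) / (2.0 * eps)
    hess = (f_plus - 2.0 * f_zero + f_minus) / eps**2
    return grad, hess


# A few (y_hat, y_true) pairs taken from the question.
x = np.array([1.80087972, -1.82414818, -2.08465433])
y = np.array([1.0, 0.0, 0.0])

grad_a, hess_a = analytic_grad_hess(x, y)
grad_n, hess_n = numeric_grad_hess(x, y)

print(np.allclose(grad_a, grad_n, atol=1e-6))
print(np.allclose(hess_a, hess_n, atol=1e-5))
```

If the derivatives were taken with respect to $p$ instead, the numbers would not agree with `p - y` and `p * (1 - p)`, which is the whole point of the chain-rule step.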

dontloo
  • Using your logic, I get y-p instead of p-y for the first derivative. The reason is that the denominator of your first formula is p(1-p) instead of p(p-1), which makes a difference. It seems you silently took the negative of the first derivative. – shb Feb 21 '19 at 21:31
  • @Simon.H hi thanks for pointing that out, it was a mistake. I've updated the answer, the sign of $\partial f/\partial p$ should be changed :) – dontloo Feb 22 '19 at 05:28