
User Antoni Parellada gave a long derivation of the logistic loss gradient in scalar form here. Using matrix notation, the derivation should be much more concise. Can someone provide a matrix-form derivation of the logistic loss, i.e., show that the gradient of the logistic loss is

$$ A^\top\left( \text{sigmoid}~(Ax)-b\right) $$


  1. For comparison, for linear regression $\text{minimize}~\|Ax-b\|^2$, the gradient is $2A^\top\left(Ax-b\right)$; I have a derivation here. (A numerical check of both gradient formulas follows this list.)

  2. Related question: Matrix notation for logistic regression
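
As a quick sanity check (a minimal NumPy sketch; variable names are illustrative), both gradient formulas can be verified against central finite differences. The logistic loss here omits any $1/m$ factor, matching the form stated above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(x, A, b):
    # -sum_i [ b_i ln sigma(a_i^T x) + (1 - b_i) ln(1 - sigma(a_i^T x)) ]
    p = sigmoid(A @ x)
    return -(b @ np.log(p) + (1 - b) @ np.log(1 - p))

def linear_loss(x, A, b):
    r = A @ x - b
    return r @ r  # ||Ax - b||^2

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

rng = np.random.default_rng(0)
m, n = 50, 5
A = rng.normal(size=(m, n))
b01 = rng.integers(0, 2, size=m).astype(float)  # binary labels
breal = rng.normal(size=m)                      # real-valued targets
x = rng.normal(size=n)

# Claimed closed-form gradients
g_logistic = A.T @ (sigmoid(A @ x) - b01)
g_linear = 2 * A.T @ (A @ x - breal)

print(np.allclose(g_logistic, numeric_grad(lambda v: logistic_loss(v, A, b01), x), atol=1e-6))
print(np.allclose(g_linear, numeric_grad(lambda v: linear_loss(v, A, breal), x), atol=1e-6))
```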

Haitao Du
  • As credited in the linked answer, the derivation comes from the students' cumulative notes within the [excellent course by Andrew Ng on ML](https://www.coursera.org/learn/machine-learning). – Antoni Parellada May 11 '17 at 02:24
  • Why not try translating the scalar-form derivation as a [tag:self-study] exercise? (BTW, you should edit the question to explicitly state the loss itself, in your preferred notation, to make it self-contained.) – GeoMatt22 May 11 '17 at 02:36

1 Answer


Here is my attempt.

$$J(x) = -\frac{1}{m}\sum_{i = 1}^{m} \big[b_i\ln(h_i) + (1 - b_i)\ln(1 - h_i)\big]$$

where $h_i = \sigma(x^Ta_i)$ and $A = [a_1^T, \dots, a_m^T]^T$, i.e., the rows of $A$ are the $a_i^T$. Assuming that $\ln$, $\sigma$, and $\frac{1}{\cdot}$ act element-wise on vectors, that $\odot$ denotes element-wise multiplication, and that $\mathbb{1}$ is a vector of ones, we have

$$J(x) = -\frac{1}{m}\big[b^T\ln(\sigma(Ax)) + (\mathbb{1} - b)^T\ln(\mathbb{1} - \sigma(Ax))\big]$$
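
As a quick sanity check, a small NumPy sketch (names are illustrative) confirming that this vectorized expression agrees with the scalar sum above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
m, n = 20, 4
A = rng.normal(size=(m, n))  # rows of A are the a_i^T
b = rng.integers(0, 2, size=m).astype(float)
x = rng.normal(size=n)

# Scalar form: J(x) = -(1/m) sum_i [ b_i ln h_i + (1 - b_i) ln(1 - h_i) ]
h = sigmoid(A @ x)
J_scalar = -sum(b[i] * np.log(h[i]) + (1 - b[i]) * np.log(1 - h[i])
                for i in range(m)) / m

# Vector form: J(x) = -(1/m)[ b^T ln(sigma(Ax)) + (1 - b)^T ln(1 - sigma(Ax)) ]
ones = np.ones(m)
J_vector = -(b @ np.log(sigmoid(A @ x))
             + (ones - b) @ np.log(ones - sigmoid(A @ x))) / m

print(np.isclose(J_scalar, J_vector))  # True
```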

Now, using the denominator-layout convention (so $\frac{\partial Ax}{\partial x} = A^T$), the chain rule, and the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$:

$$\begin{aligned}
\frac{\partial J(x)}{\partial x} &= -\frac{1}{m}\Big[\frac{\partial}{\partial x}b^T\ln(\sigma(Ax)) + \frac{\partial}{\partial x}(\mathbb{1} - b)^T\ln(\mathbb{1} - \sigma(Ax))\Big] \\
&= -\frac{1}{m}\Big[\frac{\partial \ln(\sigma(Ax))}{\partial x}b + \frac{\partial \ln(\mathbb{1} - \sigma(Ax))}{\partial x}(\mathbb{1} - b)\Big] \\
&= -\frac{1}{m}\Big[\frac{\partial \sigma(Ax)}{\partial x} \Big(\frac{1}{\sigma(Ax)}\odot b\Big) + \frac{\partial (\mathbb{1} - \sigma(Ax))}{\partial x}\Big(\frac{1}{\mathbb{1} - \sigma(Ax)}\odot (\mathbb{1} - b)\Big)\Big] \\
&= -\frac{1}{m}\Big[\frac{\partial Ax}{\partial x} \Big(\sigma(Ax) \odot (\mathbb{1} - \sigma(Ax)) \odot \frac{1}{\sigma(Ax)}\odot b\Big) - \frac{\partial Ax}{\partial x}\Big(\sigma(Ax) \odot (\mathbb{1} - \sigma(Ax)) \odot \frac{1}{\mathbb{1} - \sigma(Ax)}\odot (\mathbb{1} - b)\Big)\Big] \\
&= -\frac{1}{m}\Big[A^T \big((\mathbb{1} - \sigma(Ax)) \odot b\big) - A^T\big(\sigma(Ax) \odot (\mathbb{1} - b)\big)\Big] \\
&= -\frac{1}{m}A^T \big(b - \sigma(Ax) \odot b - \sigma(Ax) + \sigma(Ax) \odot b\big) \\
&= -\frac{1}{m}A^T \big(b - \sigma(Ax)\big) \\
&= \frac{1}{m}A^T \big(\sigma(Ax) - b\big)
\end{aligned}$$
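
To make the cancellation in the middle steps concrete and to confirm the final result numerically, here is a small sketch (a central-difference check; names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
m, n = 30, 3
A = rng.normal(size=(m, n))
b = rng.integers(0, 2, size=m).astype(float)
x = rng.normal(size=n)
s = sigmoid(A @ x)

# Intermediate step: the two odot terms collapse to b - sigma(Ax)
lhs = A.T @ ((1 - s) * b) - A.T @ (s * (1 - b))
rhs = A.T @ (b - s)
print(np.allclose(lhs, rhs))  # True: the cross terms cancel

# Final gradient (with the 1/m factor) vs a central-difference check
def J(v):
    p = sigmoid(A @ v)
    return -np.mean(b * np.log(p) + (1 - b) * np.log(1 - p))

eps = 1e-6
g_num = np.array([(J(x + eps * e) - J(x - eps * e)) / (2 * eps)
                  for e in np.eye(n)])
print(np.allclose(A.T @ (s - b) / m, g_num, atol=1e-6))  # True
```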

Łukasz Grad