
I was trying to find the derivative of the logistic loss over all observations, but I got stuck at the following step $$\frac{dZ}{d\theta_1}\frac{d\sigma}{dZ}\frac{dL}{d\sigma}$$ where $Z$ is $\theta^T x$, $\sigma$ is the activation function, and $L$ is the loss function. Now, $L$ is the average of the sum of all individual losses, so $\frac{dL}{d\sigma}$ reduces to the average of some sum, i.e. $$\frac{dZ}{d\theta_1}\frac{d\sigma}{dZ}\sum(\text{something})$$ Now I am confused about how to proceed. I also saw this derivation, which differentiates directly w.r.t. $\theta$, but when I try to derive it myself I get stuck at the step above.

Which aspect am I missing?

abunickabhi

1 Answer


Think simple first: take batch size $m = 1$. Write your loss function in terms of only the sigmoid output, i.e. $o = \sigma(Z)$, and take the derivative $\frac{dL}{do}$. You already have $\frac{do}{dZ} = o(1-o)$ and $\frac{dZ}{d\theta_1} = x_1$. Just substitute into the equation you first wrote down, and compare with the $m = 1$ case in the link you provided.
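A minimal worked version of that substitution for a single sample with label $y$, assuming the standard log (binary cross-entropy) loss:
$$L = -\big[y\log o + (1-y)\log(1-o)\big], \qquad \frac{dL}{do} = -\frac{y}{o} + \frac{1-y}{1-o} = \frac{o-y}{o(1-o)}$$
so
$$\frac{dL}{d\theta_1} = \frac{dL}{do}\,\frac{do}{dZ}\,\frac{dZ}{d\theta_1} = \frac{o-y}{o(1-o)} \cdot o(1-o) \cdot x_1 = (o-y)\,x_1.$$
The $o(1-o)$ factors cancel, which is why the final per-sample gradient is so simple.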

gunes
  • For one observation it absolutely makes sense. But with $m$ observations, my loss is the average of the sum of the log losses for obs. 1, obs. 2, ..., obs. $m$. So differentiating w.r.t. $\sigma$ means differentiating w.r.t. the $\sigma$ for obs. 1, the $\sigma$ for obs. 2, ..., and the $\sigma$ for obs. $m$, i.e. all the partials. Now I have something which is the average of a sum of certain partials, and I have no clue how to take the derivative of $\sigma$ w.r.t. $Z$ because it's not in summation form. – optimal substructure Sep 10 '18 at 14:37
  • You can write your loss as $L=\frac{1}{m}(L_1+\dots+L_m)$; the above procedure finds the derivative of $L_i$, which is the loss calculated for a single sample $x^{(i)}$. The final gradient is simply the average of these. – gunes Sep 10 '18 at 14:40
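To make the averaging concrete, here is a short NumPy sketch of the resulting full-batch gradient (the function name and example data are illustrative, not from the thread; it assumes the averaged log loss discussed above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss_gradient(theta, X, y):
    """Gradient of the averaged logistic (log) loss w.r.t. theta.

    Each sample contributes (o_i - y_i) * x_i; the full gradient is
    just the average of these per-sample gradients, as in the comment
    above: L = (1/m)(L_1 + ... + L_m).
    """
    m = X.shape[0]
    o = sigmoid(X @ theta)       # predictions o_i = sigma(Z_i), shape (m,)
    return (X.T @ (o - y)) / m   # average of per-sample gradients

# Illustrative usage with made-up data (m = 3 samples, 2 features).
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)
print(logistic_loss_gradient(theta, X, y))
```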