
I am trying, without much success so far, to derive the gradient of the following cost function in order to fit a logistic curve to some data:

$J(a, k, b, m) = \sum_{i=1}^n\left(y_i - \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)^2$
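For concreteness, here is the model and cost as I have them in code (a minimal Python sketch; the function names are my own):

```python
import numpy as np

def logistic(x, a, k, b, m):
    # Generalized logistic curve: a is the lower asymptote, k the upper
    # asymptote, b the growth rate, and m the midpoint.
    return a + (k - a) / (1.0 + np.exp(-b * (x - m)))

def cost(x, y, a, k, b, m):
    # J(a, k, b, m): sum of squared residuals between the data and the curve.
    return np.sum((y - logistic(x, a, k, b, m)) ** 2)
```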

Most resources I can find on fitting logistic functions are devoted to classification problems, which is not my situation. I would like to use gradient descent to fit my function.

Here is my current situation and what I would like someone else to check (using $\frac{\partial J}{\partial a}$ as an example).

$\frac{\partial J}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^n\left(y_i - \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)^2$

$= \sum_{i=1}^n \frac{\partial}{\partial a}\left(y_i - \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)^2$

$= \sum_{i=1}^n \frac{\partial}{\partial a}\left(y_i^2 - 2 y_i \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) + \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2\right)$

$= \sum_{i=1}^n \left(\frac{\partial}{\partial a} y_i^2 - \frac{\partial}{\partial a} 2 y_i \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) + \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2\right)$

$= \sum_{i=1}^n \left(\frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2 - \frac{\partial}{\partial a} 2 y_i \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)$

$= \sum_{i=1}^n \left(\frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2 - 2 y_i \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)$

$= \sum_{i=1}^n \left(2\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) - 2 y_i \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)$

(based on this preexisting derivation of the generalized logistic function)

$= \sum_{i=1}^n \left(2\left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) - 2 y_i \left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\right)$

$= 2 \sum_{i=1}^n \left(\left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) - y_i \left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\right)$

$= 2 \sum_{i=1}^n \left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}} - y_i\right)$
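To sanity-check the result, the last expression can be compared against a central finite difference of $J$ in $a$. A small Python sketch continuing the one above (the synthetic data, test point, and step size are my own choices):

```python
def dJ_da(x, y, a, k, b, m):
    # The derived partial: 2 * sum((1 - 1/D) * (f(x_i) - y_i)),
    # where D = 1 + exp(-b * (x_i - m)).
    D = 1.0 + np.exp(-b * (x - m))
    f = a + (k - a) / D
    return 2.0 * np.sum((1.0 - 1.0 / D) * (f - y))

# Synthetic data and an arbitrary point in parameter space.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
y = logistic(x, 0.5, 2.0, 1.5, 0.0) + 0.1 * rng.standard_normal(50)
a, k, b, m = 0.4, 1.8, 1.0, 0.1

eps = 1e-6
numeric = (cost(x, y, a + eps, k, b, m) - cost(x, y, a - eps, k, b, m)) / (2.0 * eps)
print(dJ_da(x, y, a, k, b, m), numeric)  # the two values should agree closely
```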

Does this look correct, and if not, where did I go wrong?

  • I would like to suggest your notation is over-complicating a simple calculation. Since you want to compute the derivative with respect to $a$, you may write the function $J(a)$ as the sum of squared linear expressions in $a$, as in $$\sum_i\left(\beta_i a + \gamma_i\right)^2.$$ The values of $\beta_i$ and $\gamma_i$ can readily be identified by comparison to your original expression. The derivative of that kind of expression is given in a huge number of threads here, because it is simply a sum of squares. – whuber Jan 19 '16 at 14:05
  • Thanks for the answer. As you can probably guess, I don't really know what I am doing; as a software developer by trade, my calculus is a bit rusty. I am not sure I understand what you mean by rewriting $J(a)$ as a sum of squared linear expressions in $a$. Could you elaborate or point me towards an external resource that goes into more detail? – jjanssen Jan 19 '16 at 15:30
  • Set $$\beta_i = \frac{1}{1 + \exp(-b(x_i-m))} - 1$$ and $$\gamma_i = y_i - \frac{k}{1 + \exp(-b(x_i-m))},$$ so that each summand is $(\beta_i a + \gamma_i)^2$. – whuber Jan 19 '16 at 16:22
  • I think the simplicity you refer to eludes me. Would you happen to have a link to a more exhaustive explanation? – jjanssen Jan 20 '16 at 18:20
  • https://en.wikipedia.org/wiki/Least_squares#Solving_the_least_squares_problem – whuber Jan 21 '16 at 00:14
  • Oh. Does that mean that the partial derivative for any parameter $\beta$ is simply $\frac{\partial}{\partial \beta}J(\theta) = -2 \sum_{i=1}^{n} \left[(y_i - f(\theta, x_i)) \cdot \frac{\partial}{\partial \beta}f(\theta, x_i) \right]$? If so, I can see how I was overcomplicating everything (a sketch applying this appears after these comments). – jjanssen Jan 21 '16 at 12:48
  • I have had a hard time finding worked examples on this site--although I have seen many--but I just came across one at http://stats.stackexchange.com/questions/69205. – whuber Jan 21 '16 at 14:07
  • I hope I'm not adding more confusion, but the OP was looking for a derivation of the logistic loss function, which, in my (possibly wrong) understanding, is different from the least squares one – Ciprian Tomoiagă Oct 29 '16 at 18:07
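Following up on the general least-squares formula in the last exchange, here is a minimal gradient-descent sketch for all four parameters. It reuses `logistic`, `x`, and `y` from the sketches above; the partials of $f$, the learning rate, and the iteration count are illustrative assumptions rather than anything from the thread:

```python
def grad_J(x, y, a, k, b, m):
    # Generic least-squares gradient: dJ/dp = -2 * sum((y - f) * df/dp),
    # evaluated for each parameter p in (a, k, b, m).
    s = 1.0 / (1.0 + np.exp(-b * (x - m)))  # logistic sigmoid term, 1/D
    f = a + (k - a) * s
    r = y - f                                # residuals
    df_da = 1.0 - s
    df_dk = s
    df_db = (k - a) * s * (1.0 - s) * (x - m)
    df_dm = -(k - a) * s * (1.0 - s) * b
    return np.array([-2.0 * np.sum(r * d) for d in (df_da, df_dk, df_db, df_dm)])

theta = np.array([0.0, 1.0, 1.0, 0.0])  # initial guess for a, k, b, m
lr = 1e-3                                # learning rate; will likely need tuning
for _ in range(5000):
    theta -= lr * grad_J(x, y, *theta)
print(theta)  # fitted a, k, b, m
```

Note that `df_da = 1 - s` matches the $1 - \frac{1}{1 + e^{-b(x_i - m)}}$ factor derived in the question.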
