
I am trying, without much success so far, to derive the gradient of the following cost function in order to fit a logistic curve to some data:

$J(a, k, b, m) = \sum_{i=1}^n\left(y_i - \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)^2$
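For concreteness, here is the model and cost as I have them in code (a minimal Python sketch; the function names are my own):

```python
import numpy as np

def logistic(x, a, k, b, m):
    # Generalized logistic curve: a is the lower asymptote, k the upper
    # asymptote, b the growth rate, and m the midpoint.
    return a + (k - a) / (1.0 + np.exp(-b * (x - m)))

def cost(x, y, a, k, b, m):
    # J(a, k, b, m): sum of squared residuals between the data and the curve.
    return np.sum((y - logistic(x, a, k, b, m)) ** 2)
```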

Most resources I can find on fitting logistic functions are devoted to classification problems, which is not my situation. I would like to use gradient descent to fit my function.

Here is my current situation and what I would like someone else to check (using $\frac{\partial J}{\partial a}$ as an example).

$\frac{\partial J}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^n\left(y_i - \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)^2$

$= \sum_{i=1}^n \frac{\partial}{\partial a}\left(y_i - \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)^2$

$= \sum_{i=1}^n \frac{\partial}{\partial a}\left(y_i^2 - 2 y_i \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) + \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2\right)$

$= \sum_{i=1}^n \left(\frac{\partial}{\partial a} y_i^2 - \frac{\partial}{\partial a} 2 y_i \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) + \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2\right)$

$= \sum_{i=1}^n \left(\frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2 - \frac{\partial}{\partial a} 2 y_i \left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)$

$= \sum_{i=1}^n \left(\frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)^2 - 2 y_i \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)$

$= \sum_{i=1}^n \left(2\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) - 2 y_i \frac{\partial}{\partial a}\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right)\right)$

(based on this preexisting derivation of the generalized logistic function)

$= \sum_{i=1}^n \left(2\left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) - 2 y_i \left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\right)$

$= 2 \sum_{i=1}^n \left(\left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}}\right) - y_i \left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\right)$

$= 2 \sum_{i=1}^n \left(1 - \frac{1}{1 + e^{-b(x_i - m)}}\right)\left(a + \frac{k - a}{1 + e^{-b(x_i - m)}} - y_i\right)$
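To sanity-check the result, the last expression can be compared against a central finite difference of $J$ in $a$. A small Python sketch continuing the one above (the synthetic data, test point, and step size are my own choices):

```python
def dJ_da(x, y, a, k, b, m):
    # The derived partial: 2 * sum((1 - 1/D) * (f(x_i) - y_i)),
    # where D = 1 + exp(-b * (x_i - m)).
    D = 1.0 + np.exp(-b * (x - m))
    f = a + (k - a) / D
    return 2.0 * np.sum((1.0 - 1.0 / D) * (f - y))

# Synthetic data and an arbitrary point in parameter space.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
y = logistic(x, 0.5, 2.0, 1.5, 0.0) + 0.1 * rng.standard_normal(50)
a, k, b, m = 0.4, 1.8, 1.0, 0.1

eps = 1e-6
numeric = (cost(x, y, a + eps, k, b, m) - cost(x, y, a - eps, k, b, m)) / (2.0 * eps)
print(dJ_da(x, y, a, k, b, m), numeric)  # the two values should agree closely
```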

Does this look correct, and if not, where did I go wrong?

  • I would like to suggest your notation is over-complicating a simple calculation. Since you want to compute the derivative with respect to $a$, you may write the function $J(a)$ as the sum of squared linear expressions in $a$, as in $$\sum_i\left(\beta_i a + \gamma_i\right)^2.$$ The values of $\beta_i$ and $\gamma_i$ can readily be identified by comparison to your original expression. The derivative of that kind of expression is given in a huge number of threads here, because it is simply a sum of squares. – whuber Jan 19 '16 at 14:05
  • Thanks for the answer. As you can probably guess, I don't really know what I am doing; as a software developer by trade, my calculus is a bit rusty. I am not sure I understand what you mean by rewriting $J(a)$ as a sum of squared linear expressions in $a$. Could you elaborate or point me towards an external resource that goes into more detail? – jjanssen Jan 19 '16 at 15:30
  • Set $$\beta_i = \frac{1}{1 + \exp(-b(x_i-m))} - 1$$ and $$\gamma_i = y_i - \frac{k}{1 + \exp(-b(x_i-m))},$$ so that each summand is $(\beta_i a + \gamma_i)^2$. – whuber Jan 19 '16 at 16:22
  • I think the simplicity you refer to eludes me. Would you happen to have a link to a more exhaustive explanation? – jjanssen Jan 20 '16 at 18:20
  • https://en.wikipedia.org/wiki/Least_squares#Solving_the_least_squares_problem – whuber Jan 21 '16 at 00:14
  • Oh. Does that mean that the partial derivative for any parameter $\beta$ is simply $\frac{\partial}{\partial \beta}J(\theta) = -2 \sum_{i=1}^{n} \left[(y_i - f(\theta, x_i)) \cdot \frac{\partial}{\partial \beta}f(\theta, x_i) \right]$? If so, I can see how I was overcomplicating everything (a sketch applying this appears after these comments). – jjanssen Jan 21 '16 at 12:48
  • I have had a hard time finding worked examples on this site--although I have seen many--but I just came across one at http://stats.stackexchange.com/questions/69205. – whuber Jan 21 '16 at 14:07
  • I hope I'm not adding more confusion, but the OP was looking for a derivation of the logistic loss function, which, in my (possibly wrong) understanding, is different from the least squares one – Ciprian Tomoiagă Oct 29 '16 at 18:07
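Following up on the general least-squares formula in the last exchange, here is a minimal gradient-descent sketch for all four parameters. It reuses `logistic`, `x`, and `y` from the sketches above; the partials of $f$, the learning rate, and the iteration count are illustrative assumptions rather than anything from the thread:

```python
def grad_J(x, y, a, k, b, m):
    # Generic least-squares gradient: dJ/dp = -2 * sum((y - f) * df/dp),
    # evaluated for each parameter p in (a, k, b, m).
    s = 1.0 / (1.0 + np.exp(-b * (x - m)))  # logistic sigmoid term, 1/D
    f = a + (k - a) * s
    r = y - f                                # residuals
    df_da = 1.0 - s
    df_dk = s
    df_db = (k - a) * s * (1.0 - s) * (x - m)
    df_dm = -(k - a) * s * (1.0 - s) * b
    return np.array([-2.0 * np.sum(r * d) for d in (df_da, df_dk, df_db, df_dm)])

theta = np.array([0.0, 1.0, 1.0, 0.0])  # initial guess for a, k, b, m
lr = 1e-3                                # learning rate; will likely need tuning
for _ in range(5000):
    theta -= lr * grad_J(x, y, *theta)
print(theta)  # fitted a, k, b, m
```

Note that `df_da = 1 - s` matches the $1 - \frac{1}{1 + e^{-b(x_i - m)}}$ factor derived in the question.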
