For on-topic questions involving the mathematical concept of a derivative, i.e. $\frac{d}{dx} f(x)$. Purely mathematical questions about the derivative are better asked on Math SE: https://math.stackexchange.com/
Questions tagged [derivative]
245 questions
60
votes
5 answers
Backpropagation with Softmax / Cross Entropy
I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer.
The cross entropy error function is
$$E(t,o)=-\sum_j t_j \log o_j$$
with $t$ and $o$ as the target and output at neuron $j$, respectively. The sum is over…

micha
- 703
- 1
- 6
- 5
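A minimal sketch for the softmax / cross-entropy setup in the excerpt above, using only the Python standard library. It assumes the outputs $o_j$ come from a softmax over pre-activation inputs, called `z` here (a name not in the excerpt), and that the targets sum to 1. Under those assumptions the gradient with respect to the inputs reduces to $o_i - t_i$, which the code compares against a central finite difference.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of inputs z."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(t, o):
    """E(t, o) = -sum_j t_j * log(o_j), as in the excerpt."""
    return -sum(tj * math.log(oj) for tj, oj in zip(t, o))

def grad_wrt_inputs(t, z):
    """Analytic gradient dE/dz_i = o_i - t_i (assumes sum(t) == 1)."""
    o = softmax(z)
    return [oi - ti for oi, ti in zip(o, t)]

def numeric_grad(t, z, eps=1e-6):
    """Central finite-difference approximation of dE/dz_i."""
    grads = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        diff = cross_entropy(t, softmax(z_plus)) - cross_entropy(t, softmax(z_minus))
        grads.append(diff / (2 * eps))
    return grads

z = [0.5, -1.2, 2.0]   # hypothetical pre-softmax inputs
t = [0.0, 0.0, 1.0]    # one-hot target
analytic = grad_wrt_inputs(t, z)
numeric = numeric_grad(t, z)
print(analytic)
print(numeric)
```

The two printed lists should agree to several decimal places; the components of the analytic gradient sum to zero because both `o` and `t` sum to 1.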
40
votes
1 answer
Step-by-step example of reverse-mode automatic differentiation
Not sure if this question belongs here, but it's closely related to gradient methods in optimization, which seems to be on-topic here. Anyway, feel free to migrate if you think some other community has better expertise in the topic.
In short, I'm…

ffriend
- 9,380
- 5
- 24
- 29
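As a rough illustration of what the title above refers to, here is a small reverse-mode automatic differentiation sketch in plain Python. The `Var` class, the `sin` helper and the `backward` function are hypothetical names made up for this example and are not taken from the question or from any library. Each node stores its value plus the local derivative with respect to each parent, and `backward` walks the graph from the output to the inputs, accumulating gradients with the chain rule.

```python
import math

class Var:
    """A scalar node in the computation graph (hypothetical helper class)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def sin(x):
    """sin node; its local derivative is cos(x)."""
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))

def backward(output):
    """Propagate d(output)/d(node) to every node, from output to inputs."""
    order, seen = [], set()

    def visit(node):
        # Post-order DFS: parents are appended before the nodes that use them.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

# f(x, y) = x * y + sin(x), so df/dx = y + cos(x) and df/dy = x.
x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)
backward(f)
print(x.grad, y.grad)
```

Note that `x` feeds two nodes, so its gradient is the sum of two contributions; processing nodes in reverse topological order ensures both arrive before anything upstream of `x` would be handled.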
38
votes
5 answers
How is the cost function from Logistic Regression differentiated
I am doing the Machine Learning Stanford course on Coursera.
In the chapter on Logistic Regression, the cost function is this:
Then, it is differentiated here:
I tried getting the derivative of the cost function, but I got something completely…

octavian
- 909
- 2
- 11
- 18
16
votes
2 answers
Derivative of a Gaussian Process
I believe that the derivative of a Gaussian process (GP) is another GP, and so I would like to know if there are closed-form equations for the prediction equations of the derivative of a GP? In particular, I am using the squared exponential (also…
user30490
15
votes
3 answers
How can I fit a spline to data that contains values and 1st/2nd derivatives?
I have a dataset that contains, let's say, some measurements for position, speed and acceleration. All come from the same "run". I could construct a linear system and fit a polynomial to all of those measurements.
But can I do the same with splines?…

dani
- 203
- 1
- 8
14
votes
3 answers
Proper regression for determining correlations between derivatives of functions
Say we have a permanent-magnet DC motor that roughly obeys the system equation $\ddot{x}(t) = \alpha \dot{x}(t) + \beta u(t) + \gamma$, where $x(t)$ is the displacement of the rotor, and $u(t)$ the applied voltage, at time $t$.
Say we wish to…

user3716267
- 614
- 3
- 11
13
votes
1 answer
What justifies this calculation of the derivative of a matrix function?
In Andrew Ng's machine learning course, he uses this formula:
$\nabla_A tr(ABA^TC) = CAB + C^TAB^T$
and he does a quick proof which is shown below:
$\nabla_A tr(ABA^TC) \\
= \nabla_A tr(f(A)A^TC) \\
= \nabla_{\circ} tr(f(\circ)A^TC) +…

MoneyBall
- 737
- 4
- 15
13
votes
3 answers
Can a neural network learn a functional, and its functional derivative?
I understand that neural networks (NNs) can be considered universal approximators to both functions and their derivatives, under certain assumptions (on both the network and the function to approximate). In fact, I have done a number of tests on…

Michael
- 131
- 1
- 5
12
votes
1 answer
Second order approximation of the loss function (Deep learning book, 7.33)
In Goodfellow's (2016) book on deep learning, he discusses the equivalence of early stopping and L2 regularisation (https://www.deeplearningbook.org/contents/regularization.html, page 247).
Quadratic approximation of cost function $j$ is given…

stevew
- 749
- 3
- 12
12
votes
1 answer
Interpretation of Radon-Nikodym derivative between probability measures?
I have seen at some points the use of the Radon-Nikodym derivative of one probability measure with respect to another, most notably in the Kullback-Leibler divergence, where it is the derivative of the probability measure of a model for some…

user56834
- 2,157
- 13
- 35
10
votes
4 answers
Computing gradients via Gaussian Process Regression
I have a set of noisy data that I am fitting using Gaussian Process Regression via Python's sklearn package. The posterior mean of the GP is essentially my output with an associated error. Based on either the posterior mean or the original data…

Mathews24
- 417
- 4
- 20
10
votes
1 answer
Differentiation of Cross Entropy
I have been trying to create a program for training Neural Networks on my computer. For the Network in question, I have decided to use the Cross Entropy Error function:
$$E = -\sum_jt_j\ln o_j$$
Where $t_j$ is the target output for the Neuron $j$,…

Geno Racklin Asher
- 203
- 1
- 2
- 6
10
votes
1 answer
How to compute the gradient and hessian of logarithmic loss? (question is based on a numpy example script from xgboost's github repository)
I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.
I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the…

Greg
- 335
- 1
- 4
- 9
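A minimal sketch of the gradient and Hessian of the log loss, in plain Python rather than numpy. It assumes that `y_hat` is a raw score that is passed through a sigmoid before entering the loss, and that `y_true` is 0 or 1; the excerpt does not confirm either point, so treat both as assumptions. Under them, with $p = \sigma(\hat{y})$, the first derivative is $p - y$ and the second derivative is $p(1 - p)$. The function names are made up for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logloss(y_hat, y_true):
    """Log loss for a raw score y_hat and a label y_true in {0, 1}."""
    p = sigmoid(y_hat)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))

def logloss_grad_hess(y_hat, y_true):
    """First and second derivatives of logloss with respect to y_hat."""
    p = sigmoid(y_hat)
    grad = p - y_true
    hess = p * (1.0 - p)
    return grad, hess

# Compare against finite differences at one hypothetical point.
y_hat, y_true = 0.3, 1.0
grad, hess = logloss_grad_hess(y_hat, y_true)

eps1 = 1e-5
num_grad = (logloss(y_hat + eps1, y_true) - logloss(y_hat - eps1, y_true)) / (2 * eps1)

eps2 = 1e-4
num_hess = (logloss(y_hat + eps2, y_true) - 2.0 * logloss(y_hat, y_true)
            + logloss(y_hat - eps2, y_true)) / eps2 ** 2

print(grad, num_grad)
print(hess, num_hess)
```

The Hessian does not depend on the label, which is why it is the same expression for positive and negative examples.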
10
votes
4 answers
How to find derivative of softmax function for the purpose of gradient descent?
I'm trying to understand the backpropagation algorithm for multiclass classification using gradient descent. I'm following https://www.cs.toronto.edu/~graves/phd.pdf as a reference. The output layer is a softmax layer, in which each unit in that layer has activation…

She
- 211
- 1
- 2
- 7
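For reference, a short worked derivation of the softmax derivative. The notation is an assumption rather than the question's own: $z_j$ denotes the input to output unit $j$, $o_i$ the softmax output, and $\delta_{ij}$ the Kronecker delta.

$$o_i = \frac{e^{z_i}}{\sum_k e^{z_k}}$$

Applying the quotient rule with $S = \sum_k e^{z_k}$:

$$\frac{\partial o_i}{\partial z_j} = \frac{\delta_{ij}\, e^{z_i}\, S - e^{z_i} e^{z_j}}{S^2} = o_i \left(\delta_{ij} - o_j\right)$$

So the diagonal entries of the Jacobian are $o_i (1 - o_i)$ and the off-diagonal entries are $-o_i o_j$.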
10
votes
2 answers
Equation of a fitted smooth spline and its analytical derivative
I need to fit a spline function to a data set. I tried with bs, ns and smooth.spline. In my case the curve obtained by smooth.spline follows the trend in data better than with the bs and ns.
However, I do not know how to obtain the equation of the…

Francesco
- 103
- 1
- 1
- 5