Questions tagged [derivative]

For on-topic questions involving the mathematical concept of a derivative, i.e. $\frac{d}{dx} f(x)$. For purely mathematical questions about the derivative, it is better to ask on Math SE: https://math.stackexchange.com/

245 questions
60
votes
5 answers

Backpropagation with Softmax / Cross Entropy

I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer. The cross entropy error function is $$E(t,o)=-\sum_j t_j \log o_j$$ with $t$ and $o$ as the target and output at neuron $j$, respectively. The sum is over…
micha
  • 703
  • 1
  • 6
  • 5
40
votes
1 answer

Step-by-step example of reverse-mode automatic differentiation

Not sure if this question belongs here, but it's closely related to gradient methods in optimization, which seems to be on-topic here. Anyway, feel free to migrate if you think some other community has better expertise in the topic. In short, I'm…
ffriend
  • 9,380
  • 5
  • 24
  • 29
38
votes
5 answers

How is the cost function from Logistic Regression differentiated

I am doing the Machine Learning Stanford course on Coursera. In the chapter on Logistic Regression, the cost function is this: Then, it is differentiated here: I tried getting the derivative of the cost function, but I got something completely…
octavian
  • 909
  • 2
  • 11
  • 18
16
votes
2 answers

Derivative of a Gaussian Process

I believe that the derivative of a Gaussian process (GP) is another GP, and so I would like to know if there are closed-form equations for the prediction equations of the derivative of a GP. In particular, I am using the squared exponential (also…
user30490
15
votes
3 answers

How can I fit a spline to data that contains values and 1st/2nd derivatives?

I have a dataset that contains, let's say, some measurements for position, speed and acceleration. All come from the same "run". I could construct a linear system and fit a polynomial to all of those measurements. But can I do the same with splines?…
dani
  • 203
  • 1
  • 8
14
votes
3 answers

Proper regression for determining correlations between derivatives of functions

Say we have a permanent-magnet DC motor that roughly obeys the system equation $\ddot{x}(t) = \alpha \dot{x}(t) + \beta u(t) + \gamma$, where $x(t)$ is the displacement of the rotor, and $u(t)$ the applied voltage, at time $t$. Say we wish to…
user3716267
  • 614
  • 3
  • 11
13
votes
1 answer

What justifies this calculation of the derivative of a matrix function?

In Andrew Ng's machine learning course, he uses this formula: $\nabla_A tr(ABA^TC) = CAB + C^TAB^T$ and he does a quick proof which is shown below: $\nabla_A tr(ABA^TC) \\ = \nabla_A tr(f(A)A^TC) \\ = \nabla_{\circ} tr(f(\circ)A^TC) +…
MoneyBall
  • 737
  • 4
  • 15
13
votes
3 answers

Can a neural network learn a functional, and its functional derivative?

I understand that neural networks (NNs) can be considered universal approximators to both functions and their derivatives, under certain assumptions (on both the network and the function to approximate). In fact, I have done a number of tests on…
Michael
  • 131
  • 1
  • 5
12
votes
1 answer

Second order approximation of the loss function (Deep learning book, 7.33)

In Goodfellow's (2016) book on deep learning, he discusses the equivalence of early stopping and L2 regularisation (https://www.deeplearningbook.org/contents/regularization.html page 247). Quadratic approximation of cost function $j$ is given…
stevew
  • 749
  • 3
  • 12
12
votes
1 answer

Interpretation of Radon-Nikodym derivative between probability measures?

I have seen at some points the use of the Radon-Nikodym derivative of one probability measure with respect to another, most notably in the Kullback-Leibler divergence, where it is the derivative of the probability measure of a model for some…
10
votes
4 answers

Computing gradients via Gaussian Process Regression

I have a set of noisy data that I am fitting using Gaussian Process Regression via Python's sklearn package. The posterior mean of the GP is essentially my output with an associated error. Based on either the posterior mean or the original data…
10
votes
1 answer

Differentiation of Cross Entropy

I have been trying to create a program for training Neural Networks on my computer. For the Network in question, I have decided to use the Cross Entropy Error function: $$E = -\sum_jt_j\ln o_j$$ Where $t_j$ is the target output for the Neuron $j$,…
10
votes
1 answer

How to compute the gradient and hessian of logarithmic loss? (question is based on a numpy example script from xgboost's github repository)

I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script. I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the…
Greg
  • 335
  • 1
  • 4
  • 9
10
votes
4 answers

How to find derivative of softmax function for the purpose of gradient descent?

I'm trying to understand the backpropagation algorithm for multiclass classification using gradient descent. I'm using https://www.cs.toronto.edu/~graves/phd.pdf. The output layer is a softmax layer, in which each unit in that layer has activation…
She
  • 211
  • 1
  • 2
  • 7
10
votes
2 answers

Equation of a fitted smooth spline and its analytical derivative

I need to fit a spline function to a data set. I tried with bs, ns and smooth.spline. In my case the curve obtained by smooth.spline follows the trend in data better than with the bs and ns. However, I do not know how to obtain the equation of the…
Francesco
  • 103
  • 1
  • 1
  • 5