In PyTorch and similar machine learning frameworks, the autograd module computes the gradient of a function without requiring an explicit derivative for each of the individual functions that compose it. However, it is also possible to declare the gradient of one (or more) of these functions explicitly (e.g., in PyTorch you can override the backward()
function). I don't understand when the explicit backward definition is mandatory, and whether there is a set of rules that must be respected to be sure that an explicit derivative definition is not needed to compute the gradient correctly.
Automatic differentiation (autograd): when is the explicit definition of the gradient function needed?

1 Answer
Most machine learning libraries that support automatic differentiation overload or provide alternatives for many common functions (basic arithmetic, matrix operations, common neural network layers, etc.). The derivatives of these common functions have already been implemented, so as long as you stick to using only these functions (and you can get surprisingly far!), everything will work as expected.
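As a minimal sketch of that first case (the specific function chosen here is just for illustration): every operation below is a built-in PyTorch op with a known derivative, so no backward definition is needed anywhere.

```python
import torch

# Built entirely from built-in ops, so autograd already knows every local derivative.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.sin(x).sum() + x.pow(2).sum()

y.backward()      # gradients assembled automatically via the chain rule
print(x.grad)     # cos(x) + 2*x, with no explicit derivative written by us
```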
when the explicit backward definition is mandatory
If you make use of a function whose backward pass is not implemented, then you have to define the gradient yourself. Most commonly, this happens when you want to implement something that cannot be easily or efficiently expressed in terms of the supported functions. For example, maybe you want to pass some data into an ODE solver, use the outputs in your neural network, and later backpropagate all the way back through the solver. You would likely be using a third-party solver, which is unlikely to be supported by PyTorch.
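A hedged sketch of that second case, using PyTorch's torch.autograd.Function API: the class name ClipExp and the clamped-exponential computation are made up for illustration (a real case might wrap an actual third-party solver call), but the pattern of supplying forward() together with a hand-written backward() is the standard one.

```python
import torch

class ClipExp(torch.autograd.Function):
    """Toy stand-in for a black-box routine: autograd cannot see inside
    forward(), so the derivative must be supplied by hand in backward()."""

    @staticmethod
    def forward(ctx, x):
        # Pretend this called out to an external solver; the computation
        # happens outside autograd's view.
        y = x.detach().exp().clamp(max=10.0)
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        # Hand-written derivative: d/dx exp(x) = exp(x), zero where clamped.
        (y,) = ctx.saved_tensors
        return grad_output * y * (y < 10.0)

x = torch.tensor([0.5, 3.0], requires_grad=True)
out = ClipExp.apply(x).sum()
out.backward()
print(x.grad)   # gradient flows through the custom backward definition
```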
