3

The paper FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models presents a continuous-time flow as a generative model which uses Hutchinson's trace estimator to give an unbiased estimate of the log-density, allowing for unrestricted network architectures at the cost of approximate Jacobian determinant computation.

They define the model by the following ODE: $$ \frac{\partial \mathbf{z}(t)}{\partial t} = f(\mathbf{z}(t), t; \theta) $$ and show that the total change in log-density can be computed by integrating across time: $$ \log p(\mathbf{z}(t_1)) = \log p(\mathbf{z}(t_0)) - \int_{t_0}^{t_1} \mathrm{Tr}\left(\frac{\partial f}{\partial \mathbf{z}(t)} \right) \mathop{dt} $$

On page 4, the authors argue that vector-Jacobian products $ v^T \frac{\partial f}{\partial \mathbf{z}}$ can be computed for the same cost as evaluating $f$ using reverse-mode automatic differentiation. Can someone explain to me how this is true?

Lashoun
  • 31
  • 2

1 Answers1

2

This is a well known result from automatic differentiation literature. Specifically, the result is that reverse mode differentiation can

  1. calculate the gradient $\frac{\partial \hat{f}} {\partial \theta}$ of a scalar function $\hat{f}$, with a time complexity equal to a constant multiple of the time to calculate $\hat{f}$.
  2. calculate the vector jacobian product $v^T \frac{\partial f}{\partial z}$, for some constant vector $v$, and vector valued function $f$, with a time complexity equal to a constant multiple of the time to calculate $f$.

Hence the claim is just a result of 2). Notably, calculating the vector-jacobian product is not the same amount of time to calculate $f$, but it's bounded by a constant of 4.

The best reference I can give for this is "Evaluating Derivatives: principles and techniques of algorithmic differentiation" by Griewank and Walther (see chapter 3.2)

Taw
  • 163
  • 5