TL;DR: For time series and density modeling, neural ODEs offer some benefits that we don't know how to get otherwise. For plain supervised learning there are potential computational benefits, but for practical purposes neural ODEs probably aren't worth using in that setting yet.
To answer your first question:
> Is there something NeuralODEs do that "conventional" Neural Networks cannot?
Neural ODEs differ in two ways from standard nets:
- They represent a different set of functions, which can be good or bad depending on what you're modeling.
- We have to approximate their exact solution, which gives more freedom in how to compute the answer, but adds complexity.
I'd say the clearest setting where neural ODEs help is building continuous-time time series models, which can easily handle data arriving at irregular intervals. However, ODEs can only model deterministic dynamics, so I'm more excited by the generalization of these time-series models to stochastic differential equations.
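For example, a continuous-time latent state can simply be integrated from one observation time to the next, whatever the gaps are. Here's a minimal sketch with SciPy; the linear dynamics matrix and timestamps are placeholders for illustration, where a neural ODE would use a learned network for the dynamics:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy continuous-time dynamics; in a neural ODE this would be a small network.
A = np.array([[0.0, 1.0], [-0.5, -0.1]])

def dynamics(t, h):
    return A @ h

h = np.array([1.0, 0.0])                 # initial latent state
obs_times = [0.0, 0.3, 0.35, 1.7, 2.0]   # irregularly spaced timestamps

predictions = []
for t_prev, t_next in zip(obs_times[:-1], obs_times[1:]):
    # Integrate the latent state across whatever gap separates the observations.
    sol = solve_ivp(dynamics, (t_prev, t_next), h, rtol=1e-6)
    h = sol.y[:, -1]                     # latent state at the next observation time
    predictions.append(h.copy())         # a readout network would map h to outputs
```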
If you're modeling data sampled at regular time intervals (like video or audio), I think there's not much advantage, and standard approaches will probably be simpler and faster.
Another setting where they have an advantage is in building normalizing flows for density modeling. The bottleneck in normalizing flows is tracking the change in density (the log-determinant of each layer's Jacobian), which costs O(D^3) for unrestricted layers. That's why discrete-time normalizing flow models like Glow or Real-NVP have to restrict the architectures of their layers, for example only updating half the units as a function of the other half. In continuous time, it's easier to track the change in density, even for unrestricted architectures. That's what the FFJORD paper is about. Since then, Residual Flows were developed, which are discrete-time flows that can also handle unrestricted architectures, with some caveats.
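Concretely, if the state follows dz/dt = f(z, t), then the log-density along the trajectory satisfies d log p(z(t))/dt = -Tr(∂f/∂z), so you only need a trace rather than a log-determinant. Here's a minimal sketch on a toy linear field with SciPy (FFJORD additionally estimates the trace stochastically for neural-network dynamics; the matrix A below is just a placeholder):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy dynamics dz/dt = A z; in a real continuous normalizing flow, f is a neural net.
A = np.array([[-0.5, 1.0], [-1.0, -0.5]])

def f(t, z):
    return A @ z

def trace_df_dz(t, z):
    # Trace of the Jacobian of f; exact and trivial for a linear field.
    return np.trace(A)

def augmented(t, state):
    # Integrate the state and the change in log-density together.
    z = state[:-1]
    return np.concatenate([f(t, z), [-trace_df_dz(t, z)]])

z0 = np.array([1.0, 0.5])
sol = solve_ivp(augmented, (0.0, 1.0), np.concatenate([z0, [0.0]]), rtol=1e-6)
z1, delta_logp = sol.y[:-1, -1], sol.y[-1, -1]
# log p(z1) = log p(z0) + delta_logp, with no O(D^3) determinant required.
```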
For standard deep learning, there are two potential big advantages:
- Constant memory cost at training time. Before neural ODEs there was already some work showing we could reduce the memory cost of computing reverse-mode gradients of neural networks if we could 'run them backwards' from the output, but this required restricting the architecture of the network. The nice thing about neural ODEs is that you can simply run their dynamics backwards to reconstruct the original trajectory. In both cases, compounding numerical error could in principle be a problem, but we didn't find it to be a practical concern.
- Adaptive time cost. The idea is that since we're only approximating the exact answer, sometimes we might need only a few iterations of our approximate solver to get an acceptably good answer, and so could save time (both ideas are sketched in code below).
Both of these potential advantages are shared by Deep Equilibrium Models, which have already been scaled up to transformers. But in practice, both kinds of model have so far tended to be slower overall than standard nets, because we don't yet know how to regularize them to be easy to approximate.
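Here's a minimal sketch of both ideas on a toy linear system with SciPy's adaptive solver (the dynamics, horizon, and tolerances are placeholders, not a real neural ODE): running the dynamics backwards recovers the initial state without storing the trajectory, and loosening the solver tolerance reduces the number of function evaluations.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy damped-oscillator dynamics standing in for a neural ODE's vector field.
A = np.array([[0.0, 1.0], [-1.0, -0.2]])

def f(t, z):
    return A @ z

z0 = np.array([1.0, 0.0])

# Constant-memory idea: integrate forward, keep only the final state, then
# recover the initial state by running the same dynamics backwards in time.
fwd = solve_ivp(f, (0.0, 5.0), z0, rtol=1e-8, atol=1e-10)
zT = fwd.y[:, -1]
bwd = solve_ivp(f, (5.0, 0.0), zT, rtol=1e-8, atol=1e-10)
print(np.max(np.abs(bwd.y[:, -1] - z0)))   # small reconstruction error

# Adaptive-cost idea: a looser tolerance lets the adaptive solver take fewer
# steps (fewer function evaluations), trading accuracy for speed.
loose = solve_ivp(f, (0.0, 5.0), z0, rtol=1e-2)
tight = solve_ivp(f, (0.0, 5.0), z0, rtol=1e-8)
print(loose.nfev, tight.nfev)              # loose tolerance -> fewer evaluations
```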
To answer your second question:
Is there something "conventional" Neural Networks do that NeuralODEs
cannot do?
- Conventional nets can fit non-homeomorphic functions, for example functions whose output has a smaller dimension than their input, or that change the topology of the input space. There was a nice paper from Oxford pointing out these issues, and showing that you can fix them by adding extra dimensions (a small sketch of this trick is at the end of this answer).
Of course, you could handle this by composing ODE nets with standard network layers.
- Conventional nets can be evaluated exactly with a fixed amount of computation, and are typically faster to train. Plus, with standard nets you don't have to choose an error tolerance for a solver.
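To make the augmentation fix from the first bullet concrete, here's a small sketch: in one dimension an ODE flow can never represent x ↦ -x (trajectories can't cross, so the ordering of points is preserved), but padding the input with one extra zero dimension makes it easy. The dynamics below are a hand-picked rotation rather than a trained network:

```python
import numpy as np
from scipy.integrate import solve_ivp

def dynamics(t, z):
    # Rotate in the (original, augmented) plane; after time pi this maps
    # x -> -x in the original coordinate, which a 1-D ODE flow cannot do.
    return np.array([-z[1], z[0]])

def augmented_flow(x):
    z0 = np.array([x, 0.0])                       # append an extra zero dimension
    sol = solve_ivp(dynamics, (0.0, np.pi), z0, rtol=1e-8)
    return sol.y[0, -1]                           # read out the original coordinate

print(augmented_flow(1.0), augmented_flow(-2.0))  # approximately -1.0 and 2.0
```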