
Deep Equilibrium Models, proposed by Shaojie Bai, J. Zico Kolter, and Vladlen Koltun, train neural networks with "infinitely deep", weight-tied layers for sequence data.

How was this achieved?

Firebug

1 Answer


I'm not sure what your question is, because the paper is quite straightforward in explaining how this is accomplished. Nonetheless, I'll summarize.

A single neural network layer computes $z' = f(x, z; \theta)$, where $x$ is the input, $z$ is the output of the previous layer, and $\theta$ are the network weights (shared across all layers, hence "weight-tied"). A fixed point $z^*$ of this function is one where $z^* = f(x, z^*; \theta)$.
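For concreteness, one possible weight-tied layer (a toy illustration, not the sequence-model cell actually used in the paper) is

$$f(x, z; \theta) = \tanh(W z + U x + b), \qquad \theta = \{W, U, b\},$$

in which case the equilibrium satisfies $z^* = \tanh(W z^* + U x + b)$.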

The Banach fixed-point theorem says that if $f$ is a contraction mapping, then starting from an arbitrary initial $z$ and repeatedly applying the function (i.e. stacking more copies of the same layer on top) converges to a unique fixed point. Our neural network layer $f$ is not guaranteed to be a contraction mapping, so the theorem's hypothesis need not hold, but empirically deep weight-tied networks are observed to come close to converging on some fixed point.
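Here is a minimal sketch of that "stack the same layer forever" view, using the toy tanh layer above. Rescaling $W$ so its spectral norm is below 1 is an assumption made purely so that $f$ is a contraction in $z$ and Banach's theorem applies:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy weight-tied layer f(x, z) = tanh(W z + U x + b).
# Shrinking W's spectral norm below 1 makes f a contraction in z,
# so repeated application converges to the unique fixed point.
W = rng.normal(size=(d, d))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.normal(size=(d, d))
b = rng.normal(size=d)
x = rng.normal(size=d)

def f(x, z):
    return np.tanh(W @ z + U @ x + b)

# "Infinitely deep" network = apply the same layer until z stops changing.
z = np.zeros(d)
for i in range(100):
    z_next = f(x, z)
    if np.linalg.norm(z_next - z) < 1e-10:
        break
    z = z_next

print(f"stopped after {i} iterations, residual = {np.linalg.norm(f(x, z) - z):.2e}")
```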

So we have established that an infinitely deep, weight-tied network would converge to the fixed point. Since computing it by naive iteration is expensive, we instead use Broyden's method (a quasi-Newton root-finding method) to find (a good estimate of) the fixed point of $z = f(x, z; \theta)$ in a small, finite number of steps. This is the forward pass. The authors' Theorem 1 then differentiates the equilibrium condition itself (an implicit-function-theorem argument), so the backward pass only needs quantities at $z^*$ and never backpropagates through the solver's iterations.
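Below is a sketch of that forward/backward pattern for the same toy tanh layer, not the paper's actual implementation: the forward pass finds the equilibrium with SciPy's `broyden1` root solver rather than by stacking layers, and the backward pass solves one linear system at the equilibrium instead of unrolling the solver. The quadratic loss and target `y` are hypothetical, chosen only to have something to differentiate.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))
W *= 0.5 / np.linalg.norm(W, 2)      # keep the layer a contraction
U = rng.normal(size=(d, d))
b = rng.normal(size=d)
x = rng.normal(size=d)
y = rng.normal(size=d)               # hypothetical regression target

def f(z):
    return np.tanh(W @ z + U @ x + b)

# Forward pass: find the equilibrium z* as a root of g(z) = f(z) - z
# with a Broyden-type quasi-Newton solver instead of iterating the layer.
sol = root(lambda z: f(z) - z, x0=np.zeros(d), method="broyden1", tol=1e-10)
z_star = sol.x

# Backward pass via implicit differentiation of z* = f(z*):
# dl/dtheta = dl/dz* (I - df/dz)^{-1} df/dtheta, all evaluated at z*.
s = 1.0 - np.tanh(W @ z_star + U @ x + b) ** 2   # tanh' at the pre-activation
J = s[:, None] * W                               # df/dz at z*
dl_dz = z_star - y                               # dl/dz* for l = 0.5*||z* - y||^2

v = np.linalg.solve((np.eye(d) - J).T, dl_dz)    # solve (I - J)^T v = dl/dz*
dl_db = v * s                                    # chain through df/db = diag(s)

print("equilibrium residual:", np.linalg.norm(f(z_star) - z_star))
print("gradient w.r.t. bias b:", dl_db)
```

The point of the linear solve is exactly the "bypass the solver" idea: however many Broyden steps the forward pass took, the gradient only depends on $z^*$ and the Jacobian of $f$ there, so memory cost stays constant in "depth".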

shimao