I've read "Variational Inference with Normalizing Flows" (abstract), but I don't understand the intuition behind the planar flow.

The authors define the planar flow as follows.

Let $\mathbf{w} \in \mathbb{R}^D$, $\mathbf{u} \in \mathbb{R}^D$, $b \in \mathbb{R}$, and let $h(\cdot)$ be a smooth element-wise non-linearity.
Then the planar flow is given by the following formulas:

$$\begin{array}{c} f(\mathbf{z}) = \mathbf{z} + \mathbf{u}h(\mathbf{w}^T\mathbf{z}+b) \\ \psi(\mathbf{z})=h^{\prime}\left(\mathbf{w}^{\top} \mathbf{z}+b\right) \mathbf{w} \\ |\operatorname{det} \frac{\partial f}{\partial \mathbf{z}}|=| \operatorname{det}\left(\mathbf{I}+\mathbf{u} \psi(\mathbf{z})^{\top}\right)|=| 1+\mathbf{u}^{\top} \psi(\mathbf{z}) | \quad (1)\end{array}$$
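
To make the last line of (1) concrete, here is a small numerical check in two dimensions. It is only a sketch: the choice $h = \tanh$, the values of `w`, `u`, `b`, `z`, and all function names are assumptions made for illustration and do not come from the paper.

```python
import math

def planar_flow(z, w, u, b):
    """f(z) = z + u * h(w.z + b), with h = tanh as an assumed choice."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    return [zi + ui * math.tanh(a) for zi, ui in zip(z, u)]

def abs_det_closed_form(z, w, u, b):
    """|1 + u^T psi(z)| with psi(z) = h'(w.z + b) w and h'(a) = 1 - tanh(a)^2."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    h_prime = 1.0 - math.tanh(a) ** 2
    return abs(1.0 + h_prime * sum(ui * wi for ui, wi in zip(u, w)))

def abs_det_numeric_2d(z, w, u, b, eps=1e-6):
    """|det df/dz| in 2-D, with the Jacobian from central finite differences."""
    cols = []  # cols[j][i] approximates d f_i / d z_j
    for j in range(2):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        f_plus = planar_flow(z_plus, w, u, b)
        f_minus = planar_flow(z_minus, w, u, b)
        cols.append([(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)])
    return abs(cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1])

# Illustrative values (assumptions, not taken from the paper).
w, u, b = (10.0, -1.0), (2.0, -1.0), 0.0
z = (0.1, 0.3)

closed = abs_det_closed_form(z, w, u, b)
numeric = abs_det_numeric_2d(z, w, u, b)
print(closed, numeric)  # the two values should agree to several decimal places
```

The closed form only depends on $\mathbf{z}$ through the scalar $\mathbf{w}^\top\mathbf{z}+b$, which is why the volume change is the same everywhere on a hyperplane $\mathbf{w}^\top\mathbf{z}+b = \text{const}$.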

The authors state that

The flow defined by the transformation (1) modifies the initial density $q_0$ by applying a series of contractions and expansions in the direction perpendicular to the hyperplane $\mathbf{w}^T\mathbf{z}+b=0$.

I don't understand why the transformation (1) moves the vector $\mathbf{z}$ along the direction perpendicular to the hyperplane $\mathbf{w}^T\mathbf{z}+b=0$.

Could anybody elaborate on this?

Glorfindel
alryosha

2 Answers

For every $z,$ notice that the displacement from $z$ to its destination $f(z),$ given by $f(z)-z,$ is a multiple of the fixed vector $u.$ Thus, if you were to diagram the effect of $f$ by drawing arrows from a selected set of original values $z_i$ to their destinations $f(z_i),$ all the arrows would be parallel. See the right-hand plot in the figure below.

Next, notice that each level set of the displacement $z \to f(z) - z$ is a union of level sets of the function

$$z \to w^\top z,$$

which are parallel hyperplanes. On any such hyperplane given by $w^\top z = c,$ for some constant real number $c,$ all the arrows equal

$$f(z) - z = u\,h(w^\top z + b) = u\, h(c + b).$$

That shows they all have common length $|h(c+b)|\,||u||$ for every $z$ on that hyperplane.
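
Both observations are easy to check numerically. The following is a minimal sketch in two dimensions, assuming $h = \tanh$ and $b = 0$ (neither is fixed by the argument above) and reusing the $u$ and $w$ from the figure; the function and variable names are hypothetical.

```python
import math

def displacement(z, w, u, b):
    """f(z) - z = u * h(w.z + b), with h = tanh as an assumed choice."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    return [ui * math.tanh(a) for ui in u]

w, u, b = (10.0, -1.0), (2.0, -1.0), 0.0

# z1 and z2 lie on the same hyperplane w.z = c, because t is orthogonal to w:
# 10 * 1 + (-1) * 10 = 0.
t = (1.0, 10.0)
z1 = (0.1, 0.3)
z2 = (z1[0] + 0.5 * t[0], z1[1] + 0.5 * t[1])

# z3 lies on a different hyperplane.
z3 = (-0.2, 0.4)

d1 = displacement(z1, w, u, b)
d2 = displacement(z2, w, u, b)
d3 = displacement(z3, w, u, b)

def cross_with_u(d):
    """2-D cross product of d with u; zero means d is parallel to u."""
    return d[0] * u[1] - d[1] * u[0]

print(d1, d2)  # same arrow on the same hyperplane (up to rounding)
print(cross_with_u(d1), cross_with_u(d3))  # both zero up to rounding
```

Here `d3` has a different length and sign from `d1`, but it still points along $u$.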

Why one might call these characteristics "planar" is inscrutable.


Here at the left is an example of a generic $f:\mathbb{R}^2\to\mathbb{R}^2,$ taken from the thread "Analysis with complex data, anything different?":

Figure

On the right is a "planar flow" transformation. The arrows are colored according to the value of $h.$ The common direction of displacement is $u = (2,-1)$ and the amount of displacement varies in the direction $w = (10,-1).$

whuber

The equation $$ \mathbf{w}^T\mathbf{z_1}+b=0 $$ defines a (hyper)plane. The vector $\mathbf{w}$ is the normal vector. For a refresher on multivariable calculus, see here.

So what happens if you have a fixed vector $\mathbf{w}$, a fixed scalar $b$, and you plug a different point $\mathbf{z}_2$ into the above equation and get

$$ \mathbf{w}^T\mathbf{z}_2+b= 1? $$

$1$ isn't $0$, so this new point $\mathbf{z}_2$ isn't in the same plane. But what does $1$ represent?

$\mathbf{z}_2$ is in a different plane. This new plane has the same normal vector, $\mathbf{w}$, so this new plane is parallel to the old one. It's just shifted.

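The function $\mathbf{w}^T\mathbf{z}+b$ is defined on all of $\mathbb{R}^d$, so you can plug in any vector $\mathbf{z}$. The output is proportional to the signed perpendicular distance of $\mathbf{z}$ from the original plane $\mathbf{w}^T\mathbf{z}+b=0$; the distance itself is $(\mathbf{w}^T\mathbf{z}+b)/\|\mathbf{w}\|$.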

Then $h$, the "smooth element-wise non-linearity", takes this scalar output and maps it to another, less interpretable scalar.

Then that scalar is multiplied by the vector $\mathbf{u}$, and the product is added to the original input $\mathbf{z} \in \mathbb{R}^d$. If the original point was on the plane, and if $h$ maps $0$ to $0$, then nothing gets added to the original vector $\mathbf{z}$.

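On the other hand, if $\mathbf{z}$ is far away from the original plane, then a significant amount $\mathbf{u}h(\mathbf{w}^T\mathbf{z}+b)$ gets added to the original input vector, provided $h$ does not vanish there. With $h=\tanh$, for example, the added vector approaches $\pm\mathbf{u}$.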
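
The steps above can be traced with a few lines of code. This is only a sketch under stated assumptions: $h = \tanh$ (so that $h(0) = 0$), $b = 0$, and illustrative values for $\mathbf{w}$ and $\mathbf{u}$; none of these choices, nor the function names, come from the paper.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def signed_distance(z, w, b):
    """Signed perpendicular distance of z from the plane w.z + b = 0."""
    return (dot(w, z) + b) / norm(w)

def displacement_norm(z, w, u, b):
    """Length of the added vector u * h(w.z + b), with h = tanh (assumed)."""
    return abs(math.tanh(dot(w, z) + b)) * norm(u)

# Illustrative values (assumptions).
w, u, b = (10.0, -1.0), (2.0, -1.0), 0.0
unit_normal = [wi / norm(w) for wi in w]

# Walk away from the plane along its normal and record how far z gets pushed.
steps = [0.0, 0.05, 0.1, 0.5]
rows = []
for t in steps:
    z = [t * ni for ni in unit_normal]  # a point at distance t from the plane
    rows.append((signed_distance(z, w, b), displacement_norm(z, w, u, b)))

for dist, push in rows:
    print(f"distance {dist:.2f} -> displacement length {push:.4f}")
```

With this choice of $h$, the displacement is zero on the plane, grows with the distance from it, and levels off near $\|\mathbf{u}\|$.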

Taylor