
In Pattern Recognition and Machine Learning, Section 5.5.5, Bishop derives a regulariser for neural networks that is equivalent to the tangent propagation regulariser (a regulariser that encourages the network output to be invariant under certain transformations of the input). The regularisation term is given by equation (5.134):

$$\Omega=\frac{1}{2}\int(\tau^T\nabla y(x))^2p(x)dx$$

where $$\tau=\frac{\partial s(x,\xi)}{\partial \xi }$$

Bishop then states that for a transformation consisting of the addition of random noise, i.e.

$$s(x,\xi)=x+\xi$$

the regulariser reduces to Tikhonov regularisation and takes the form:

$$\Omega=\frac{1}{2}\int||\nabla y(x)||^2p(x)dx$$
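For intuition, this reduced form is just half the expected squared norm of the input gradient under $p(x)$. A minimal Monte Carlo sketch of that quantity (the toy output $y(x)=\sin(w^Tx)$ and the choice $p(x)=\mathcal{N}(0,I)$ are my own illustrations, not from the book):

```python
# Monte Carlo estimate of Omega = 1/2 * E_{x~p}[ ||grad y(x)||^2 ]
# for a toy scalar output y(x) = sin(w^T x); its gradient is cos(w^T x) * w.
# Both y and p(x) = N(0, I) are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
d = 5                          # input dimension
w = rng.normal(size=d)         # fixed "weights" of the toy model

def grad_y(x):
    """Analytic gradient of y(x) = sin(w @ x) with respect to x."""
    return np.cos(w @ x) * w

xs = rng.normal(size=(100_000, d))                            # samples from p(x)
omega = 0.5 * np.mean([grad_y(x) @ grad_y(x) for x in xs])
print(f"Monte Carlo estimate of Omega: {omega:.4f}")
```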

I'm trying to figure out how to evaluate the derivative $\tau$ when $s(x,\xi)=x+\xi$.

I think $x$ is a vector, so $\xi$ must also be a vector, in which case I believe the derivative should equal the identity matrix, giving $$(\tau^T\nabla y(x))^2=(I^T\nabla y(x))^2=(I\nabla y(x))^2=(\nabla y(x))^2 \neq ||\nabla y(x)||^2.$$

EDIT: Okay, I think I solved it. I've posted the answer below, but in short: since $\tau^T \nabla y(x)$ is a scalar, $(\tau^T \nabla y(x))^2=(\tau^T \nabla y(x))(\tau^T \nabla y(x))=(\nabla y(x)^T \tau)(\tau^T \nabla y(x))=\nabla y(x)^T (\tau \tau^T) \nabla y(x)=\nabla y(x)^T (I I^T) \nabla y(x)=\nabla y(x)^T \nabla y(x) = ||\nabla y(x)||^2$


1 Answer


We have

$$\tau=\frac{\partial s(x,\xi)}{\partial \xi}=\frac{\partial (x+\xi)}{\partial \xi}=I$$

then, since $\tau^T \nabla y(x)$ is a scalar and therefore equal to its own transpose,

$$(\tau^T \nabla y(x))^2=(\tau^T \nabla y(x))(\tau^T \nabla y(x))=(\nabla y(x)^T \tau)(\tau^T \nabla y(x))=\nabla y(x)^T (\tau \tau^T) \nabla y(x)=\nabla y(x)^T (I I^T) \nabla y(x)=\nabla y(x)^T \nabla y(x) = ||\nabla y(x)||^2$$
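A quick numeric sanity check of this identity (a sketch with numpy; the gradient vector is made up purely for illustration):

```python
# Verify (tau^T g)^T (tau^T g) == ||g||^2 when tau = d(x + xi)/d(xi) = I.
import numpy as np

d = 4
grad = np.array([0.3, -1.2, 0.7, 2.0])   # stand-in for grad y(x)
tau = np.eye(d)                          # tau = identity for s(x, xi) = x + xi

lhs = (tau.T @ grad) @ (tau.T @ grad)    # (tau^T grad) dotted with itself
rhs = grad @ grad                        # ||grad y(x)||^2

print(lhs, rhs)                          # both print 6.02
assert np.isclose(lhs, rhs)
```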
