In Pattern Recognition and Machine Learning, Section 5.5.5, Bishop derives a regulariser for neural networks that is equivalent to the tangent propagation regulariser (a regulariser that encourages the network output to be invariant under certain transformations of the input). The regularisation term is given by Equation (5.134):
$$\Omega=\frac{1}{2}\int(\tau^T\nabla y(x))^2p(x)dx$$
where $$\tau=\frac{\partial s(x,\xi)}{\partial \xi }$$
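(For context, here is a worked example I put together myself, not quoted from the book: for a one-parameter transformation such as a 2-D rotation, $\xi$ is a scalar and $\tau$ is just the tangent vector to the transformed input at $\xi=0$,
$$s(x,\xi)=\begin{pmatrix}\cos\xi & -\sin\xi\\ \sin\xi & \cos\xi\end{pmatrix}x,\qquad \tau=\left.\frac{\partial s(x,\xi)}{\partial\xi}\right|_{\xi=0}=\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}x=\begin{pmatrix}-x_2\\ x_1\end{pmatrix}.$$
The additive-noise case below is the one that confuses me.)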
Then the author states that for a transformation consisting of the addition of random noise to the inputs, i.e.
$$s(x,\xi)=x+\xi$$
the regulariser reduces to Tikhonov regularisation and has the form:
$$\Omega=\frac{1}{2}\int||\nabla y(x)||^2p(x)dx$$
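(As a side note on how I read this term in practice: the integral over $p(x)$ would be approximated by an average over sampled inputs, roughly as in the sketch below. The toy function `y`, the Gaussian samples standing in for $p(x)$, and the finite-difference gradients are all my own illustrative choices, not anything from the book.)

```python
import numpy as np

def y(x):
    # Toy scalar "network output" y(x) = sin(w . x); purely illustrative.
    w = np.array([0.5, -1.0, 2.0])
    return np.sin(w @ x)

def grad_y(x, eps=1e-6):
    # Central finite-difference estimate of the gradient of y at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (y(x + e) - y(x - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 3))  # stand-in for draws from p(x)

# Monte Carlo estimate of Omega = 1/2 * E_{p(x)}[ ||grad y(x)||^2 ]
omega = 0.5 * np.mean([np.sum(grad_y(x) ** 2) for x in samples])
print(omega)
```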
I'm trying to figure out how to evaluate the derivative $\tau$ when $s(x,\xi)=x+\xi$.
I think $x$ is a vector, so $\xi$ must also be a vector, in which case I believe the derivative should be the identity matrix, which gives $$(\tau^T\nabla y(x))^2=(I^T\nabla y(x))^2=(I\nabla y(x))^2=(\nabla y(x))^2 \neq ||\nabla y(x)||^2$$.
EDIT: Okay, so I think I solved it. I posted the answer below, but basically:
$$(\tau^T \nabla y(x))^2=(\tau^T \nabla y(x))^T(\tau^T \nabla y(x))=(\nabla y(x)^T \tau)(\tau^T \nabla y(x))=\nabla y(x)^T (\tau \tau^T) \nabla y(x)=\nabla y(x)^T (I I^T) \nabla y(x)=\nabla y(x)^T \nabla y(x) = ||\nabla y(x)||^2.$$
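A quick numerical sanity check of that chain (entirely my own, with an arbitrary toy $y$ and an arbitrary point $x$): compute $\tau=\partial s(x,\xi)/\partial\xi$ by finite differences for $s(x,\xi)=x+\xi$, confirm it is (numerically) the identity matrix, and confirm that $(\tau^T\nabla y)^T(\tau^T\nabla y)$ matches $||\nabla y(x)||^2$.

```python
import numpy as np

def s(x, xi):
    # Transformation: addition of noise to the input.
    return x + xi

def y(x):
    # Arbitrary smooth toy function standing in for the network output.
    return np.sum(np.tanh(x) ** 2)

def jacobian_s_wrt_xi(x, xi, eps=1e-6):
    # Finite-difference Jacobian of s(x, xi) with respect to xi.
    n = len(xi)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (s(x, xi + e) - s(x, xi - e)) / (2 * eps)
    return J

def grad_y(x, eps=1e-6):
    # Central finite-difference gradient of y at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (y(x + e) - y(x - e)) / (2 * eps)
    return g

x = np.array([0.3, -1.2, 0.7])
tau = jacobian_s_wrt_xi(x, np.zeros(3))  # should be (numerically) the identity
g = grad_y(x)

lhs = (tau.T @ g) @ (tau.T @ g)  # (tau^T grad y)^T (tau^T grad y)
rhs = g @ g                      # ||grad y(x)||^2
print(np.allclose(tau, np.eye(3)), np.allclose(lhs, rhs))
```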