Which layer in a CNN is able to detect rotated and translated objects?

Question

Does a conv layer, a max pooling layer, or something else do the job? In my opinion, conv and max pooling layers can do the job only when the rotations or translations are not too large.

DeltaIV · Answer 1 · 2019-03-15T15:32:02.093

Convolutional layers are not equivariant to rotation, and pooling layers only help with invariance to small rotations. Invariance of the whole classifier to rotations is not part of the inductive bias; in practice it is learned through heavy data augmentation.

However, for each group action there exists a corresponding group convolution operator that is equivariant to it. This concept is used, for example, in "3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data" by Weiler, Geiger, Welling, Boomsma and Cohen (2018) to design layers that are equivariant to 3D rotations:

https://arxiv.org/pdf/1807.02547.pdf
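
To make the idea concrete, here is a minimal sketch in plain Python. It is not the 3D steerable construction from the paper; it assumes the much simpler discrete group of 90-degree rotations in 2D, and only the first ("lifting") layer of a group CNN. The helper names (`rot90`, `correlate`, `lift_c4`, `invariant`), the toy image and the filter are all made up for illustration, and wrap-around padding is assumed so that the identities hold exactly.

```python
def rot90(grid):
    """Rotate a square grid 90 degrees counterclockwise."""
    n = len(grid)
    return [[grid[j][n - 1 - i] for j in range(n)] for i in range(n)]


def correlate(image, kernel):
    """Centered 2D cross-correlation with wrap-around padding."""
    n, k = len(image), len(kernel)
    c = k // 2  # kernel center; the kernel size is assumed to be odd
    return [
        [
            sum(
                kernel[a][b] * image[(i + a - c) % n][(j + b - c) % n]
                for a in range(k)
                for b in range(k)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def lift_c4(image, kernel):
    """Correlate the image with all four 90-degree rotations of the kernel."""
    maps = []
    for _ in range(4):
        maps.append(correlate(image, kernel))
        kernel = rot90(kernel)
    return maps


def invariant(maps):
    """Global max over orientations and positions."""
    return max(max(max(row) for row in m) for m in maps)


# Toy 5x5 image and an asymmetric 3x3 filter (both arbitrary).
image = [[(3 * i + 7 * j + i * j) % 11 for j in range(5)] for i in range(5)]
kernel = [[1, 2, 0], [0, 1, -1], [-2, 0, 0]]

# An ordinary layer with one fixed filter is not rotation-equivariant.
plain_equivariant = (
    correlate(rot90(image), kernel) == rot90(correlate(image, kernel))
)

# The lifted layer is: rotating the input rotates every feature map and
# cyclically shifts the orientation axis by one step.
before = lift_c4(image, kernel)
after = lift_c4(rot90(image), kernel)
lifted_equivariant = all(
    after[r] == rot90(before[(r - 1) % 4]) for r in range(4)
)

print(plain_equivariant, lifted_equivariant,
      invariant(before) == invariant(after))
# prints: False True True
```

Pooling over the orientation axis (and over positions) is what turns the equivariant features into a rotation-invariant quantity; without the rotated filter copies, that invariance would have to be learned from augmented data.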

score 2 · Answer 2 · answered Mar 13 '19 at 06:30

For translational invariance, you can follow the discussion here. In general, the pooling layer is the main contributor to local translational invariance, because it reduces the spatial resolution (as max-pooling does, for example). If an object moves slightly in some direction, max-pooling still picks up the same maximum element as long as it stays within the same pooling window, so the output after pooling is unchanged. The convolutional layer, by contrast, is equivariant to translation: shifting the input shifts the feature map by the same amount.
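
A small sketch in plain Python illustrates both properties. It assumes wrap-around (circular) padding so that equivariance holds exactly at the borders; the helper names (`correlate`, `shift`, `max_pool`), the toy image and the filter are made up for illustration.

```python
def correlate(image, kernel):
    """2D cross-correlation with wrap-around padding."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(
                kernel[a][b] * image[(i + a) % h][(j + b) % w]
                for a in range(len(kernel))
                for b in range(len(kernel[0]))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def shift(image, dy, dx):
    """Circularly shift an image down by dy and right by dx pixels."""
    h, w = len(image), len(image[0])
    return [
        [image[(i - dy) % h][(j - dx) % w] for j in range(w)]
        for i in range(h)
    ]


def max_pool(image, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = len(image), len(image[0])
    return [
        [
            max(image[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, w, size)
        ]
        for i in range(0, h, size)
    ]


# Toy 4x4 image with a single bright pixel, and an arbitrary 2x2 filter.
image = [[0] * 4 for _ in range(4)]
image[0][0] = 9
kernel = [[1, -1], [2, 0]]

# Equivariance: filtering a shifted image equals shifting the filtered image.
conv_equivariant = (
    correlate(shift(image, 1, 1), kernel)
    == shift(correlate(image, kernel), 1, 1)
)

# Local invariance: a 1-pixel shift keeps the bright pixel inside the same
# 2x2 pooling window, so the pooled output does not change...
small_shift_same = max_pool(shift(image, 1, 1)) == max_pool(image)
# ...but a 2-pixel shift moves it into a different window.
large_shift_same = max_pool(shift(image, 2, 2)) == max_pool(image)

print(conv_equivariant, small_shift_same, large_shift_same)
# prints: True True False
```

Note that the invariance from pooling is only approximate: had the bright pixel started at position (1, 1), the same 1-pixel shift would already have crossed a window boundary and changed the pooled output.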

Neither layer is rotation-invariant. The network as a whole can still exhibit this behavior if the properties of the data and the overall architecture permit. A NIPS paper addresses this issue and uses Spatial Transformers to improve a CNN's invariance to rotation, scale and translation.

Given that neither of them provides rotation invariance, how do modern CNNs detect rotated images by merely using combinations of conv, pooling, etc. layers? — feynman, Mar 13 '19 at 09:50
Rotation invariance is not built into the individual layers, but that doesn't mean CNNs can't learn it. Probably there is enough data with lots of variety, and enough layers, for the network to make sense of rotated objects. — gunes, Mar 13 '19 at 10:07
