So far I've only encountered convolution kernels which are square (ie, have the same rows as columns).
Are there any cases in which a non-square kernel makes sense? If not, why?
Actually, the ideal shape would probably be a circle, but that is computationally inconvenient. The point is that you typically have no a priori assumptions about the shapes of the features your convolutional net should learn. For instance, the lowest layers of a convolutional net trained on images often learn to detect edges, and these edges can have any orientation: vertical, diagonal, horizontal, or something in between. If you examined the weights of a vertical edge detector, you might find that you could fit them inside a tall rectangle and crop out some irrelevant (near-zero) weights from the sides of the kernel. Similarly, a horizontal edge detector might fit inside a wide rectangle, not needing the top and bottom rows of the square.

But you don't know beforehand which feature will be learnt by which map, so you can't specify these shapes in advance, nor would doing so likely confer much of an advantage. A circular kernel fits any feature of a given size (e.g. any edge detector with a maximum dimension of 5 pixels fits inside a circle of diameter 5), and the square is the closest approximation to a circle that is computationally easy to work with.
If you knew in advance that all your features would tend to (for example) be wider than they are tall, then it might be worth using a (non-square) rectangular kernel.
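As a minimal sketch of that idea (assuming PyTorch, which none of these answers actually uses):

```python
import torch
import torch.nn as nn

# A wide 3x7 kernel for features expected to be wider than tall;
# asymmetric padding keeps the spatial size unchanged.
conv = nn.Conv2d(in_channels=1, out_channels=8,
                 kernel_size=(3, 7), padding=(1, 3))

x = torch.randn(1, 1, 64, 64)  # dummy single-channel image
y = conv(x)
print(y.shape)                 # torch.Size([1, 8, 64, 64])
```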
It makes sense to use a non-rectangular kernel if the input device has non-rectangular geometry:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8853238
A non-rectangular kernel can also reduce the amount of computation:
https://arxiv.org/pdf/1904.08755.pdf
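To illustrate only the shape aspect (real compute savings require a sparse-convolution implementation such as the one in the linked paper), here is a hypothetical PyTorch sketch that emulates a roughly circular kernel by zeroing the corners of a square one:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a roughly circular 5x5 support, obtained by
# masking the corner weights of an ordinary square kernel. The dense
# convolution still computes all 25 products; only the effective
# support (the set of nonzero weights) is non-square.
mask = torch.tensor([[0., 1., 1., 1., 0.],
                     [1., 1., 1., 1., 1.],
                     [1., 1., 1., 1., 1.],
                     [1., 1., 1., 1., 1.],
                     [0., 1., 1., 1., 0.]])

conv = nn.Conv2d(1, 4, kernel_size=5, padding=2, bias=False)
with torch.no_grad():
    conv.weight *= mask  # broadcasts over (out_ch, in_ch, 5, 5)

# During training the mask would have to be re-applied after each
# update (e.g. via a gradient hook) to keep the corners at zero.
y = conv(torch.randn(1, 1, 32, 32))
print(y.shape)           # torch.Size([1, 4, 32, 32])
```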
Another example is convolution on point clouds. In the following paper, the kernel shape is no longer rigid and is instead defined inside a sphere.
An example application of a non-square kernel is data whose size differs across dimensions.
For a concrete example, see this network. It applies $4 \times 1$ kernels to the short-time Fourier transform of sound (it runs on a $513 \times 128$ input).
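A minimal version of such a layer, assuming PyTorch (the linked network's actual framework is not stated here):

```python
import torch
import torch.nn as nn

# A 4x1 kernel sliding over a 513 (frequency) x 128 (time) spectrogram:
# it mixes neighbouring frequency bins but never blurs across time.
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(4, 1))

spec = torch.randn(1, 1, 513, 128)  # batch, channel, freq, time
out = conv(spec)
print(out.shape)                    # torch.Size([1, 16, 510, 128])
```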
In principle, kernels may have arbitrary shapes. However, in practice the exact size and shape of the kernel have been found not to affect network performance very much; I suppose that is one of the reasons you rarely see irregularly shaped kernels.
Convolution aggregates information from a region whose shape corresponds to the shape of the kernel. In computer vision, the horizontal and vertical dimensions usually make no difference, so square kernels are used. If it makes sense for a particular task to aggregate information over a differently sized neighborhood in each dimension, then just go for it. For example, we used kernels of shape [3x3x2] to work with 3D data in this article, as sketched below.
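A sketch of such a layer, assuming PyTorch and dummy input dimensions (the article's actual code and sizes are not shown here):

```python
import torch
import torch.nn as nn

# A [3x3x2] kernel for volumetric data that is shallow along one axis,
# so a smaller receptive field in that dimension is reasonable.
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(3, 3, 2))

vol = torch.randn(1, 1, 32, 32, 4)  # batch, channel, then a 32x32x4 volume
out = conv(vol)
print(out.shape)                    # torch.Size([1, 8, 30, 30, 3])
```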