So far I've only encountered convolution kernels which are square (ie, have the same rows as columns).
Are there any cases in which a non-square kernel makes sense? If not, why?
Actually, the ideal shape would probably be a circle, but that is computationally inconvenient. The point is that you typically have no a priori assumptions about the shapes of the features your convolutional net should learn. For instance, the lowest layers of a convolutional net trained on images often learn to detect edges, and these edges can have any orientation: vertical, diagonal, horizontal, or something in between. If you examined the weights of a vertical edge detector, you might find that you could fit them inside a tall rectangle and crop out some irrelevant (near-zero) weights from the sides of the kernel. Similarly, a horizontal edge detector might fit inside a wide rectangle, not needing the top and bottom rows of the square.

But you don't know beforehand which feature will be learnt by which map, so you can't specify these shapes in advance, nor would doing so likely confer much of an advantage. A circular kernel fits any feature of a given size (e.g. any edge detector with a maximum dimension of 5 pixels fits inside a circle of diameter 5), and the square is the closest approximation to a circle that is computationally easy to work with.
If you knew in advance that all your features would tend to (for example) be wider than they are tall, then it might be worth using a (non-square) rectangular kernel.
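As a minimal sketch of that idea (assuming PyTorch, which none of these answers actually uses):

```python
import torch
import torch.nn as nn

# A wide 3x7 kernel for features expected to be wider than tall;
# asymmetric padding keeps the spatial size unchanged.
conv = nn.Conv2d(in_channels=1, out_channels=8,
                 kernel_size=(3, 7), padding=(1, 3))

x = torch.randn(1, 1, 64, 64)  # dummy single-channel image
y = conv(x)
print(y.shape)                 # torch.Size([1, 8, 64, 64])
```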
It makes sense to use a non-rectangular kernel if the input device has non-rectangular geometry:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8853238
A non-rectangular kernel can also reduce the amount of computation:
https://arxiv.org/pdf/1904.08755.pdf
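To illustrate only the shape aspect (real compute savings require a sparse-convolution implementation such as the one in the linked paper), here is a hypothetical PyTorch sketch that emulates a roughly circular kernel by zeroing the corners of a square one:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a roughly circular 5x5 support, obtained by
# masking the corner weights of an ordinary square kernel. The dense
# convolution still computes all 25 products; only the effective
# support (the set of nonzero weights) is non-square.
mask = torch.tensor([[0., 1., 1., 1., 0.],
                     [1., 1., 1., 1., 1.],
                     [1., 1., 1., 1., 1.],
                     [1., 1., 1., 1., 1.],
                     [0., 1., 1., 1., 0.]])

conv = nn.Conv2d(1, 4, kernel_size=5, padding=2, bias=False)
with torch.no_grad():
    conv.weight *= mask  # broadcasts over (out_ch, in_ch, 5, 5)

# During training the mask would have to be re-applied after each
# update (e.g. via a gradient hook) to keep the corners at zero.
y = conv(torch.randn(1, 1, 32, 32))
print(y.shape)           # torch.Size([1, 4, 32, 32])
```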
Another example is convolution on point clouds. In the following paper, the kernel shape is no longer rigid and is instead defined inside a sphere.
An example application of a non-square kernel is data whose size differs across dimensions.
For a concrete example, see this network. It applies $4 \times 1$ kernels to the short-time Fourier transform of sound (it runs on a $513 \times 128$ input).
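A minimal version of such a layer, assuming PyTorch (the linked network's actual framework is not stated here):

```python
import torch
import torch.nn as nn

# A 4x1 kernel sliding over a 513 (frequency) x 128 (time) spectrogram:
# it mixes neighbouring frequency bins but never blurs across time.
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(4, 1))

spec = torch.randn(1, 1, 513, 128)  # batch, channel, freq, time
out = conv(spec)
print(out.shape)                    # torch.Size([1, 16, 510, 128])
```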
In principle, kernels may have arbitrary shapes. However, in practice the exact size and shape of the kernel have been found not to affect network performance very much; I suppose that is one of the reasons you rarely see irregularly shaped kernels.
Convolution aggregates information from a region whose shape corresponds to the shape of the kernel. In computer vision, the horizontal and vertical dimensions usually make no difference, so square kernels are used. If it makes sense for a particular task to aggregate information over a differently sized neighborhood in each dimension, then just go for it. For example, we used kernels of shape [3x3x2] to work with 3D data in this article, as sketched below.
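A sketch of such a layer, assuming PyTorch and dummy input dimensions (the article's actual code and sizes are not shown here):

```python
import torch
import torch.nn as nn

# A [3x3x2] kernel for volumetric data that is shallow along one axis,
# so a smaller receptive field in that dimension is reasonable.
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(3, 3, 2))

vol = torch.randn(1, 1, 32, 32, 4)  # batch, channel, then a 32x32x4 volume
out = conv(vol)
print(out.shape)                    # torch.Size([1, 8, 30, 30, 3])
```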