In general, 1x1 convolutions are used to reduce the dimensionality of the filter space. I referred to this answer.
But we can also reduce the dimensionality of the filter space (the number of filters) using convolutions larger than 1x1 with "SAME" padding, which keeps the spatial size unchanged.
So, what is the difference between those two approaches? One difference I could figure out from the above-mentioned question is that larger convolutions are computationally expensive, and hence 1x1 convolutions are used first to reduce the number of filters before the larger convolutions are applied. Reference: This answer.
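
To make the cost argument concrete, here is a rough count of weights and multiply-accumulate operations for the two approaches. This is only a sketch: the layer sizes (a 28x28 feature map, 256 input channels, 64 output channels) and the helper `conv_cost` are made up for illustration, stride 1 and "SAME" padding are assumed, and biases are ignored.

```python
def conv_cost(h, w, c_in, c_out, k):
    """Return (weights, multiply-accumulates) for a k x k convolution
    with stride 1 and "SAME" padding, ignoring biases."""
    weights = k * k * c_in * c_out
    macs = h * w * weights  # "SAME" padding keeps the h x w spatial size
    return weights, macs

# Hypothetical layer sizes, chosen only for illustration.
h, w, c_in, c_out = 28, 28, 256, 64

# Approach A: reduce 256 -> 64 channels directly with a 3x3 convolution.
direct_weights, direct_macs = conv_cost(h, w, c_in, c_out, k=3)

# Approach B: reduce 256 -> 64 channels with a 1x1 convolution,
# then apply a 3x3 convolution on the 64 reduced channels.
reduce_weights, reduce_macs = conv_cost(h, w, c_in, c_out, k=1)
spatial_weights, spatial_macs = conv_cost(h, w, c_out, c_out, k=3)
bottleneck_weights = reduce_weights + spatial_weights
bottleneck_macs = reduce_macs + spatial_macs

print("direct 3x3:     ", direct_weights, "weights,", direct_macs, "MACs")
print("1x1 then 3x3:   ", bottleneck_weights, "weights,", bottleneck_macs, "MACs")
```

With these assumed sizes, the 1x1-then-3x3 variant needs 53,248 weights against 147,456 for the direct 3x3 reduction, even though it has one more layer.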
Is there any other significant difference between those two?