There is a statement in this Quora answer:
Layer depth is usually a power of 2 because it is convenient for the GPU.
Also, in fully connected layers, the number of neurons in every hidden layer is a power of 2.
But why is a power of 2 convenient for the GPU? If I sized everything as a power of 3 instead, why would that be inconvenient? I thought one would use a GPU in deep learning only because one can parallelize the computation across batches or within a convolution.
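To make the question concrete, here is a minimal timing sketch I would use to check whether layer size matters at all (my own example, not from the Quora answer; it runs on the CPU via NumPy, so it can only hint at what a GPU would do, and the sizes 512 vs 513 are an arbitrary choice of a power of 2 next to a non-power of 2):

```python
import time
import numpy as np

def time_matmul(n, repeats=10):
    """Average time of an n x n float32 matrix multiply,
    a stand-in for one fully connected layer's forward pass."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up so setup cost is not measured
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    return (time.perf_counter() - start) / repeats

# Compare a power-of-2 size against an adjacent non-power-of-2 size.
for n in (512, 513):
    print(f"n={n}: {time_matmul(n):.5f} s per multiply")
```

On a GPU one would run the analogous comparison with a CUDA-backed framework; is the claim that the 513 case would be noticeably slower there, and if so, why?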