Modern processors are clocked: Every operation takes some integral number of clock cycles. The designers of the processor determine the length of a clock cycle. There are two considerations here: One, the speed of the hardware, for example measured as the delay of a single NAND-gate. This depends on the technology used and on tradeoffs like speed vs. power usage, and it is independent of the processor design. Two, the designers decide that the length of a clock cycle equals n delays of a single NAND-gate, where n might be 10, or 30, or any other value.
This choice of n limits how complex an operation can be and still be processed in one cycle. There will be operations that can be done in 16 but not in 15 NAND delays. So choosing n = 16 means such an operation can be done in one cycle, while choosing n = 15 means it can't.
The designers will choose n so that many important operations can just about be done in one, or maybe two or three, cycles. n will be chosen to be locally optimal: If you replaced n with n-1, most operations would be a bit faster, but some (those that really need the full n NAND delays) would suddenly need an extra cycle and become slower. If only a few operations slowed down, so that overall program execution got faster on average, then you would have picked n-1 in the first place. You could also have picked n+1: that makes most operations a bit slower, but if there are many operations that can't be done within n delays yet can be done within n+1 delays, it would make the processor faster overall.
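To make that tradeoff concrete, here is a small Python sketch. The latencies and instruction frequencies are made-up numbers chosen purely for illustration, not data from any real processor; an operation that doesn't fit into one cycle is simply charged a whole second cycle.

```python
import math

# (name, raw latency in NAND delays, relative frequency in a typical program)
# All numbers below are invented for illustration.
ops = [
    ("AND/OR/XOR", 6,  0.15),
    ("add/sub",    16, 0.40),
    ("shift",      10, 0.20),
    ("load path",  30, 0.25),
]

def avg_time(n):
    """Average time per operation, in NAND delays, for cycle length n."""
    total = 0.0
    for name, delay, freq in ops:
        cycles = math.ceil(delay / n)   # an operation always occupies whole cycles
        total += freq * cycles * n
    return total

for n in (14, 15, 16, 17):
    print(f"n = {n}: average {avg_time(n):.2f} NAND delays per operation")
```

With these invented numbers, n = 16 gives the lowest average (20.00 NAND delays per operation), while both n = 15 (24.75, because add/sub now needs two cycles) and n = 17 (21.25, because everything else got slower) come out worse: exactly the "locally optimal" situation described above.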
Now your question: Add and subtract are such common operations that you want to be able to execute them in a single cycle. As a result, it doesn't matter that AND, OR etc. could execute faster: they still need that one cycle. Of course the unit "calculating" AND, OR etc. has a lot of time to twiddle its thumbs, but that can't be helped.
Note that it's not just a question of whether an operation can be done within n NAND delays or not: An addition, for example, can be made faster by being a bit clever, faster still by being very clever, and a bit faster again by investing extraordinary amounts of hardware; finally, a processor can use a mixture of very fast, very expensive circuits and somewhat slower, cheaper ones. So there is the possibility of making one operation just about fast enough by spending more money on it.
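The comments below link to carry-lookahead adders; to see where the delay of an addition comes from in the first place, here is a bit-level Python model of the naive ripple-carry approach (a sketch for illustration only, not how real hardware is specified). The carry has to travel through every bit position, so the gate depth grows with the word width, whereas AND/OR treat every bit independently at constant depth.

```python
def ripple_carry_add(a, b, width=32):
    """Bit-level model of a ripple-carry adder (illustration only)."""
    result, carry = 0, 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i            # sum bit i uses the incoming carry
        carry = (ai & bi) | (carry & (ai ^ bi))     # carry into bit position i+1
    return result                                   # carry out of the top bit is dropped

assert ripple_carry_add(12345, 67890) == 12345 + 67890
```

A carry-lookahead adder replaces that carry chain with a tree of generate/propagate terms, reducing the depth to roughly the logarithm of the width at the cost of many more gates: exactly the "make one operation just about fast enough by spending more on it" tradeoff.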
Now you could make the clock speed so high / the cycle so short that only the simple bit operations execute in one cycle and everything else in two or more. That would most likely slow the processor down. For operations that take two cycles, there is usually overhead for moving an incomplete instruction from one cycle to the next, so two cycles don't give you twice as much time for execution. So to do the addition in two cycles, you couldn't double the clock speed.
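As a rough back-of-the-envelope sketch (all numbers here are assumptions, not measurements of any real design): if an addition needs 16 NAND delays and handing a partial result across a cycle boundary costs 3 more, then splitting it over two cycles does not let you halve the cycle length.

```python
# Illustrative numbers only: ADD_DELAY and LATCH_OVERHEAD are assumptions.
ADD_DELAY = 16      # raw latency of an addition, in NAND delays
LATCH_OVERHEAD = 3  # cost of storing the partial result at the cycle boundary

one_cycle = ADD_DELAY                         # cycle must be >= 16 NAND delays
two_cycle = (ADD_DELAY + LATCH_OVERHEAD) / 2  # each cycle must be >= 9.5, not 8

print(f"single-cycle add: cycle length >= {one_cycle} NAND delays")
print(f"two-cycle add:    cycle length >= {two_cycle} NAND delays, "
      f"total {2 * two_cycle} NAND delays per add")
```

So with these numbers the clock could only be raised by about 16/9.5 ≈ 1.7x, not 2x, and every addition would now occupy two cycles, 19 NAND delays in total instead of 16.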
https://en.wikipedia.org/wiki/Carry-lookahead_adder also https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Comb/lookahead.html – slebetman – 2017-05-24T05:22:26.530

fwiw, multiplication isn't much slower – Octopus – 2017-05-27T20:29:08.970

Yep, multiplication was pretty fast in my tests too. It was only around 2x slower than addition, while division was around 30x (!) slower. – Teodor Dyakov – 2017-05-27T20:45:35.870

Compact overview of state-of-the-art parallel prefix tree adders: "A Taxonomy of Parallel Prefix Networks" by David Harris: http://pages.hmc.edu/harris/research/taxonomy.pdf – Franki – 2018-07-18T12:47:33.807

More elaborate: Jun Chen's doctoral thesis "Parallel-prefix structures for binary and modulo {2^n−1, 2^n, 2^n+1} adders": http://digital.library.okstate.edu/etd/Chen_okstate_0664D_10070.pdf – Franki – 2018-07-18T12:51:49.907