Although these guidelines can be good kick-starters for many problems, I don't follow them most of the time. Careful experimentation is key, and the process is highly dependent on the data and the problem. Your hidden layer can even have many more neurons than your input layer, rather than sitting somewhere between the input and output sizes.
For example, consider just a 2D case (not even 3D). I don't think 1-2 hidden neurons will be enough to describe an arbitrary function, e.g., just to exaggerate: $f(x,y)=x\sin(y+e^{-x})+\cos\log|\sin(\sqrt{|x|}+y^7)|$. You'll probably need more neurons, or more layers, to learn such a function.
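Here is a minimal sketch of what I mean, using numpy and scikit-learn's `MLPRegressor` (my own choices, not from the question); the sampling range, hidden sizes, and the small epsilon inside the log (to avoid `-inf` when the sine is near zero) are all arbitrary assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def f(x, y):
    # The exaggerated target function from the text, with a tiny epsilon
    # added inside the log so it stays finite when |sin(...)| is ~0.
    return x * np.sin(y + np.exp(-x)) + np.cos(
        np.log(np.abs(np.sin(np.sqrt(np.abs(x)) + y**7)) + 1e-12)
    )

X = rng.uniform(-2.0, 2.0, size=(5000, 2))
t = f(X[:, 0], X[:, 1])

# Compare a tiny hidden layer against a wider, deeper network.
for hidden in [(2,), (64, 64)]:
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    net.fit(X, t)
    print(hidden, "R^2 on training data:", round(net.score(X, t), 3))
```

In a run like this you would typically see the 2-neuron network underfit badly while the larger one fits much better, though the exact numbers depend on the seed and optimizer settings.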
As for the number of layers: the link you provided says "one hidden layer is sufficient for the large majority of problems". The Universal Approximation Theorem actually addresses this. One hidden layer can indeed be sufficient, but the theorem gives no bound on the number of neurons; the price can be an exponentially large number of neurons needed to emulate the behaviour that additional layers would provide. Some previous research also indicates that two hidden layers can be more useful than one for more complicated tasks.
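To make the width-versus-depth trade-off concrete, here is a rough sketch (my own illustration, with arbitrarily chosen layer sizes) that just counts parameters of a fully connected net, showing how a single very wide hidden layer can cost about as much as two narrower ones:

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

print(mlp_param_count([2, 512, 1]))     # one wide hidden layer
print(mlp_param_count([2, 40, 40, 1]))  # two narrower hidden layers, similar budget
```

The point is not that these particular sizes are equivalent, only that adding a layer is another way to spend roughly the same parameter budget, and for some functions it is the far cheaper way.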
There are also other guidelines available, such as $n_{\text{hidden}} < 2\,n_{\text{input}}$ or $n_{\text{hidden}} = \frac{2}{3} n_{\text{input}} + n_{\text{output}}$, or the one you mentioned. These are all just starting points. In the end, it boils down to some sort of informed trial and error.
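If it helps, a tiny helper (hypothetical, my own sketch) can turn the two formulas above into concrete starting points to try:

```python
def hidden_neuron_starting_points(n_input, n_output):
    """Rules of thumb from the text; rough starting points only, not answers."""
    return {
        "upper bound: n_hidden < 2 * n_input": 2 * n_input - 1,
        "n_hidden = 2/3 * n_input + n_output": round(2 * n_input / 3 + n_output),
    }

print(hidden_neuron_starting_points(n_input=20, n_output=3))
# {'upper bound: n_hidden < 2 * n_input': 39, 'n_hidden = 2/3 * n_input + n_output': 16}
```

I would treat these numbers as the first points on a grid to search over, not as the architecture itself.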