When was the ReLU function first used in a neural network?
By ReLU, I mean the function $$ f(x) = \max\{0, x\}. $$
By neural network, I mean a function approximation machine composed of one or more "hidden layers."
(That is, I wish to exclude models that are merely viewed as "special cases" of neural networks; if we admitted such special cases, the question would reduce to something along the lines of "when did anyone, in any context, first propose thresholding values below zero?", which is not really what I am asking.)
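For concreteness, here is a minimal sketch (in NumPy, with arbitrary made-up weights, purely for illustration) of the kind of model I have in mind: one hidden layer whose pre-activations are passed through $\max\{0, x\}$, followed by a linear output layer.

```python
import numpy as np

def relu(x):
    # f(x) = max{0, x}, applied elementwise
    return np.maximum(0.0, x)

def one_hidden_layer_net(x, W1, b1, W2, b2):
    # A single "hidden layer": affine map followed by the ReLU nonlinearity,
    # then a linear output layer.
    h = relu(W1 @ x + b1)   # hidden activations
    return W2 @ h + b2      # output

# Hypothetical weights, chosen only to show the shapes involved.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden units -> 2 outputs
print(one_hidden_layer_net(rng.normal(size=3), W1, b1, W2, b2))
```

So I am asking about the first use of this nonlinearity inside a model of roughly this shape, not about the first appearance of the thresholding operation itself.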