
What is the derivative of the ReLU activation function defined as:

$$ \mathrm{ReLU}(x) = \max(0, x) $$

What about the special case at $x=0$, where the function is not differentiable (the function itself is continuous, but its slope jumps)?

Tom Hale

1 Answer


The derivative is:

$$ \mathrm{ReLU}'(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \end{cases} $$

It is undefined at $x = 0$.

The reason it is undefined at $x = 0$ is that the left- and right-hand derivatives there are not equal: approaching from the left the slope is $0$, while approaching from the right it is $1$.
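
In code, this piecewise rule is usually implemented with the convention that the derivative at $x = 0$ is taken to be $0$ (see the comments below). A minimal NumPy sketch, with illustrative function names that are not from any particular library:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """Piecewise derivative: 0 for x < 0, 1 for x > 0.
    The value 0 at x = 0 is a convention, not a true derivative."""
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))             # [0. 0. 3.]
print(relu_derivative(x))  # [0. 0. 1.]
```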

Jim
  • So in practice (implementation), one just picks either $0$ or $1$ for the $x=0$ case? – Tom Hale Mar 14 '18 at 09:51
  • The convention is that $\frac{dr}{dx} = \mathbf{1}(x > 0)$ – neuroguy123 Mar 14 '18 at 13:10
  • @TomHale by the way, see Nouroz Rahman's answer at https://www.quora.com/How-do-we-compute-the-gradient-of-a-ReLU-for-backpropagation: _"[...] In my view, in built-in library functions (for example: `tf.nn.relu()`) derivative at x = 0 is taken zero to ensure a sparser matrix..."_ – Jim Mar 29 '18 at 16:17
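
Following up on the comment about `tf.nn.relu()`: assuming TensorFlow 2.x is available, the convention can be checked directly with automatic differentiation. The sketch below should print a gradient of $0$ at $x = 0$ if the library defines it that way, as the quote above suggests.

```python
import tensorflow as tf

# Differentiate ReLU at x = -1, 0, and 1 via autodiff.
x = tf.constant([-1.0, 0.0, 1.0])
with tf.GradientTape() as tape:
    tape.watch(x)      # x is a constant tensor, so watch it explicitly
    y = tf.nn.relu(x)
print(tape.gradient(y, x).numpy())  # expected: [0. 0. 1.]
```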