33

Why are activation functions of rectified linear units (ReLU) considered non-linear?

$$ f(x) = \max(0,x)$$

They are linear when the input is positive, and from my understanding non-linear activations are a must to unlock the representational power of deep networks; otherwise the whole network could be represented by a single layer.
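To spell out my understanding: stacking two purely linear (affine) layers collapses into a single one,

$$ W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2), $$

so depth alone would add no representational power.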

Aly
  • There's a similar question asked before: https://stats.stackexchange.com/questions/275358/why-is-increasing-the-non-linearity-of-neural-networks-desired though it's probably not a duplicate – Aksakal Mar 21 '18 at 19:19

1 Answer

40

ReLUs are nonlinearities. To help your intuition, consider a very simple network with 1 input unit $x$, 2 hidden units $y_i$, and 1 output unit $z$. With this simple network we could implement an absolute value function,

$$z = \max(0, x) + \max(0, -x),$$

or something that looks similar to the commonly used sigmoid function,

$$z = \max(0, x + 1) - \max(0, x - 1).$$

By combining these building blocks into larger networks with more hidden units, we can approximate arbitrary functions.

[Figure: ReLU network function]
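Here is a minimal NumPy sketch of the two constructions above, written as a one-hidden-layer network with hand-picked weights (the function and variable names are my own, not from the figure):

```python
import numpy as np

def relu(a):
    # Elementwise ReLU: f(a) = max(0, a)
    return np.maximum(0.0, a)

def tiny_relu_net(x, W1, b1, w2, b2):
    # 1 input -> 2 hidden ReLU units -> 1 linear output:
    #   hidden_i = relu(W1_i * x + b1_i),   z = w2 . hidden + b2
    hidden = relu(np.outer(W1, x) + b1[:, None])  # shape (2, len(x))
    return w2 @ hidden + b2                       # shape (len(x),)

x = np.linspace(-3.0, 3.0, 7)

# Absolute value: z = max(0, x) + max(0, -x)
z_abs = tiny_relu_net(x, W1=np.array([1.0, -1.0]), b1=np.array([0.0, 0.0]),
                      w2=np.array([1.0, 1.0]), b2=0.0)

# Sigmoid-like ramp: z = max(0, x + 1) - max(0, x - 1)
z_ramp = tiny_relu_net(x, W1=np.array([1.0, 1.0]), b1=np.array([1.0, -1.0]),
                       w2=np.array([1.0, -1.0]), b2=0.0)

print(np.allclose(z_abs, np.abs(x)))                    # True
print(np.allclose(z_ramp, np.clip(x + 1.0, 0.0, 2.0)))  # True
```

The same template, $z = w_2 \cdot \mathrm{relu}(W_1 x + b_1) + b_2$, with more hidden units and learned rather than hand-picked weights, is what larger ReLU networks use to build piecewise-linear approximations of arbitrary functions.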

Lucas
  • Would these types of hand-constructed ReLUs be built a priori and hard-coded in as layers? If so, how would you know that your network required one of these specially built ReLUs in particular? – Monica Heddneck Sep 16 '16 at 07:53
  • 5
    @MonicaHeddneck You could specify your own non-linearities, yes. What makes one activation function better than another is an ongoing research topic. For example, we used to use sigmoids, $\sigma(x) = \frac{1}{1 + e^{-x}}$, but then, due to the vanishing gradient problem, ReLUs became more popular. So it's up to you which non-linear activation function to use. – Tarin Ziyaee Sep 19 '16 at 21:02
  • 1
    How would you approximate $e^x$ with ReLUs out of sample? – Aksakal Sep 12 '18 at 21:42
  • 1
    @Lucas, So basically, if we combine (add) more than one ReLU, we can approximate any function, but if we simply nest them, `reLu(reLu(....))`, will it always be linear? Also, here you change `x` to `x+1`; could that be thought of as `z = Wx + b`, where `W` and `b` change to give different variants of this kind, such as `x` and `x+1`? – Anu Mar 31 '19 at 00:12