
I want to find out which functions can be approximated to arbitrary accuracy by neural networks that use only linear activations. On this page I read that with linear activation functions the prediction error stays constant and does not depend on $x$. This probably means the accuracy is poor, but I have not found out whether there is something like a universal approximation theorem for linear activations.

What I don't understand yet is how to relate the activation function to the actual model I want to predict.

My idea is that with only linear activation functions you can approximate only linear functions up to arbitrary accuracy. I would argue that the cost function:

\begin{align*}
C(\mathbf{X}, \mathbf{y}, \mathbf{w}, b) &= \frac{1}{N} \sum_{i=1}^{N}\left(y_{i}-\hat{y}\left(\mathbf{x}_{i}; \mathbf{w}, b\right)\right)^{2}\\
&= \frac{1}{N} \sum_{i=1}^{N}\left(y_{i}-\mathbf{w}\cdot \mathbf{x}_i - b\right)^{2}
\end{align*}

shows that the model $\hat{y}(\mathbf{x}) = \mathbf{w}\cdot\mathbf{x} + b$ is an affine function of the input, which cannot approximate a quadratic function, for example. Maybe someone can help me write that in a more rigorous way.

Leviathan
  • With a linear activation function your output will always be linear, because the inputs are repeatedly multiplied by weights and added. This has nothing to do with the loss function. – rapaio Jun 24 '19 at 16:52
  • This is true, but it doesn't answer my question: which functions can be approximated, and how can one show this? (Maybe I didn't make that point clear.) – Leviathan Jun 24 '19 at 17:16
  • Yes, in general you can only approximate linear functions this way -- but an NN can learn to exploit floating-point quantization errors and learn nonlinear functions even with only "linear" activations! Although I suspect this isn't what you're looking for. – shimao Jun 24 '19 at 21:11
  • https://stats.stackexchange.com/questions/325776/does-the-universal-approximation-theorem-for-neural-networks-hold-for-any-activa – Skander H. Jun 25 '19 at 02:22
  • @shimao OK, that's interesting. Have you got a script or paper where I can find a simple but rigorous proof of that statement? – Leviathan Jun 25 '19 at 16:20

1 Answer


The composition of linear functions is itself a linear function, so a neural network using only linear activations can be rewritten as a single linear function. See: What is the purpose of a neural network activation function?
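To see the collapse concretely (standard matrix algebra, written out here for illustration): a network with two weight layers and identity activations computes

$$\hat{y}(\mathbf{x}) = W_2\left(W_1\mathbf{x}+\mathbf{b}_1\right)+\mathbf{b}_2 = \left(W_2 W_1\right)\mathbf{x} + \left(W_2\mathbf{b}_1+\mathbf{b}_2\right),$$

which is a single affine map with weight matrix $W_2 W_1$ and bias $W_2\mathbf{b}_1+\mathbf{b}_2$; by induction the same holds for any number of layers.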

Using a linear function, you'll be able to approximate functions which are single, straight lines, i.e. linear functions. Approximating a nonlinear function with a linear function will have some amount of error -- perhaps small enough to be ignored, perhaps too large to be acceptable. This judgement depends on what you need from your model.
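To put a number on "some amount of error" (my own illustration, not part of the original answer): take the target $f(x)=x^2$ on $[0,1]$ and any affine model $\hat{y}(x)=wx+b$. Writing $e(x)=x^2-wx-b$, the combination $e(0)-2e(\tfrac{1}{2})+e(1)=\tfrac{1}{2}$ because the $w$ and $b$ terms cancel, so $\tfrac{1}{2}\le |e(0)|+2|e(\tfrac{1}{2})|+|e(1)|\le 4\max_{x\in[0,1]}|e(x)|$ and the worst-case error is at least $\tfrac{1}{8}$ no matter how the weights are trained. A minimal numerical sketch of the two cases (the grid, the targets, and the least-squares fit are my own choices, not from the question):

```python
# Minimal illustration (assumed setup): fit the best affine map y = w*x + b,
# which is all a linear-activation network can represent, to a linear target
# and to a quadratic target, then compare the worst-case error on a grid.
import numpy as np

x = np.linspace(0.0, 1.0, 201)
X = np.column_stack([x, np.ones_like(x)])      # columns for w and b

targets = {
    "linear target 3x + 1": 3.0 * x + 1.0,
    "quadratic target x^2": x ** 2,
}

for name, y in targets.items():
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares (w, b)
    max_err = np.abs(X @ coef - y).max()
    print(f"{name}: max |error| = {max_err:.4f}")

# The linear target is recovered essentially exactly; the quadratic target
# leaves an error that no choice of (w, b) can push below 1/8 in sup norm.
```

With a nonlinear activation this limitation disappears, which is exactly what the universal approximation theorem linked in the comments is about.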

Sycorax