
I want to find out which functions can be approximated to arbitrary accuracy by neural networks that use only linear activations. On this page I read that with linear activation functions the prediction error stays constant and does not depend on $x$. This probably means the accuracy is poor, but I have not found out whether there is something like a universal approximation theorem for linear activations.

What I don't understand yet is how to relate the activation function to the actual model I want to predict.

My idea is that with only linear activation functions you can approximate only linear functions up to arbitrary accuracy. I would argue that the cost function:

\begin{align*}
C(\mathbf{X}, \mathbf{y}, \mathbf{w}, b) &= \frac{1}{N} \sum_{i=1}^{N}\left(y_{i}-\hat{y}\left(\mathbf{x}_{i}; \mathbf{w}, b\right)\right)^{2}\\
&= \frac{1}{N} \sum_{i=1}^{N}\left(y_{i}-\mathbf{w}\cdot \mathbf{x}_i - b\right)^{2}
\end{align*}

shows that the model $\hat{y}(\mathbf{x}) = \mathbf{w}\cdot\mathbf{x} + b$ is an affine function of the input, which cannot approximate a quadratic function, for example. Maybe someone can help me write that in a more rigorous way.

Leviathan
  • With a linear activation function your output will always be linear, because the inputs are repeatedly multiplied by weights and added. This has nothing to do with the loss function. – rapaio Jun 24 '19 at 16:52
  • This is true, but it doesn't answer my question: which functions can be approximated, and how can one show this? (Maybe I didn't make that point clear.) – Leviathan Jun 24 '19 at 17:16
  • Yes, in general you can only approximate linear functions this way -- but an NN can learn to exploit floating-point quantization errors and learn nonlinear functions even with only "linear" activations! Although I suspect this isn't what you're looking for. – shimao Jun 24 '19 at 21:11
  • https://stats.stackexchange.com/questions/325776/does-the-universal-approximation-theorem-for-neural-networks-hold-for-any-activa – Skander H. Jun 25 '19 at 02:22
  • @shimao OK, that's interesting. Have you got a script or paper where I can find a simple but rigorous proof of that statement? – Leviathan Jun 25 '19 at 16:20

1 Answer


The composition of linear functions is itself a linear function, so a neural network using only linear activations can be rewritten as a single linear function. See: What is the purpose of a neural network activation function?
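To see the collapse concretely (standard matrix algebra, written out here for illustration): a network with two weight layers and identity activations computes

$$\hat{y}(\mathbf{x}) = W_2\left(W_1\mathbf{x}+\mathbf{b}_1\right)+\mathbf{b}_2 = \left(W_2 W_1\right)\mathbf{x} + \left(W_2\mathbf{b}_1+\mathbf{b}_2\right),$$

which is a single affine map with weight matrix $W_2 W_1$ and bias $W_2\mathbf{b}_1+\mathbf{b}_2$; by induction the same holds for any number of layers.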

Using a linear function, you'll be able to approximate functions which are single, straight lines, i.e. linear functions. Approximating a nonlinear function with a linear function will have some amount of error -- perhaps small enough to be ignored, perhaps too large to be acceptable. This judgement depends on what you need from your model.
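To put a number on "some amount of error" (my own illustration, not part of the original answer): take the target $f(x)=x^2$ on $[0,1]$ and any affine model $\hat{y}(x)=wx+b$. Writing $e(x)=x^2-wx-b$, the combination $e(0)-2e(\tfrac{1}{2})+e(1)=\tfrac{1}{2}$ because the $w$ and $b$ terms cancel, so $\tfrac{1}{2}\le |e(0)|+2|e(\tfrac{1}{2})|+|e(1)|\le 4\max_{x\in[0,1]}|e(x)|$ and the worst-case error is at least $\tfrac{1}{8}$ no matter how the weights are trained. A minimal numerical sketch of the two cases (the grid, the targets, and the least-squares fit are my own choices, not from the question):

```python
# Minimal illustration (assumed setup): fit the best affine map y = w*x + b,
# which is all a linear-activation network can represent, to a linear target
# and to a quadratic target, then compare the worst-case error on a grid.
import numpy as np

x = np.linspace(0.0, 1.0, 201)
X = np.column_stack([x, np.ones_like(x)])      # columns for w and b

targets = {
    "linear target 3x + 1": 3.0 * x + 1.0,
    "quadratic target x^2": x ** 2,
}

for name, y in targets.items():
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares (w, b)
    max_err = np.abs(X @ coef - y).max()
    print(f"{name}: max |error| = {max_err:.4f}")

# The linear target is recovered essentially exactly; the quadratic target
# leaves an error that no choice of (w, b) can push below 1/8 in sup norm.
```

With a nonlinear activation this limitation disappears, which is exactly what the universal approximation theorem linked in the comments is about.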

Sycorax