I have posted this question elsewhere (MSE-Meta, MSE, TCS, MetaOptimize), and previously no one had given a solution. But now there is a really excellent and comprehensive answer below.
The universal approximation theorem states that "the standard multilayer feed-forward network with a single hidden layer, which contains a finite number of hidden neurons, is a universal approximator among continuous functions on compact subsets of R^n, under mild assumptions on the activation function."
I understand what this means, but the relevant papers are too far beyond my level of mathematical understanding for me to grasp why it is true, or how a hidden layer approximates non-linear functions.
So, in terms only a little more advanced than basic calculus and linear algebra, how does a feed-forward network with one hidden layer approximate continuous, non-linear functions? The answer does not need to be completely concrete; to make what I am asking about more tangible, here is a small toy sketch I put together.
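(This is just my own minimal numpy experiment, not taken from the theorem's papers: a single hidden layer of sigmoid units, f(x) = sum_i v_i * sigmoid(w_i*x + b_i), with random hidden weights and only the output weights fitted by least squares. I am assuming the target sin(x), the hidden-layer size, and the weight ranges purely for illustration.)

```python
# Toy sketch: approximate sin(x) on [-3, 3] with one hidden layer of sigmoids,
#   f(x) = sum_i v_i * sigmoid(w_i * x + b_i)
# Hidden weights/biases are random; only the output weights v are fitted.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)          # sample points in a compact interval
y = np.sin(x)                        # the continuous, non-linear target

H = 50                               # number of hidden neurons (assumed)
w = rng.normal(0, 2, H)              # random hidden weights
b = rng.uniform(-3, 3, H)            # random hidden biases

# Hidden-layer activations, shape (200, H)
hidden = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + b)))

# Fit the output weights v by least squares; even this crude scheme
# already gets a close fit, which is the flavor of the theorem.
v, *_ = np.linalg.lstsq(hidden, y, rcond=None)
approx = hidden @ v

print("max absolute error:", np.max(np.abs(approx - y)))
```

Running this gives a small maximum error on the sampled points, so empirically the statement seems plausible; what I am after is an intuition for *why* combinations of such simple units can do this.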