I have posted this question elsewhere (MSE-Meta, MSE, TCS, MetaOptimize), and previously no one had given a solution. But now there is a really excellent and comprehensive answer below.
The universal approximation theorem states that "the standard multilayer feed-forward network with a single hidden layer, which contains a finite number of hidden neurons, is a universal approximator among continuous functions on compact subsets of R^n, under mild assumptions on the activation function."
I understand what this means, but the relevant papers are too far beyond my level of mathematical understanding for me to grasp why it is true, or how a hidden layer approximates non-linear functions.
So, in terms only a little more advanced than basic calculus and linear algebra, how does a feed-forward network with one hidden layer approximate continuous, non-linear functions? The answer does not need to be completely concrete; to make what I am asking about more tangible, here is a small toy sketch I put together.
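(This is just my own minimal numpy experiment, not taken from the theorem's papers: a single hidden layer of sigmoid units, f(x) = sum_i v_i * sigmoid(w_i*x + b_i), with random hidden weights and only the output weights fitted by least squares. I am assuming the target sin(x), the hidden-layer size, and the weight ranges purely for illustration.)

```python
# Toy sketch: approximate sin(x) on [-3, 3] with one hidden layer of sigmoids,
#   f(x) = sum_i v_i * sigmoid(w_i * x + b_i)
# Hidden weights/biases are random; only the output weights v are fitted.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)          # sample points in a compact interval
y = np.sin(x)                        # the continuous, non-linear target

H = 50                               # number of hidden neurons (assumed)
w = rng.normal(0, 2, H)              # random hidden weights
b = rng.uniform(-3, 3, H)            # random hidden biases

# Hidden-layer activations, shape (200, H)
hidden = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + b)))

# Fit the output weights v by least squares; even this crude scheme
# already gets a close fit, which is the flavor of the theorem.
v, *_ = np.linalg.lstsq(hidden, y, rcond=None)
approx = hidden @ v

print("max absolute error:", np.max(np.abs(approx - y)))
```

Running this gives a small maximum error on the sampled points, so empirically the statement seems plausible; what I am after is an intuition for *why* combinations of such simple units can do this.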