
I am trying to understand the interpretation of radial basis functions (RBFs) as networks, the relationship they have to "normal" neural networks, and how to extend them to multiple layers.

For this I was watching the lectures from the Caltech course CS156, and they present the following slide:

[slide from the CS156 lecture comparing RBF networks and neural networks]

This slide shows that RBF networks and neural networks are quite different, and it is not 100% clear to me how the RBF network would be extended to multiple layers (or if it even makes sense to do so).

One way I thought of extending RBFs to multiple layers is to keep the architecture of a usual neural network, but substitute each activation function with a Gaussian.
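To make this first idea concrete, here is a rough sketch of what I mean (NumPy; the function names and shapes are just my own illustration):

```python
import numpy as np

def gaussian(z):
    # Gaussian applied elementwise to the pre-activation,
    # in place of the usual sigmoid/tanh/ReLU
    return np.exp(-z ** 2)

def mlp_with_gaussian_activations(x, weights, biases):
    # standard fully-connected forward pass; only the non-linearity changes
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = gaussian(a @ W + b)
    # linear output layer, as usual
    return a @ weights[-1] + biases[-1]
```

Here every layer still has weights and biases that could be trained by backprop; only the non-linearity is Gaussian.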

However, I also thought of a different way of doing it, closer to the way the RBF network is built, but extended recursively. The main idea is that each layer computes features based on the distance of the previous layer's output to a set of centers. Something of the form:

$$ x^{(l)}_k = \phi\left( \left\| x^{(l-1)} - \mu^{(l)}_k \right\| \right) $$

where $x^{(l)}_k$ denotes the output of activation node $k$ in layer $l$. The idea is that each node takes the whole output of the previous layer, computes its distance to a center belonging to the current hidden layer, and then applies a non-linearity to it. To illustrate the idea, here is how the first layer would look:

[diagram of the proposed first layer]

For an arbitrary layer:

[diagram of an arbitrary layer]
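To make the recursion concrete, here is a rough sketch of what one such network would compute (NumPy; all names and shapes are just my own illustration, using a Gaussian as $\phi$):

```python
import numpy as np

def rbf_layer(x_prev, centers):
    # x_prev:  output of the previous layer, shape (d_prev,)
    # centers: one center per unit of this layer, shape (k, d_prev)
    # unit k outputs phi(||x_prev - mu_k||) with phi(r) = exp(-r^2)
    dists = np.linalg.norm(x_prev - centers, axis=1)   # shape (k,)
    return np.exp(-dists ** 2)

def deep_rbf_forward(x, centers_per_layer, w_out):
    # stack the distance-based layers; note that only the final
    # linear weights w_out would be trained
    a = x
    for centers in centers_per_layer:
        a = rbf_layer(a, centers)
    return a @ w_out
```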

The issue with such a network is that, even though it makes sense as a generalization of the RBF network, its structure is completely different from neural networks, since only the last layer has weights to be updated. Another issue is choosing the centers. The centers could just be chosen with k-means at each layer, but I am not 100% sure whether this is justifiable in any framework (like regularization).
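What I had in mind for the centers is roughly the following greedy, layer-by-layer procedure (a sketch using scikit-learn's KMeans; again, just my own illustration, not something I have seen in the literature):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_centers_layerwise(X, units_per_layer):
    # X: training data, shape (n_samples, d)
    # Cluster the previous layer's activations to get this layer's
    # centers, then push the whole training set through that layer.
    centers_per_layer = []
    A = X
    for k in units_per_layer:
        centers = KMeans(n_clusters=k).fit(A).cluster_centers_
        centers_per_layer.append(centers)
        dists = np.linalg.norm(A[:, None, :] - centers[None, :, :], axis=2)
        A = np.exp(-dists ** 2)   # activations fed to the next layer
    return centers_per_layer
```

After that, only the final linear layer would be fit, e.g. by least squares on the last layer's activations.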

Anyway, my main question is: how are RBF networks extended to multiple layers, the way multilayer perceptrons (MLPs/neural networks) are? I provided two suggestions, but I wasn't sure which one is correct, which one is actually used, or which one has actually had some research done on it.

  • Imho, multiple layers in an RBF net don't make sense. What you can change in the net is the initial state and the number of neurons in the hidden layer. – 404pio Jun 24 '15 at 07:54
  • @frankov I also questioned that point myself (and wasn't sure if it made sense); however, then again, I don't have a good theoretical justification for normal neural networks either. It's clear they aren't that bad since they work in practice, but apart from that it'd be interesting to know if the sigmoids can be changed to other functions; RBFs seem like a natural place to start. – Charlie Parker Jun 24 '15 at 18:26
  • @404pio why did you think a multiple-layer RBF doesn't make sense? – Charlie Parker Aug 11 '16 at 02:14
  • I've read in two books about neural networks that an RBF net has only one hidden layer. – 404pio Aug 11 '16 at 08:42
  • @404pio So what if they only have one hidden layer in those books? Do they provide a reason why one shouldn't use RBFs in multiple layers? – Charlie Parker May 16 '17 at 19:50
  • They do not provide an explanation of why there is only one hidden layer. Probably it is a result of the RBF neuron's properties. – 404pio May 18 '17 at 08:27
  • Take a look here: http://www.ieee.cz/knihovna/Zhang/Zhang100-ch03.pdf, maybe the explanation on page 81 is sufficient? – 404pio May 18 '17 at 08:29
  • @404pio I guess the bottom line they argue is that since RBFs are only active if the data points are close to their centers, they use localized activations to produce approximations, so if there are lots of dimensions one might need an exponential number of units. Which intuitively makes sense, but I am still not convinced: I always thought one needed an exponential bound on the number of units for the universal approximation theorem for neural nets anyway. Does this just say RBFs need an even worse, doubly exponential bound or something? The curse of dimensionality seems to apply to both, not just RBFs. – Charlie Parker May 18 '17 at 23:00
  • "Radial functions are simply a class of functions. In principle they could be employed in any sort of model (linear or nonlinear), and any sort of network (single layer or multi layer). However, since Broomhead and Lowe's 1988 seminal paper, radial basis function networks (RBF networks) have traditionally been associated with radial functions in a single layer network." https://www.cc.gatech.edu/~isbell/tutorials/rbf-intro.pdf – endolith Dec 04 '18 at 16:49
  • (It seems intuitive to me that they would make learning much faster, since they have only local action, making the optimization energy landscape "funnel-like", while ReLU weights have action to infinity, for instance, making everything interdependent, and the energy landscape has lots of independent branches; but I'm only thinking of single-layer 1D curve fitting, so I may be completely wrong when it becomes high-dimensional and multi-layer.) – endolith Dec 04 '18 at 16:51

0 Answers