
I am trying to figure out how to do learning using probabilistic programming languages. To that end, I am following different paths to get a grasp of the way of thinking.

I understand modelling with neural networks, and I understand how learning works in that context. Now I am trying to figure out the analogue in Bayesian reasoning.

I understand the following:

  • the input and output vectors of a neural network correspond to distributions (in particular, categorical distributions)
  • the weight matrices correspond to the inference that takes a prior distribution to a posterior distribution
  • learning algorithms, such as backpropagation, correspond to... what?

So my question is: what does learning correspond to in probability-theoretic terminology? More specifically: how does one learn inference functions?

I might have overlooked something quite simple, maybe even trivial. In that case, forgive the question.

Mads Buch
  • There is a nice overview [here](http://www.deeplearningbook.org/contents/graphical_models.html). – GeoMatt22 Dec 28 '16 at 16:08
  • Thank you. I tried skimming the pages with no luck. It should be mentioned that the reason I ask here is precisely the information overload induced by these types of texts. – Mads Buch Dec 28 '16 at 17:49
  • Are you familiar with the idea of [discriminative vs. generative models](http://stats.stackexchange.com/questions/12421/generative-vs-discriminative)? – GeoMatt22 Dec 28 '16 at 18:50
  • By the way the "deep learning" book has a pretty comprehensive overview, from the ground up, of both neural networks and probabilistic modeling. So it is a nice overall reference ... but yes, I would **not** recommend starting with chapter 16! – GeoMatt22 Dec 28 '16 at 18:53

1 Answer


Weight representation: In a standard neural network, each connection has a scalar weight value. In a Bayesian version of a neural network, each connection has a distribution of weight values. More generally, for any node or neuron in a standard neural network, there is a "fan" of incoming connections, each of which has a single scalar weight value. In a Bayesian neural network, there is instead a joint distribution of weights on the fan-in connections.
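To make the contrast concrete, here is a minimal sketch of a single neuron whose fan-in weights are either a point estimate or a distribution. The choice of independent Gaussians and all the specific numbers are only illustrative, not anything canonical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard neuron: the fan-in is a single point estimate (one weight vector).
w_point = np.array([0.3, -1.2, 0.7])

# Bayesian neuron: the fan-in carries a distribution over that same vector.
# Here: an independent Gaussian per weight (mean and variance), purely illustrative.
w_mean = np.zeros(3)
w_var = np.ones(3)

def activation_point(x):
    """Deterministic activation of the standard neuron."""
    return w_point @ x

def activation_samples(x, n=1000):
    """Distribution of activations induced by the weight distribution:
    sample weight vectors, then push each one through the neuron."""
    w_samples = rng.normal(w_mean, np.sqrt(w_var), size=(n, 3))
    return w_samples @ x

x = np.array([1.0, 0.5, -0.5])
print(activation_point(x))       # a single number
a = activation_samples(x)
print(a.mean(), a.std())         # a whole distribution, summarized
```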

Learning: In a standard neural network, learning changes the vector of weight values. There are various learning algorithms, usually motivated by increasing consistency or decreasing a cost function (e.g., error reduction in backprop).

In a Bayesian neural network, learning changes the distribution of weight values. It works by applying Bayes' rule: the previous distribution over the weights is the prior, the current activations of the nodes are the data, and learning adjusts the weight distribution to the resulting posterior.
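Here is a minimal sketch of one such update for a single linear neuron with a Gaussian prior on its weights and Gaussian observation noise (a conjugate pair, so the posterior is available in closed form); the variable names and numbers are only illustrative:

```python
import numpy as np

def bayes_update(m_prior, S_prior, x, y, noise_var):
    """One application of Bayes' rule to the weight distribution of a linear
    neuron y = w @ x + noise: Gaussian prior N(m_prior, S_prior), Gaussian
    noise with known variance.  Returns the posterior mean and covariance."""
    S_prior_inv = np.linalg.inv(S_prior)
    S_post = np.linalg.inv(S_prior_inv + np.outer(x, x) / noise_var)
    m_post = S_post @ (S_prior_inv @ m_prior + x * y / noise_var)
    return m_post, S_post

# "Learning" is then nothing more than repeated application of this update:
rng = np.random.default_rng(1)
m, S = np.zeros(2), 10.0 * np.eye(2)   # broad prior over the two weights
true_w = np.array([2.0, -1.0])         # data-generating weights
for _ in range(50):
    x = rng.normal(size=2)
    y = true_w @ x + rng.normal(scale=0.1)
    m, S = bayes_update(m, S, x, y, noise_var=0.1 ** 2)

print(m)   # posterior mean has moved toward the data-generating weights
print(S)   # posterior covariance has shrunk: the network is more certain
```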

Perhaps the simplest version of a Bayesian neural network is a "Kalman filter." It's just a linear node that adjusts its weight distribution with Bayes' rule. Here's an article that gives a fairly introductory description, though it's couched in the language of associative learning in psychology: Kruschke, J. K. (2008). Bayesian approaches to associative learning: From passive to active learning. Learning & Behavior, 36(3), 210-226.
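Here is the same kind of update written in Kalman-filter form, treating the weights of a linear node as a static state and folding each new observation in through the Kalman gain; this is only a sketch of the idea, not code from the cited paper:

```python
import numpy as np

def kalman_weight_update(m, S, x, y, noise_var):
    """Kalman-filter form of the Bayesian update for a linear node:
    the prediction error (y - x @ m) is scaled by the Kalman gain and used
    to shift the weight mean, while the covariance shrinks along x."""
    gain = S @ x / (x @ S @ x + noise_var)    # Kalman gain
    m_new = m + gain * (y - x @ m)            # mean moves by gain * prediction error
    S_new = S - np.outer(gain, x) @ S         # uncertainty is reduced
    return m_new, S_new
```

Algebraically this gives the same posterior as the closed-form update above, just computed incrementally; in the associative-learning reading, the term (y - x @ m) plays, roughly, the role of the prediction error that drives learning.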

John K. Kruschke