
I seem to remember, from years ago when I first read Bishop's neural network book, that it is possible to construct a neural network such that the outputs represent the posterior probabilities I would have found if I had used the likelihood approach to separate the same data. Is this true, or have I made it up? If it is true, what are the conditions? I seem to remember the network has to have no loops and there are limits on the number of hidden layers. I also seem to remember a result from Kolmogorov showing that many hidden layers aren't helpful if the network is well trained. Unfortunately I can't find the book any more, and I can't find much on this result, which makes me worried I have imagined it.

As pointed out by @bayerj, I probably need to flesh out my question a bit.

I have a few variables (I'll assume I only have one for now) which I am using to find the probability of one of three hypotheses being true given those variables. I have some data with which I can construct PDFs. I am using

$$P(H_k|x) = \frac{P(x|H_k)P(H_k)}{P(x)}$$

where P(x) is the unconditional probability density

$$P(x) = \sum_{k} P(x|H_k)P(H_k)$$

I'm trying to verify whether it is possible to train a network, using the data from which I construct my PDFs, such that the value of each output node equals P(H_k|x). I'm sure this is possible; I just can't remember the working. Since I posted the question I have found http://www-vis.lbl.gov/~romano/mlgroup/papers/neural-networks-survey.pdf which I think says that what I want to do is possible.
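
To make the target concrete, here is a minimal numerical sketch of the likelihood approach above. The Gaussian class-conditional PDFs, their parameters, and the priors are purely illustrative stand-ins for the PDFs I would estimate from my data:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional densities P(x | H_k): assumed Gaussian with
# made-up parameters; in practice these would be the PDFs estimated from data.
likelihoods = [norm(loc=-1.0, scale=1.0),
               norm(loc=0.5, scale=0.7),
               norm(loc=2.0, scale=1.2)]
priors = np.array([0.5, 0.3, 0.2])  # P(H_k), must sum to 1


def posterior(x):
    """Return P(H_k | x) for the three hypotheses via Bayes' rule."""
    lik = np.array([pdf.pdf(x) for pdf in likelihoods])  # P(x | H_k)
    unnorm = lik * priors                                # P(x | H_k) P(H_k)
    return unnorm / unnorm.sum()                         # divide by P(x)


print(posterior(0.0))  # three probabilities summing to 1
```

The question is whether a trained network's output nodes can be made to reproduce what `posterior(x)` computes analytically here.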

jonsca
Bowler
  • What do you mean by posterior probability and likelihood approach, exactly? Can you be more specific with your question in the problem statement? – bayerj Oct 24 '12 at 09:50
  • Certainly easy to make a virtual machine representation of Bayesian analysis given the prior is proper, so that the possible unknowns (e.g. parameters) can be sampled and then the possible knowns (e.g. data) can be sampled given the unknowns (e.g. parameters) sampled. Stigler points out that it is very general, albeit often computationally not feasible yet. Fig 5, Stigler 2010, here http://stats.stackexchange.com/questions/39177/periods-in-history-of-statistics/39193#39193 – phaneron Oct 24 '12 at 12:43

1 Answer


The conditions under which the outputs of a neural net can be treated as estimates of posterior probabilities are fairly broad. I remember the following paper as being pretty interesting and informative (caveat: I've not read it since 2002):

Marco Saerens, Patrice Latinne, Christine Decaestecker: Any reasonable cost function can be used for a posteriori probability approximation. IEEE Transactions on Neural Networks 13(5): 1204-1210 (2002)

I suspect it has references to the classic papers in the text as well. Prof. Saerens has written several really nice papers; I would recommend that anyone seriously interested in neural nets look up his papers on Google Scholar.

You could view the output layer of a neural net as a logistic regression model (with outputs that represent probabilities) and the hidden layer as a non-linear transformation of the inputs. It is the output layer that matters for the interpretation of the outputs, so the MLP is essentially just a non-linear logistic regression model.
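
As a rough illustration of this view (not the exact construction in the papers above), the sketch below trains an MLP with a softmax output layer and cross-entropy loss on a toy version of your three-hypothesis problem; the data-generating parameters, hidden-layer size, and the use of scikit-learn's `MLPClassifier` are all assumptions for the example. With enough data and capacity, `predict_proba` should approximate the Bayes posterior P(H_k|x):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy data: three hypotheses with Gaussian class-conditional densities
# (same illustrative setup as in the question); labels drawn from the priors.
priors = np.array([0.5, 0.3, 0.2])
means = np.array([-1.0, 0.5, 2.0])
scales = np.array([1.0, 0.7, 1.2])
y = rng.choice(3, size=5000, p=priors)
X = rng.normal(means[y], scales[y]).reshape(-1, 1)

# One hidden layer of arbitrary size; the softmax output layer trained with
# cross-entropy (log-loss) is what lets the outputs be read as P(H_k | x).
net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000).fit(X, y)

# The network's estimate of the posterior probabilities at x = 0.
print(net.predict_proba(np.array([[0.0]])))
```

Comparing this output with the analytic Bayes-rule calculation for the same densities and priors gives a quick sanity check that the outputs behave like posterior probabilities rather than arbitrary scores.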

Dikran Marsupial