
We define the likelihood for $n$ independent observations $X = (x_1, \dots, x_n)$ as follows:

$$ \mathcal{L}(\theta | X) = \prod_{i=1}^{n} P(x_{i}|\theta) $$

Question: How do I choose the probability function $P$, especially for a complex dataset?

I understand that for a coin toss I can assume $P$ to be Bernoulli. But what if my dataset is complex (e.g. financial data, flu cases), or I am working on a complex use case, for example using a neural network to classify images and then applying Bayesian inference to identify the network weights $W$?

$$ P(W | D) \propto P(D | W) P(W) $$

where $P(W) = N(0, 1)$, but how do we define or assume $P(D | W)$?

The Wanderer

2 Answers


The distribution usually follows from what you know about your data. If each observation is a single success or failure, then a Bernoulli distribution follows immediately; if you have counts of successes and failures over a fixed number of trials, then you need a binomial distribution, and so on. Besides knowing your data, you need to be familiar with the distributions themselves. To learn more about them, see the Statistics 110 course by Joe Blitzstein of Harvard University (lectures and materials are freely available online), which introduces the "stories" behind probability distributions and gives intuition for the kind of data generating process each one describes.

Even if your data is something strange, you usually know something about it, e.g. that it is continuous and non-negative (so maybe a gamma distribution would work for it?), and this should lead you to an appropriate distribution. If you cannot define the likelihood, you can use Approximate Bayesian Computation, which does not need an explicit likelihood function; it needs only some kind of summary statistic plus assumptions about the data generating process.
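To illustrate the last point, here is a minimal sketch of rejection-style Approximate Bayesian Computation, which is only one of several ABC variants. Everything specific in it is an assumption made for the example: the coin-toss data, the Uniform(0, 1) prior, the fraction of heads as the summary statistic, the tolerance, and the function name `abc_rejection`. Note that the code never evaluates a likelihood; it only simulates from the assumed data generating process.

```python
import numpy as np

def abc_rejection(observed, n_draws=20000, tolerance=0.02, seed=0):
    """Approximate the posterior of a coin's bias without evaluating a likelihood."""
    rng = np.random.default_rng(seed)
    n = len(observed)
    observed_summary = np.mean(observed)  # summary statistic: fraction of heads

    # Draw candidate parameters from the prior (assumed Uniform(0, 1)).
    theta = rng.uniform(0.0, 1.0, size=n_draws)

    # Simulate one dataset per candidate from the assumed data generating
    # process and reduce it to the same summary statistic.
    simulated_summary = rng.binomial(n, theta) / n

    # Keep only candidates whose simulated summary is close to the observed one.
    close_enough = np.abs(simulated_summary - observed_summary) <= tolerance
    return theta[close_enough]

# Hypothetical data: 70 heads in 100 tosses.
observed = np.array([1] * 70 + [0] * 30)
posterior_samples = abc_rejection(observed)
print(len(posterior_samples), posterior_samples.mean(), posterior_samples.std())
```

The accepted values of `theta` act as approximate posterior draws. A smaller `tolerance` gives a better approximation but rejects more candidates, so it is a trade-off you have to tune.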

Tim
  • Thank you for your answer. I am aware of the standard distributions and their behavior. My question is: what if the observed data doesn't follow any of these standard distributions? For example, what if I am looking at the weights in a neural network? For the `prior` on the weights I can assume a `gaussian`, but what distribution should I assume for $P(data | W)$ so that I can calculate $P(W | data)$? – The Wanderer Jun 05 '17 at 22:43
  • @TheWanderer P(data|W) is your model, so you are basically asking "how to define a statistical model", and this is a *very* broad topic. If you are looking in particular for Bayesian neural networks, then google them to find some references. – Tim Jun 06 '17 at 11:05
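
To make "P(data|W) is your model" concrete for the image classification case: one common modelling choice (an assumption here, not something stated in this thread) is to treat each label as a categorical draw whose probabilities are the network's softmax output. The log-likelihood is then the negative cross-entropy, and adding the log of the $N(0, 1)$ prior from the question gives the unnormalized log-posterior. The sketch below uses a hypothetical single linear layer in place of a real network, and all function names are illustrative.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def log_likelihood(W, X, y):
    """log P(D | W), assuming y_i ~ Categorical(softmax(x_i @ W))."""
    log_probs = log_softmax(X @ W)
    # Pick out the log-probability of the observed class for every example.
    return log_probs[np.arange(len(y)), y].sum()

def log_prior(W):
    """log P(W) for independent N(0, 1) priors on every weight."""
    return -0.5 * np.sum(W ** 2) - 0.5 * W.size * np.log(2 * np.pi)

def log_posterior_unnormalized(W, X, y):
    """log P(D | W) + log P(W), equal to log P(W | D) up to a constant."""
    return log_likelihood(W, X, y) + log_prior(W)

# Hypothetical toy data: 5 examples, 3 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0, 2, 1, 1, 0])
W = rng.normal(size=(3, 3))
print(log_posterior_unnormalized(W, X, y))
```

For regression the analogous assumption would be Gaussian noise around the network output, which turns the log-likelihood into a scaled negative squared error.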

I found a wonderful resource online that covers this question in detail.

Bayesian Methods for Neural Networks: https://www.cs.cmu.edu/afs/cs/academic/class/15782-f06/slides/bayesian.pdf

Also, see Chapter 10 of the book 'Neural Networks for Pattern Recognition' by Bishop: http://cs.du.edu/~mitchell/mario_books/Neural_Networks_for_Pattern_Recognition_-_Christopher_Bishop.pdf#page=400

The Wanderer