Normalizing constant in Bayes' theorem

Question

I read that in Bayes' rule, the denominator $\Pr(\textrm{data})$ of

$$\Pr(\text{parameters} \mid \text{data}) = \frac{\Pr(\textrm{data} \mid \textrm{parameters}) \Pr(\text{parameters})}{\Pr(\text{data})}$$

is called a normalizing constant. What exactly is it? What is its purpose? Why does it look like $\Pr(\text{data})$? Why doesn't it depend on the parameters?

When you integrate $f(\text{data}|\text{params})f(\text{params})$, you are integrating over the parameters and so the result has no term depending on the parameters, in the same way that $\int_{x=0}^{x=2}xy\;dx = 2y$ does not depend on $x$. — Henry, Jun 20 '11 at 18:57

score 20 · Accepted Answer · edited Mar 07 '18 at 17:59

The denominator, $\Pr(\textrm{data})$, is obtained by integrating the parameters out of the joint probability, $\Pr(\textrm{data}, \textrm{parameters})$. This is the marginal probability of the data and, of course, it does not depend on the parameters, since these have been integrated out.
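
Written out, with $\theta$ standing in for the parameters (a shorthand introduced here, not used in the question), this marginalization is:

$$\Pr(\text{data}) = \int \Pr(\text{data}, \theta)\, d\theta = \int \Pr(\text{data} \mid \theta)\, \Pr(\theta)\, d\theta$$

For parameters taking discrete values, the integral becomes a sum over those values. Either way, $\theta$ is only a dummy variable of integration, so the result is a number determined by the data and the model alone.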

Now, since:

1. $\Pr(\textrm{data})$ does not depend on the parameters about which one wants to make inference;
2. $\Pr(\textrm{data})$ is generally difficult to calculate in closed form;

one often uses the following adaptation of Bayes' formula:

$\Pr(\textrm{parameters} \mid \textrm{data}) \propto \Pr(\textrm{data} \mid \textrm{parameters}) \Pr(\textrm{parameters})$

Basically, $\Pr(\textrm{data})$ is nothing but a "normalising constant", i.e., a constant that makes the posterior density integrate to one.
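
To make the "integrates to one" point concrete, here is a minimal sketch that normalizes a posterior numerically on a grid. The setup is an assumption chosen for illustration and is not part of the question: a coin with unknown heads-probability, a uniform prior on [0, 1], and 7 heads observed in 10 flips. The function name `grid_posterior` is likewise made up for this example.

```python
from math import comb

def grid_posterior(heads, flips, n_grid=1001):
    """Grid approximation of the posterior for a coin's heads-probability.

    Assumes a uniform prior on [0, 1] (an illustrative choice).
    Returns the grid, the normalized posterior density, and Pr(data).
    """
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    step = 1.0 / (n_grid - 1)
    prior_density = 1.0  # uniform prior on [0, 1]
    # Unnormalized posterior: likelihood times prior at each grid point.
    unnorm = [
        comb(flips, heads) * p**heads * (1 - p) ** (flips - heads) * prior_density
        for p in grid
    ]
    # Pr(data): integrate over the parameter with the trapezoidal rule.
    evidence = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    posterior = [u / evidence for u in unnorm]
    return grid, posterior, evidence

grid, posterior, evidence = grid_posterior(heads=7, flips=10)

# Check that the normalized posterior integrates to one.
step = grid[1] - grid[0]
area = step * (sum(posterior) - 0.5 * (posterior[0] + posterior[-1]))

print(evidence)  # approximately 0.0909; the exact value here is 1/11
print(area)      # 1.0 up to floating-point error
```

Dividing by `evidence` rescales the curve without changing its shape, which is why the proportional form is enough to locate the mode or compare parameter values. In one dimension a grid like this is cheap; the difficulty mentioned above shows up when the parameter space has many dimensions.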

answered Jun 20 '11 at 04:28

ocram

@nbro: I mean Pr(data) = integral over the parameters of Pr(data, parameters) – ocram Mar 08 '18 at 06:24
What do you mean by 'P(data) is generally difficult to calculate in a closed-form'? – unicorn Aug 18 '20 at 07:56
@unicorn: To calculate P(data), one has to integrate P(data, parameters) over the parameters. This task is generally difficult. – ocram Aug 19 '20 at 05:31

score 2 · Answer 2 · edited Mar 07 '18 at 18:04

When applying Bayes' rule, we usually wish to infer the "parameters", and the "data" are already given. Thus, $\Pr(\textrm{data})$ is a constant with respect to the parameters, and we can treat it as just a normalizing factor.
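
A small sketch of what "constant" buys in practice, under an assumed setup that is not from the question: three candidate values for a coin's heads-probability with equal prior weight, and 7 heads observed in 10 flips. With the data fixed, every candidate is divided by the same number, so the ranking of candidates is the same before and after normalizing.

```python
from math import comb

# Illustrative assumptions: candidate parameter values, equal prior weights, fixed data.
candidates = [0.3, 0.5, 0.7]
prior = {p: 1 / len(candidates) for p in candidates}
heads, flips = 7, 10

def likelihood(p):
    """Binomial probability of the observed data given heads-probability p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

# Numerator of Bayes' rule for each candidate.
unnormalized = {p: likelihood(p) * prior[p] for p in candidates}

# Pr(data): one number, obtained by summing over the discrete parameter values.
evidence = sum(unnormalized.values())
posterior = {p: u / evidence for p, u in unnormalized.items()}

# The most probable candidate is the same with or without the division.
best_unnormalized = max(unnormalized, key=unnormalized.get)
best_posterior = max(posterior, key=posterior.get)
print(best_unnormalized, best_posterior)
print(sum(posterior.values()))  # 1.0 up to floating-point error
```

The division by `evidence` only matters when actual probabilities are needed, for example to report how much more plausible one candidate is in absolute terms.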

answered Jun 20 '11 at 18:04

Harsh
