
I read that in Bayes' rule, the denominator $\Pr(\textrm{data})$ of

$$\Pr(\text{parameters} \mid \text{data}) = \frac{\Pr(\textrm{data} \mid \textrm{parameters}) \Pr(\text{parameters})}{\Pr(\text{data})}$$

is called a normalizing constant. What exactly is it? What is its purpose? Why does it look like $\Pr(\text{data})$? Why doesn't it depend on the parameters?

gung - Reinstate Monica
amateur
  • When you integrate $f(\text{data} \mid \text{params})f(\text{params})$, you are integrating over the parameters and so the result has no term depending on the parameters, in the same way that $\int_{x=0}^{x=2}xy\;dx = 2y$ does not depend on $x$. – Henry Jun 20 '11 at 18:57

2 Answers


The denominator, $\Pr(\textrm{data})$, is obtained by integrating the parameters out of the joint probability, $\Pr(\textrm{data}, \textrm{parameters})$. This is the marginal probability of the data and, of course, it does not depend on the parameters, since these have been integrated out.
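To make "integrating out the parameters" concrete, here is a minimal sketch assuming a hypothetical coin-flip model: the data are 7 successes in 10 trials (`N_TRIALS`, `N_SUCCESSES` are illustrative names), the parameter is the success probability $\theta$, and the prior is uniform on $[0, 1]$. The integral is approximated with a simple midpoint rule; the result is a single number with no $\theta$ left in it.

```python
import math

# Hypothetical data: 7 successes in 10 trials.
N_TRIALS = 10
N_SUCCESSES = 7

def likelihood(theta: float) -> float:
    """Pr(data | theta): binomial probability of the observed count."""
    k, n = N_SUCCESSES, N_TRIALS
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

def prior(theta: float) -> float:
    """Pr(theta): assumed uniform prior density on [0, 1]."""
    return 1.0

def marginal_likelihood(n_grid: int = 10_000) -> float:
    """Pr(data) = integral over theta of Pr(data | theta) * Pr(theta),
    approximated with the midpoint rule on [0, 1]."""
    width = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = (i + 0.5) * width  # midpoint of the i-th cell
        total += likelihood(theta) * prior(theta) * width
    return total

p_data = marginal_likelihood()
# Under a uniform prior the exact value is 1 / (N_TRIALS + 1); the two agree closely.
print(p_data, 1 / (N_TRIALS + 1))
```

The uniform prior is chosen only because it gives a known closed form to check against; with most priors and models no such closed form exists, which is why the integral is usually hard.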

Now, since:

  • $\Pr(\textrm{data})$ does not depend on the parameters for which one wants to make inference;
  • $\Pr(\textrm{data})$ is generally difficult to calculate in closed form;

one often uses the following adaptation of Bayes' formula:

$\Pr(\textrm{parameters} \mid \textrm{data}) \propto \Pr(\textrm{data} \mid \textrm{parameters}) \Pr(\textrm{parameters})$
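The proportional form is enough for many computations. As a sketch (assuming the same hypothetical 7-out-of-10 coin-flip data and a uniform prior), a random-walk Metropolis sampler only ever evaluates the unnormalized posterior, because its acceptance rule uses a ratio of two posterior values. The binomial coefficient is dropped as well, since it is also constant in $\theta$. The step size, seed, and burn-in length are arbitrary illustrative choices.

```python
import random

# Hypothetical data: 7 successes in 10 trials.
N_TRIALS = 10
N_SUCCESSES = 7

def unnormalized_posterior(theta: float) -> float:
    """Pr(data | theta) * Pr(theta) up to constants; Pr(data) is never computed."""
    if not 0.0 < theta < 1.0:
        return 0.0  # outside the support of the uniform prior
    return theta**N_SUCCESSES * (1 - theta) ** (N_TRIALS - N_SUCCESSES)

def metropolis(n_samples: int, step: float = 0.1, seed: int = 0) -> list[float]:
    """Random-walk Metropolis sampler targeting the posterior of theta."""
    rng = random.Random(seed)
    theta = 0.5  # starting point with positive posterior density
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        # Any constant factor (such as 1 / Pr(data)) cancels in this ratio.
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
        if rng.random() < ratio:
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(50_000)[1_000:]  # discard a short burn-in
# Should be close to the exact posterior mean, 8/12, of the Beta(8, 4) posterior.
print(sum(samples) / len(samples))
```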

Basically, $\Pr(\textrm{data})$ is nothing but a "normalising constant", i.e., a constant that makes the posterior density integrate to one.
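A small grid computation illustrates the "makes the posterior integrate to one" role. This sketch again assumes the hypothetical 7-out-of-10 coin-flip data and a uniform prior: it evaluates likelihood times prior on a grid of $\theta$ values, divides by the grid approximation of its integral, and the resulting posterior integrates to one. The divisor it computes is (an approximation of) $\Pr(\textrm{data})$.

```python
import math

# Hypothetical data: 7 successes in 10 trials; uniform prior on [0, 1].
N_TRIALS = 10
N_SUCCESSES = 7
N_GRID = 1_000

width = 1.0 / N_GRID
thetas = [(i + 0.5) * width for i in range(N_GRID)]  # grid-cell midpoints

# Likelihood times prior (the prior density is 1, so it is left implicit).
unnorm = [
    math.comb(N_TRIALS, N_SUCCESSES) * t**N_SUCCESSES * (1 - t) ** (N_TRIALS - N_SUCCESSES)
    for t in thetas
]

# The normalizing constant: integral of likelihood * prior over theta, i.e. Pr(data).
normalizing_constant = sum(u * width for u in unnorm)

# Dividing by it turns the unnormalized values into a proper density.
posterior = [u / normalizing_constant for u in unnorm]

print(normalizing_constant)               # approximately 1/11
print(sum(p * width for p in posterior))  # approximately 1.0
```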

ocram
  • @nbro: I mean Pr(data) = integral over the parameters of Pr(data, parameters) – ocram Mar 08 '18 at 06:24
  • What do you mean by 'P(data) is generally difficult to calculate in a closed-form'? – unicorn Aug 18 '20 at 07:56
  • @unicorn: To calculate P(data), one has to integrate P(data, parameters) over the parameters. This task is generally difficult. – ocram Aug 19 '20 at 05:31

When applying Bayes' rule, we usually wish to infer the "parameters", and the "data" is already given. Thus, $\Pr(\textrm{data})$ is a fixed number that does not change as the parameters vary, so it acts only as a normalizing factor.
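One way to see this: write $\theta_1$ and $\theta_2$ (notation introduced here) for two candidate values of the parameters. When their posterior probabilities are compared, the same $\Pr(\textrm{data})$ divides both and cancels, so the relative plausibility of parameter values never depends on it:

$$\frac{\Pr(\theta_1 \mid \textrm{data})}{\Pr(\theta_2 \mid \textrm{data})} = \frac{\Pr(\textrm{data} \mid \theta_1)\Pr(\theta_1) / \Pr(\textrm{data})}{\Pr(\textrm{data} \mid \theta_2)\Pr(\theta_2) / \Pr(\textrm{data})} = \frac{\Pr(\textrm{data} \mid \theta_1)\Pr(\theta_1)}{\Pr(\textrm{data} \mid \theta_2)\Pr(\theta_2)}$$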

Harsh