Normalizing constant in Bayes' Theorem

Question

As we know, Bayes' Theorem is given by:

$$P(\theta\vert{D})=\frac{P(\theta)P(D\vert\theta)}{P(\theta)P(D\vert\theta)+P(\neg\theta)P(D\vert\neg\theta)}$$

where $\theta$ is the hypothesis and $D$ is the data (the evidence). This can be rewritten as:

$$P(\theta\vert{D})=\frac{P(\theta)P(D\vert\theta)}{P(D)}$$

where $P(D)=P(\theta)P(D\vert\theta)+P(\neg\theta)P(D\vert\neg\theta)$.

However, we also know that:

$$P(D)=\int{P(D\vert\theta)P(\theta)d\theta}$$

i.e. the model evidence is obtained by integrating the parameters out of the likelihood. As I understand it, this means summing the likelihoods over all possible values of $\theta$, each weighted by its prior probability. However, how does the integral include the probability that the hypothesis is not true, i.e. the $P(\neg\theta)P(D\vert\neg\theta)$ term in the denominator of the first equation? Does the marginal likelihood also contain these probabilities?

Note that summation is used for discrete random variables and integration for continuous ones — Fiodor1234, Sep 04 '21 at 15:14
Cross Validated has built-in support for LaTeX, by the way. I’ll edit your question so it doesn’t rely on an external site generating images. — Arya McCarthy, Sep 04 '21 at 15:21
[Here](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference) is a quick guide for using LaTeX on this site. Also, what is $E$ here? — mhdadk, Sep 04 '21 at 15:23
@mhdadk Sorry, it should have been D for data. I fixed it in the text — Nelson Larsen, Sep 04 '21 at 16:31
You are mixing expressions of conditional probability for events and for continuous variables. — Xi'an, Sep 05 '21 at 05:10

score 0 · Accepted Answer · answered Sep 04 '21 at 16:13

Bayes' theorem states that

$$p(\theta|D) = \frac{p(D|\theta)p(\theta)}{p(D)}$$

We then have to distinguish between two cases: (1) when $\theta$ is a discrete random variable and (2) when it is continuous.

In the first case, expanding $p(D)$ gives

$$p(D) = \sum_{\theta \in \Theta} p(D,\theta)=\sum_{\theta \in \Theta}p(D|\theta)p(\theta)$$
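To connect this with the two-term denominator in the question: if $\Theta$ contains only two values, a hypothesis and its negation, written $\theta$ and $\neg\theta$ as in the question, the sum has exactly two terms:

$$p(D) = p(D|\theta)p(\theta) + p(D|\neg\theta)p(\neg\theta)$$

So the "not $\theta$" term is simply the other term of the sum, not something added on top of it.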

For example, $\theta$ can take the value $\textbf{Good in Health}$ or $\textbf{Not Good in Health}$, and $D$ can be the positive or negative result of some test.
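A minimal sketch of this discrete case in Python may help. The prior and the test's error rates below are made-up numbers chosen only for illustration, and the variable names are hypothetical:

```python
# Hypothetical numbers, chosen only for illustration.
prior = {"good": 0.9, "not_good": 0.1}             # p(theta)
likelihood_pos = {"good": 0.05, "not_good": 0.8}   # p(D = positive | theta)

# p(D) = sum over all values of theta of p(D | theta) * p(theta).
# With two values, this is the two-term denominator of Bayes' theorem.
evidence = sum(likelihood_pos[t] * prior[t] for t in prior)

# Posterior p(theta | D = positive) for each value of theta.
posterior = {t: likelihood_pos[t] * prior[t] / evidence for t in prior}

print(evidence)
print(posterior)
```

Because the evidence sums over every value of $\theta$, the posterior probabilities add up to 1, which is why $p(D)$ is called the normalizing constant.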

Whereas in the second case you have

$$p(D) = \int_{\theta \in \Theta} p(D|\theta)p(\theta)d\theta$$

In this case, $\theta$ can be the probability of success of a coin flip, which we know lies in the interval $[0,1]$. There is nothing discrete about $\theta$. Suppose you wanted to take a sum over the interval $[0,1]$ anyway. You would then have to define a grid, such as $(0, 0.1, 0.2, \ldots, 0.9, 1)$, just as the first case summed over $\textbf{Good}$ and $\textbf{Not Good}$ in health. However, such a grid discards infinitely many possible values of $\theta$: all those lying strictly between grid points, within $(0,0.1)$, $(0.1,0.2)$, and so on.
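The grid idea can be sketched numerically. The example below assumes a uniform prior on $[0,1]$ and one particular sequence of 10 flips containing 7 successes; both are assumptions made for illustration, and the function names are hypothetical. It treats the grid as a discrete prior with equal mass on each point and compares the resulting sum with the exact integral, which for this likelihood has the closed form $k!(n-k)!/(n+1)!$:

```python
from math import factorial

def likelihood(theta, heads=7, flips=10):
    """p(D | theta) for one particular sequence with `heads` successes."""
    return theta**heads * (1 - theta)**(flips - heads)

def grid_evidence(n_points, heads=7, flips=10):
    """Sum p(D | theta_i) p(theta_i) over an evenly spaced grid on [0, 1],
    putting equal prior mass 1 / n_points on each grid point."""
    thetas = [i / (n_points - 1) for i in range(n_points)]
    return sum(likelihood(t, heads, flips) for t in thetas) / n_points

def exact_evidence(heads=7, flips=10):
    """Integral of theta^k (1 - theta)^(n - k) over [0, 1] under a uniform prior."""
    return factorial(heads) * factorial(flips - heads) / factorial(flips + 1)

for n_points in (11, 101, 10001):
    print(n_points, grid_evidence(n_points), exact_evidence())
```

The coarse 11-point grid is the one from the text. As the grid gets finer, the sum approaches the integral, which is the sense in which the integral is the continuous counterpart of the discrete sum.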

Right, so $P(\neg\theta)$ in the above equation for $P(D)$ simply refers to all the possible parameter values that differ from a given value $\Theta=\theta$? In your discrete example, if $\theta$ is good health then $\neg\theta$ is not in good health, while in the continuous example, if $\theta=0$ then $\neg\theta=(0,1]$? — Nelson Larsen, Sep 04 '21 at 16:39
