
As we know, Bayes' Theorem is given by:

$$P(\theta\vert{D})=\frac{P(\theta)P(D\vert\theta)}{P(\theta)P(D\vert\theta)+P(\neg\theta)P(D\vert\neg\theta)}$$

where $\theta$ is the hypothesis and $D$ is the observed data (the evidence). This can be rewritten as:

$$P(\theta\vert{D})=\frac{P(\theta)P(D\vert\theta)}{P(D)}$$

where $P(D)=P(\theta)P(D\vert\theta)+P(\neg\theta)P(D\vert\neg\theta)$.

However, we also know that:

$$P(D)=\int{P(D\vert\theta)P(\theta)d\theta}$$

i.e. the model evidence $P(D)$ is obtained by integrating the parameters out of the likelihood. As I understand it, this means summing the likelihoods for every possible value of $\theta$, each weighted by its prior probability. But how does the integral account for the probability that the hypothesis is not true, i.e. the $P(\neg\theta)P(D\vert\neg\theta)$ term in the denominator of the first equation? Does the marginal likelihood also contain these probabilities?

1 Answer

Bayes' theorem states that

$$p(\theta|D) = \frac{p(D|\theta)p(\theta)}{p(D)}$$

We then have to distinguish between two cases: $1)$ when $\theta$ is a discrete random variable and $2)$ when it is a continuous one.

In the first case, expanding $p(D)$ gives

$$p(D) = \sum_{\theta \in \Theta} p(D,\theta)=\sum_{\theta \in \Theta}p(D|\theta)p(\theta)$$

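For example, $\theta$ can take the values $\textbf{Good in Health}$ or $\textbf{Not Good in Health}$, and $D$ can be the positive or negative result of some test. With only two possible values, the sum has exactly two terms, $p(D|\theta)p(\theta)+p(D|\neg\theta)p(\neg\theta)$, which is the denominator of your first equation.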
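As a minimal sketch of the discrete case, the sum can be evaluated directly. The prior and likelihood values below are made-up numbers chosen only for illustration, and the variable names are hypothetical; nothing here comes from a real test.

```python
# Hypothetical numbers, chosen only for illustration.
prior = {"good": 0.9, "not_good": 0.1}        # p(theta)
likelihood = {"good": 0.05, "not_good": 0.8}  # p(D = positive | theta)

# Marginal likelihood: sum p(D | theta) p(theta) over every value theta can take.
evidence = sum(likelihood[t] * prior[t] for t in prior)

# Posterior for each value of theta via Bayes' theorem.
posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}

print(evidence)
print(posterior)
```

With these numbers the two terms are $0.05 \cdot 0.9$ and $0.8 \cdot 0.1$, so $p(D)=0.125$ up to floating-point rounding. The "not" term is simply the second entry of the sum.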

In the second case, by contrast, you have

$$p(D) = \int_{\theta \in \Theta} p(D|\theta)p(\theta)d\theta$$

In this case, $\theta$ can be the probability of success of a coin flip, which we know lies in the interval $[0,1]$. Nothing about $\theta$ is discrete. If you wanted to take a sum over $[0,1]$, you would have to define a grid, say $(0, 0.1, 0.2,..., 0.9, 1)$, just as the first case summed over $\textbf{Good}$ and $\textbf{Not Good}$ in health. But any such grid discards infinitely many possible values of $\theta$, namely all those lying strictly between the grid points, such as within $(0,0.1)$ or $(0.1,0.2)$. The integral is what accounts for all of them: for any particular value of $\theta$, every other value in $[0,1]$ plays the role of $\neg\theta$, and the integral already runs over all of those values.
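To see how the grid sum relates to the integral, here is a rough sketch for the coin-flip case. It assumes a uniform prior on $[0,1]$ and hypothetical data of 7 heads in 10 flips, neither of which is specified above. It uses cell midpoints rather than the endpoint grid described above, and weights each term by the cell width so that the sum approximates the integral. Under a uniform prior the exact value is $1/(n+1)$.

```python
from math import comb


def likelihood(theta, k=7, n=10):
    """Binomial likelihood p(D | theta) for k heads in n flips (hypothetical data)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def evidence_on_grid(num_cells):
    """Midpoint Riemann sum of p(D | theta) p(theta) over [0, 1]."""
    width = 1.0 / num_cells
    prior_density = 1.0  # uniform prior on [0, 1] (an assumption)
    total = 0.0
    for i in range(num_cells):
        theta = (i + 0.5) * width  # midpoint of the i-th grid cell
        total += likelihood(theta) * prior_density * width
    return total


exact = 1.0 / 11  # closed form for a uniform prior: 1 / (n + 1) with n = 10

for cells in (10, 100, 1000):
    approx = evidence_on_grid(cells)
    print(cells, approx, abs(approx - exact))
```

As the grid gets finer, the sum approaches the integral, which is the sense in which the integral covers every value of $\theta$ that a finite grid leaves out.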

Fiodor1234
  • Right, so $P(\neg\theta)$ in the above equation for $P(D)$ simply refers to all the possible parameter values that differ from a given value $\Theta=\theta$? In your discrete example, if $\theta$ is good health then $\neg\theta$ is not in good health, while in the continuous example, if $\theta=0$ then $\neg\theta=(0,1]$? – Nelson Larsen Sep 04 '21 at 16:39