
My textbook says the following:

In order to make probability statements about $\theta$ given $y$, we must begin with a model providing a joint probability distribution for $\theta$ and $y$. The joint probability mass or density function can be written as a product of two densities that are often referred to as the prior distribution $p(\theta)$ and the sampling distribution (or data distribution) $p(y|\theta)$, respectively:

$$p(\theta, y) = p(\theta)p(y|\theta)$$

Simply conditioning on the known value of the data $y$, using the basic property of conditional probability known as Bayes' rule, yields the posterior density:

$$p(\theta|y) = \dfrac{p(\theta, y)}{p(y)} = \dfrac{p(\theta)p(y|\theta)}{p(y)}, \tag{1.1}$$

where $p(y) = \sum_\theta p(\theta)p(y|\theta)$, and the sum is over all possible values of $\theta$ (or $p(y) = \int p(\theta) p(y | \theta) \ d\theta$ in the case of continuous $\theta$). An equivalent form of (1.1) omits the factor $p(y)$, which does not depend on $\theta$ and, with fixed $y$, can thus be considered a constant, yielding the unnormalised posterior density, which is the right side of (1.2):

$$p(\theta|y) \propto p(\theta)p(y|\theta) \tag{1.2}$$

Page 7, Bayesian Data Analysis, Third Edition, by Gelman et al.
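To make the quoted formulas concrete, here is a minimal Python sketch for a discrete $\theta$. The three candidate values, the uniform prior, the binomial sampling distribution and the data (7 heads in 10 flips) are hypothetical choices for illustration, not taken from the book. It forms the joint $p(\theta, y)$, sums it over $\theta$ to get $p(y)$, and divides, as in (1.1).

```python
from math import comb

# Hypothetical example: three candidate values of a coin's heads probability
# theta, a uniform prior, and observed data y = 7 heads in n = 10 flips.
thetas = [0.2, 0.5, 0.8]
prior = {t: 1 / len(thetas) for t in thetas}
n, y = 10, 7


def likelihood(y, theta, n=n):
    """Binomial sampling distribution p(y | theta)."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


# Joint p(theta, y) = p(theta) p(y | theta); as a function of theta with y
# fixed, this is the unnormalised posterior of (1.2).
joint = {t: prior[t] * likelihood(y, t) for t in thetas}

# Marginal p(y): sum of p(theta) p(y | theta) over all values of theta.
p_y = sum(joint.values())

# Bayes' rule (1.1): divide every joint value by the same constant p(y).
posterior = {t: joint[t] / p_y for t in thetas}

for t in thetas:
    print(f"theta={t}: joint={joint[t]:.5f}, posterior={posterior[t]:.5f}")
print(f"p(y) = {p_y:.5f}")
```

Since $p(y)$ is a single number shared by every $\theta$, the posterior values are just the joint values rescaled so they sum to 1; ratios such as posterior(0.8) / posterior(0.5) are unchanged, which is exactly what "$\propto$" asserts.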

If we have

$$p(\theta|y) = \dfrac{p(\theta, y)}{p(y)} = \dfrac{p(\theta)p(y|\theta)}{p(y)}, \tag{1.1}$$

then we can multiply through by $p(y)$ to get

$$p(\theta, y) = p(\theta)p(y|\theta).$$

So I'm wondering: why do we change the equals sign to a proportionality ($\propto$) sign? Mathematically, why are we doing this? As I demonstrated above, nothing algebraically seems to indicate that we must do this.

I would greatly appreciate it if people could please take the time to clarify this.

Xi'an

The Pointer

1 Answer


The equations
$$p(\theta|y) \propto p(\theta)p(y|\theta)$$
and
$$p(\theta, y) = p(\theta)p(y|\theta)$$
differ by the multiplicative term
$$p(y)^{-1},$$
which is a constant when both sides of the equations are considered as functions of $\theta$, with $y$ fixed since it is "observed". Both equations are correct from a mathematical perspective. The appeal of the "$\propto$" symbol is that it states that the posterior density is proportional to the product of the prior and the likelihood function, i.e.,

$$\text{posterior } \propto \text{prior }\times\text{ likelihood}$$

which is usually available in closed form and hence can be used in numerical and Monte Carlo evaluations of the posterior. The proportionality is understood in terms of functions of $\theta$, not of $y$ or $(\theta,y)$. By contrast, the marginal $p(y)$ is often not available in closed form.
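As an illustration of the Monte Carlo point, here is a minimal random-walk Metropolis sketch in Python that only ever evaluates prior × likelihood. The model (uniform prior, binomial likelihood with 7 successes in 10 trials), the step size, the burn-in length and the seed are hypothetical choices; with this conjugate setup the exact posterior happens to be Beta(8, 4), which gives something to check the sampler against.

```python
import math
import random

# Hypothetical setup: uniform prior on theta in (0, 1) and a binomial
# likelihood with y = 7 successes in n = 10 trials. The exact posterior is
# Beta(8, 4), but the sampler below never uses that fact or p(y).
n, y = 10, 7


def log_unnormalised_posterior(theta):
    """log p(theta) + log p(y | theta), dropping every term free of theta."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return y * math.log(theta) + (n - y) * math.log(1.0 - theta)


def metropolis(log_target, start, n_samples, step, seed=0):
    """Random-walk Metropolis: the acceptance ratio is a ratio of target
    densities, so the unknown constant p(y) cancels."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


draws = metropolis(log_unnormalised_posterior, start=0.5, n_samples=20_000, step=0.2)
kept = draws[2_000:]  # discard burn-in
print(f"posterior mean estimate: {sum(kept) / len(kept):.3f} "
      f"(exact Beta(8, 4) mean: {8 / 12:.3f})")
```

The key design point is that the acceptance test compares target(proposal) with target(current); any constant factor such as $p(y)^{-1}$ appears in both and cancels, so sampling from the posterior never requires the marginal.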

Xi'an
  • Ahh, I see now. Because if we multiply $$p(\theta|y) = \dfrac{p(\theta, y)}{p(y)} = \dfrac{p(\theta)p(y|\theta)}{p(y)}$$ through by $p(y)$, we get $$p(\theta|y) p(y) = p(\theta, y) = p(\theta)p(y|\theta).$$ And so we have the two equations $$p(\theta|y) p(y) = p(\theta)p(y|\theta)$$ and $$p(\theta, y) = p(\theta)p(y|\theta),$$ which differ by the multiplicative constant $p(y)$, which means that we have $$p(\theta|y) \propto p(\theta)p(y|\theta)$$ and $$p(\theta, y) = p(\theta)p(y|\theta),$$ since the first equation would be $$p(\theta|y) = p^{-1}(y) p(\theta)p(y|\theta),$$ [...] – The Pointer Nov 03 '18 at 11:33
  • [...] where the factor $p^{-1}(y)$ is a constant. – The Pointer Nov 03 '18 at 11:33
  • And since $$ \int_{-\infty}^{\infty} p(\theta \mid y)\, d\theta = 1, $$ we know $$ p(y) = \int_{-\infty}^{\infty} p(\theta)\, p(y \mid \theta)\, d\theta. $$ – Ron Jensen Jun 21 '19 at 17:28
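
Assuming continuous $\theta$, the normalisation argument in the last comment can be spelled out by integrating (1.1) over $\theta$; $p(y)$ moves outside the integral because it does not depend on $\theta$:

$$
\begin{aligned}
1 = \int p(\theta \mid y)\, d\theta
  &= \int \frac{p(\theta)\, p(y \mid \theta)}{p(y)}\, d\theta
   = \frac{1}{p(y)} \int p(\theta)\, p(y \mid \theta)\, d\theta \\
\Longrightarrow \quad p(y) &= \int p(\theta)\, p(y \mid \theta)\, d\theta .
\end{aligned}
$$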