
The question

Suppose we iteratively use the posterior as the prior on the same data.* What is the limiting distribution of the posterior?

Let's make that precise. The data $X$ and the likelihood function $P(X\mid\theta)$ are fixed throughout. We start with a prior $P_0(\theta)$ and update the posterior iteratively, using the $k^{\text{th}}$ posterior as the $(k+1)^{\text{th}}$ prior:

$$ P_{k+1}(\theta) = \frac{P(X\mid\theta)P_k(\theta)}{\int P(X\mid\theta')P_k(\theta') \; d\theta'} \quad \text{(1)} $$

So, here's the question: What is the limiting distribution of $P_k(\theta)$?
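For concreteness, here is a minimal numerical sketch of the iteration (1) on a discrete parameter grid. The binomial likelihood, the uniform prior, and the observed data are all illustrative choices of mine, not part of the question:

```python
import numpy as np
from scipy.stats import binom

# Illustrative setup (an assumption): X ~ Binomial(n=10, theta), observed
# x = 7, with theta discretised on a grid so the integrals become sums.
theta = np.linspace(0.01, 0.99, 99)
likelihood = binom.pmf(7, 10, theta)          # P(X | theta), fixed throughout
prior = np.full_like(theta, 1 / len(theta))   # uniform P_0(theta)

posterior = prior
for _ in range(50):
    # One step of (1): multiply by the likelihood and renormalise.
    posterior = likelihood * posterior
    posterior /= posterior.sum()

# After many iterations the mass has piled up near theta = 0.7,
# the maximiser of the likelihood for x = 7 out of n = 10.
print(theta[np.argmax(posterior)])
```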

Work so far

We prove the following by induction:

$$ P_k(\theta) = \frac{P(X\mid\theta)^kP_0(\theta)}{\int P(X\mid\theta')^kP_0(\theta') \; d\theta'} \quad \text{(2)} $$

The base case $k=1$ is immediate from the definition (1), so we proceed to the induction step:

\begin{align*}
P_{k+1}(\theta) &= \frac{P(X\mid\theta)P_k(\theta)}{\int P(X\mid\theta')P_k(\theta') \; d\theta'} \quad \text{(by (1))}\\[.5em]
&= \frac{P(X\mid\theta)\left(\frac{P(X\mid\theta)^kP_0(\theta)}{\int P(X\mid\theta')^kP_0(\theta')\,d\theta'}\right)}{\int P(X\mid\theta')\left(\frac{P(X\mid\theta')^kP_0(\theta')}{\int P(X\mid\theta'')^kP_0(\theta'')\,d\theta''}\right)\,d\theta'} \quad \text{(by the induction hypothesis)}\\[.5em]
&= \frac{\frac{1}{\int P(X\mid\theta')^{k}P_0(\theta') \; d\theta'}\,P(X\mid\theta)^{k+1}P_0(\theta)}{\frac{1}{\int P(X\mid\theta'')^{k}P_0(\theta'') \; d\theta''}\int P(X\mid\theta')^{k+1}P_0(\theta') \; d\theta'} \quad \text{(since $\textstyle\int P(X\mid\theta'')^{k}P_0(\theta'') \; d\theta''$ is a constant and can be pulled out of the integral)}\\[.5em]
&= \frac{P(X\mid\theta)^{k+1}P_0(\theta)}{\int P(X\mid\theta')^{k+1}P_0(\theta') \; d\theta'} \quad \text{(since $\textstyle\int P(X\mid\theta')^{k}P_0(\theta') \; d\theta' = \int P(X\mid\theta'')^{k}P_0(\theta'') \; d\theta''$)}
\end{align*}

So (2) is proved. But is (2) even useful?
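One thing (2) does make plausible, as a sanity check rather than a proof: on the same toy grid as in the sketch above, the normalised $P(X\mid\theta)^k P_0(\theta)$ concentrates around the maximiser of the likelihood as $k$ grows, hinting at a point-mass limit:

```python
import numpy as np
from scipy.stats import binom

# Same illustrative binomial setup as in the sketch above (an assumption,
# not part of the question): x = 7 successes out of n = 10 trials.
theta = np.linspace(0.01, 0.99, 99)
likelihood = binom.pmf(7, 10, theta)
prior = np.full_like(theta, 1 / len(theta))

def posterior_k(k):
    """Formula (2): P(X|theta)^k * P_0(theta), normalised on the grid."""
    unnorm = likelihood**k * prior
    return unnorm / unnorm.sum()

# The peak stays at the likelihood's maximiser while the mass there
# grows towards 1 -- consistent with convergence to a point mass.
for k in (1, 10, 100):
    p = posterior_k(k)
    print(k, theta[np.argmax(p)], round(p.max(), 4))
```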


*This question is born purely of curiosity. The iterative scheme given is double-dipping taken to an extreme and I am not suggesting that it be used in practice!

Comments
• The sequence converges to a Dirac mass at the MLE. I worked on this recursive scheme in the 1990s under the name of prior feedback. And again with [the SAME algorithm](https://link.springer.com/article/10.1023/A:1013172322619), re-discovered later under [different names](https://xianblog.wordpress.com/?s=prior+feedback). – Xi'an Jan 19 '21 at 12:11
  • @Xi'an I'm not sure I understand what you mean by "the likelihood is a probability: not in $\theta$". The likelihood is indeed a probability. It is also a function (in $\theta$) that we decide upon when we decide how to model the data (even if the choice of model is all but forced on us). – dwolfeu Jan 19 '21 at 16:16
  • 1
    This means you assume $X$ to be a discrete rv as the assumption does not operate with a pdf $p(x|\theta)$. – Xi'an Jan 19 '21 at 16:28
