I am taking a course on Monte Carlo methods, and we learned the Rejection Sampling (or Accept-Reject Sampling) method in the last lecture. There are many resources on the web that show the proof of this method, but somehow I am not convinced by them.
So, in Rejection Sampling, we have a distribution $f(x)$ which is hard to sample from. We choose an easy-to-sample distribution $g(x)$ and find a constant $c$ such that $f(x) \leq cg(x)$ for all $x$. Then we sample from $g(x)$, and for each draw $x_i$ we also sample a $u$ from the standard uniform distribution $U(u|0,1)$.
The sample $x_i$ is accepted if $cg(x_i)u \leq f(x_i)$ and rejected otherwise.
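In code, a single accept-reject step looks like the following minimal sketch (the names `f`, `g_pdf`, `g_sample`, and `c` are my own placeholders for the target density, the proposal density, the proposal sampler, and the bounding constant):

```python
import random

def rejection_sample(f, g_pdf, g_sample, c):
    """Draw one sample from f by accept-reject, with proposal g and bound c."""
    while True:
        x = g_sample()                 # x ~ g
        u = random.random()            # u ~ U(0, 1)
        if c * g_pdf(x) * u <= f(x):   # accept condition: c*g(x)*u <= f(x)
            return x                   # accepted draw
        # otherwise reject x and draw again
```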
The proofs I came across usually just show that $p(x|\text{Accept}) = f(x)$ and stop there.
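For completeness, the computation they do is a direct application of Bayes' rule: since $x \sim g$ and $P(\text{Accept}|x) = \frac{f(x)}{cg(x)}$,

$$P(\text{Accept}) = \int \frac{f(x)}{cg(x)}\,g(x)\,dx = \frac{1}{c}, \qquad p(x|\text{Accept}) = \frac{P(\text{Accept}|x)\,g(x)}{P(\text{Accept})} = \frac{f(x)/c}{1/c} = f(x).$$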
What I think about this process is that we have a sequence of variables $x_1, Accept_1, x_2, Accept_2, \ldots, x_n, Accept_n$, where each $(x_i, Accept_i)$ pair corresponds to our $i$-th sample $x_i$ and whether it is accepted ($Accept_i$). Each $(x_i, Accept_i)$ pair is independent of the others, so that:
$P(x_1,Accept_1,x_2,Accept_2,...,x_n,Accept_n) = \prod\limits_{i=1}^n P(x_i,Accept_i)$
For an $(x_i, Accept_i)$ pair we know that $P(x_i) = g(x_i)$ and $P(Accept_i|x_i) = \frac{f(x_i)}{cg(x_i)}$. We can readily calculate $p(x_i|Accept_i)$, but I don't understand how that suffices as a proof. We need to show that the algorithm works, so I think a proof should show that the empirical distribution of the accepted samples converges to $f(x)$ as $n\rightarrow\infty$. I mean, with $n$ being the total number of samples, accepted and rejected:
$\frac{\text{number of accepted samples with } A \leq x_i \leq B}{\text{number of accepted samples}} \rightarrow \int_A^B f(x)\,dx$ as $n\rightarrow\infty$.
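This ratio is easy to check by simulation. Below is a minimal sketch of exactly that check; the concrete choices (target $f = \mathrm{Beta}(2,5)$, proposal $g = U(0,1)$ so $g(x) = 1$, $c$ set to the maximum of $f$, and interval $[A, B] = [0.1, 0.3]$) are my own illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative setup (my assumption): target f = Beta(2, 5), proposal g = U(0, 1),
# so g(x) = 1 on [0, 1] and c can be taken as the maximum of f (at the mode 0.2).
f = stats.beta(2, 5).pdf
c = f(0.2)

n = 100_000                        # total number of draws, accepted and rejected
x = rng.uniform(0.0, 1.0, n)       # x_i ~ g
u = rng.uniform(0.0, 1.0, n)       # u_i ~ U(0, 1)
accepted = x[c * u <= f(x)]        # keep x_i where c*g(x_i)*u_i <= f(x_i); g(x) = 1 here

A, B = 0.1, 0.3
ratio = np.mean((A <= accepted) & (accepted <= B))            # left-hand side above
integral = stats.beta(2, 5).cdf(B) - stats.beta(2, 5).cdf(A)  # int_A^B f(x) dx
print(f"empirical ratio: {ratio:.4f}   integral: {integral:.4f}")
```

With $n = 100{,}000$ the two numbers should agree to roughly two decimal places, which is the kind of convergence I have in mind.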
Am I wrong in this line of thought? Or is there a connection between the common proof of the algorithm and this convergence statement?
Thanks in advance