
I am currently working on a convergence proof for a new method for non-parametric importance sampling, and I need some help...

My method uses an MCMC algorithm to generate a set of $M$ dependent samples $X_1 \dots X_M$ from a distribution $g$, and then reconstructs an approximation $\hat{g}$ of this distribution using a KDE algorithm. What I want to show is that $\hat{g} \rightarrow g$ as $M \rightarrow \infty$.
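
For concreteness, the estimator I have in mind is the standard fixed-bandwidth kernel density estimator, written here in the one-dimensional case for simplicity (the exact form used in the paper's appendix may differ slightly):

$$\hat{g}(x) = \frac{1}{M h}\sum_{i=1}^{M} K\!\left(\frac{x - X_i}{h}\right), \qquad h = h(M) \rightarrow 0, \quad M h \rightarrow \infty,$$

where $K$ is a kernel (e.g. Gaussian) and $h$ is the bandwidth.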

Fortunately, I was able to find a proof similar to the one I need in a paper on non-parametric importance sampling by Ping Zhang (see the appendix of that paper for more details). However, in this paper the author appears to assume that the samples $X_1 \dots X_M$ used in the KDE algorithm are i.i.d., which does not hold in my case, since I used an MCMC algorithm to generate them.

I am wondering if there is a way to tweak this existing convergence proof to account for the fact that the samples used within the KDE algorithm are not i.i.d. Intuitively, I would expect that using correlated samples in a KDE reduces its effectiveness, so maybe there is some kind of bounding factor? Alternatively, perhaps I can exploit the fact that the samples come from an MCMC algorithm whose stationary distribution is $g$? If it helps, I can assume that the MCMC algorithm is just generic Metropolis-Hastings.
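
To make the setup concrete, here is a minimal Python sketch of what I am doing in practice. The target density and the proposal scale are placeholders for illustration only, not my actual model:

```python
import numpy as np
from scipy.stats import gaussian_kde

def metropolis_hastings(log_g, x0, n_samples, proposal_scale=1.0, rng=None):
    """Generic random-walk Metropolis-Hastings targeting the density g (given via log_g)."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_samples)
    x = x0
    log_gx = log_g(x)
    for i in range(n_samples):
        x_prop = x + proposal_scale * rng.standard_normal()  # symmetric proposal
        log_gprop = log_g(x_prop)
        if np.log(rng.uniform()) < log_gprop - log_gx:        # MH acceptance step
            x, log_gx = x_prop, log_gprop
        samples[i] = x                                        # consecutive draws are dependent
    return samples

# Placeholder target: standard normal (a stand-in for my actual g).
log_g = lambda x: -0.5 * x**2

chain = metropolis_hastings(log_g, x0=0.0, n_samples=20000)

# KDE fitted to the *dependent* MCMC draws -- this is the estimator \hat{g}
# whose convergence to g (as M -> infinity) I want to establish.
g_hat = gaussian_kde(chain)
```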

I'm open to any ideas!

Berk U.
  • Someone once asked this of [Richard Samworth](http://www.statslab.cam.ac.uk/~rjs57/), and he replied that if you use a large enough thinning period, the dependence within the MCMC sample becomes very small and is not really an issue. I think you can prove that the approximation is good for a large enough thinning period. Does this make sense in your context? – Procrastinator Jul 21 '12 at 18:09
  • In addition, KDE remains valid with dependent samples, but you need to choose a different bandwidth. See for example [this paper](http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1034713655). – Procrastinator Jul 21 '12 at 18:17
  • @Procrastinator Thanks for this! What exactly do you mean by the "thinning period"? The paper is also very helpful. – Berk U. Jul 23 '12 at 17:02
  • Thinning is a common technique in MCMC methods. It consists of subsampling the chain every $k$ iterations (usually $k > 25$). The idea is that, by the Markov property, samples taken far apart in the chain are only weakly dependent. You might also be interested in the burn-in period, which consists of starting this subsampling only after a number of initial iterations (usually more than $1000$) have been discarded. Together, these steps produce an approximately independent sample from the stationary distribution (see the sketch after this comment for a minimal illustration). – Procrastinator Jul 23 '12 at 17:06
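
A minimal sketch of the burn-in and thinning step described in the comment above, applied to the `chain` array from the earlier snippet. The cutoffs $1000$ and $k = 25$ are just the rough rules of thumb mentioned there:

```python
from scipy.stats import gaussian_kde

# Burn-in: discard the first iterations, taken before the chain has
# (approximately) reached its stationary distribution g.
burn_in = 1000

# Thinning: keep every k-th draw so that the retained samples are only
# weakly dependent (approximately independent for large k).
k = 25

thinned = chain[burn_in::k]   # `chain` is the MCMC output from the earlier sketch

# KDE fitted to the (approximately independent) thinned sample.
g_hat_thinned = gaussian_kde(thinned)
```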

0 Answers