Sampling from Nearest Neighbor Density Estimator

Question

The KDE (with variable bandwidth) is defined as $$\hat f(x) = n^{-1}\sum_{i=1}^nh_x^{-1}K\big(h_x^{-1}(x - X_i)\big).$$ Once the density is estimated, one could sample points as follows: pick a point at random from the observed sample $X_1, X_2, \dots, X_n$ and then sample from the known density $K$ centered at that point (cf. the simple sampling method for a kernel density estimator).
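For concreteness, here is a minimal sketch of that simple sampling method in the fixed-bandwidth case only. The Gaussian kernel, the constant bandwidth `h`, and the function name `sample_fixed_bandwidth_kde` are assumptions made for illustration; they are not part of the estimator defined above, whose bandwidth $h_x$ depends on $x$.

```python
import numpy as np

def sample_fixed_bandwidth_kde(data, h, size, rng=None):
    """Draw `size` points from a Gaussian KDE with constant bandwidth `h`.

    Assumptions (illustrative only): 1-D data, Gaussian kernel, fixed h.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    # Step 1: pick observed points uniformly at random, with replacement.
    centers = rng.choice(data, size=size, replace=True)
    # Step 2: add a kernel draw scaled by the constant bandwidth.
    return centers + h * rng.standard_normal(size)
```

This works because a fixed-bandwidth KDE is an equal-weight mixture of $n$ copies of the kernel, one centered at each $X_i$. Whether the same argument carries over when the bandwidth depends on the evaluation point $x$ is exactly what the question asks.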

Now consider the special case where $K$ is the uniform density and $h_x$ is chosen as the distance from $x$ to its $k$th nearest neighbor, i.e. $h_x = |x - r_k|$, where $r_k$ is the $k$th sample point when the sample is ordered by distance to $x$.

Can I proceed in the same way as before, that is, pick a random point from the sample and sample from a uniform distribution?
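To make the nearest-neighbor special case concrete, the following sketch evaluates $\hat f$ in one dimension and sums it numerically over increasingly wide intervals. It assumes the uniform kernel is $K(u) = \tfrac12$ on $|u| \le 1$; the function name `knn_density`, the simulated data, and the choices of $n$, $k$, and grid step are all illustrative.

```python
import numpy as np

def knn_density(x, data, k):
    """Evaluate the 1-D k-NN estimate with a uniform kernel on [-1, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    n = data.size
    # Distances from every evaluation point to every sample point.
    dist = np.abs(x[:, None] - data[None, :])
    # h_x: distance from x to its k-th nearest sample point
    # (zero, and hence a division by zero, if x coincides with k sample points).
    h = np.sort(dist, axis=1)[:, k - 1]
    # Uniform kernel: count the sample points lying within h_x of x.
    counts = (dist <= h[:, None]).sum(axis=1)
    return counts / (2.0 * n * h)

rng = np.random.default_rng(0)
data = rng.standard_normal(50)  # simulated sample, for illustration only
step = 0.01
masses = []
for half_width in (5.0, 50.0, 500.0):
    grid = np.arange(-half_width, half_width, step)
    # Riemann-sum approximation of the integral over [-half_width, half_width).
    masses.append(knn_density(grid, data, k=5).sum() * step)
print(masses)
```

Far from the data, $h_x$ grows like $|x|$, so $\hat f(x)$ behaves like $k/(2n|x|)$ and the accumulated mass keeps growing as the interval widens. On the whole real line, then, this $\hat f$ cannot simply be rescaled to integrate to one.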

It's unclear that your initial description is a valid sampling method. After all, the algorithm you describe is ambiguous: once you have selected a point from the sample, *what bandwidth do you use to sample from $K$?* In fact, it looks like $\hat f$ is, in general, not even a valid density: it won't integrate to unity. — whuber, Oct 18 '20 at 16:51
But assuming the domain of my target density is constant, I could easily scale the density such that it integrates to unity, so I believe that this is no (serious) issue. I am more concerned with the point you mention: what bandwidth should be chosen? I thought about picking a random point $X$ from the sample and then computing the bandwidth as the distance from $X$ to its $k$th nearest neighbor in the sample. — Syd Amerikaner, Oct 18 '20 at 17:54
There definitely *is* a serious issue, as revealed by examples in which $\hat f$ is identically zero. — whuber, Oct 19 '20 at 12:23
What would such an example look like? As far as I understand the problem, the KNN density estimator is just proportional to a density. — Syd Amerikaner, Oct 21 '20 at 19:38
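The procedure proposed in the comments (pick a sample point $X_i$, then use the distance from $X_i$ to its $k$th nearest neighbor as the bandwidth) can be sketched as below. Two assumptions are made here that the thread does not settle: the neighbor search excludes $X_i$ itself, and the uniform kernel is taken on $[-1, 1]$. The function name `sample_point_knn_draws` is hypothetical.

```python
import numpy as np

def sample_point_knn_draws(data, k, size, rng=None):
    """Draw from a uniform-kernel mixture with one bandwidth per sample point.

    Assumes 1-D data and 1 <= k <= n - 1; the bandwidth h_i is the distance
    from X_i to its k-th nearest *other* sample point.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    n = data.size
    dist = np.abs(data[:, None] - data[None, :])
    # After sorting each row, column 0 is the zero distance of a point to
    # itself, so column k holds the distance to the k-th nearest other point.
    h = np.sort(dist, axis=1)[:, k]
    # Pick sample points uniformly at random, then jitter each one
    # uniformly within its own bandwidth.
    idx = rng.integers(0, n, size=size)
    return data[idx] + h[idx] * rng.uniform(-1.0, 1.0, size=size)
```

Note that these draws come from the mixture $n^{-1}\sum_{i=1}^n h_i^{-1}K\big(h_i^{-1}(x - X_i)\big)$, where each bandwidth $h_i$ is attached to a sample point. That mixture is a proper density, but it is in general a different function from $\hat f$, whose bandwidth $h_x$ is attached to the evaluation point $x$.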

