simple sampling method for a Kernel Density Estimator

Question

I have developed a simple Kernel Density Estimator in Java, based on a few dozen points (maybe up to one hundred or so) and a Gaussian kernel function. The implementation gives me the PDF and CDF of my probability distribution at any point.

I would now like to implement a simple sampling method for this KDE. An obvious choice would of course be to draw from the very set of points making up the KDE, but I would like to be able to retrieve points that are slightly different from the ones in the KDE.

So far I haven't found a sampling technique that I could easily implement to solve this problem (without depending on external libraries for numerical integration or complex computations). Any advice? I don't have especially strong requirements when it comes to precision or efficiency; my main concern is to have a sampling function that works and can be easily implemented. Thanks!

This is detailed in page 5 of [this document](http://www.stat.cmu.edu/~cshalizi/350/lectures/28/lecture-28.pdf). — , Nov 15 '12 at 18:21
@user10525 the code provided is incorrect, it should be: `rnorm(n, sample(dx$x, n, prob = dx$y, replace = TRUE), dx$bw)` where `dx` is output from `density` function. Argument `prob` has to be provided because otherwise you sample uniformly. — Tim, Dec 22 '15 at 20:29

score 19 · Answer 1 · edited Dec 22 '12 at 09:09

As mentioned by Procrastinator, there's a simple way to sample from a Kernel density estimator:

1. Draw one point $x_i$ uniformly at random from the set of points $x_1, \dots, x_n$ included in the KDE.
2. Once you have the point $x_i$, draw a value from the kernel associated with that point. In this case, draw from the Gaussian $\mathcal{N}(x_i, h^2)$ centered at $x_i$, whose standard deviation is the bandwidth $h$ (so its variance is $h^2$).
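
The two steps above can be sketched in Java as follows. This is only a minimal illustration, not the asker's actual implementation: the class name `KdeSampler` and its methods are hypothetical, all points are assumed to carry equal weight, and the bandwidth $h$ is assumed to be the standard deviation of the Gaussian kernel.

```java
import java.util.Random;

/** Minimal sketch (hypothetical class): samples from a Gaussian KDE given its points and bandwidth. */
class KdeSampler {
    private final double[] points;   // data points x_1..x_n that define the KDE
    private final double bandwidth;  // h, assumed to be the kernel's standard deviation
    private final Random rng;

    KdeSampler(double[] points, double bandwidth, Random rng) {
        if (points == null || points.length == 0) {
            throw new IllegalArgumentException("at least one data point is required");
        }
        if (!(bandwidth > 0.0)) {
            throw new IllegalArgumentException("bandwidth must be positive");
        }
        this.points = points.clone();
        this.bandwidth = bandwidth;
        this.rng = rng;
    }

    /** Draws a single value from the KDE. */
    double sample() {
        // Step 1: pick one of the original points uniformly at random.
        double center = points[rng.nextInt(points.length)];
        // Step 2: draw from the Gaussian kernel centered at that point,
        // i.e. center + h * z with z ~ N(0, 1).
        return center + bandwidth * rng.nextGaussian();
    }

    /** Draws n independent values from the KDE. */
    double[] sample(int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = sample();
        }
        return out;
    }
}
```

For example, `new KdeSampler(points, h, new Random()).sample()` returns one draw. The procedure is exact rather than approximate because a Gaussian KDE with equal weights is a mixture of $n$ Gaussians, each with weight $1/n$: choosing a component uniformly and then sampling from it is a draw from the mixture itself.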

answered Nov 18 '12 at 02:29 by Pierre Lison · edited Dec 22 '12 at 09:09 by jonsca

(+1) For sharing your solution. – Nov 19 '12 at 10:15
Is $x_i$ one of the original points? If so, it looks like we don't really need to construct the actual KDE at all. Just picking one of the original points and then sampling from $\mathcal{N}(x_i,h)$ should suffice? – Ram Apr 08 '13 at 23:19
Yes indeed, if you are only using the KDE distribution for sampling, you do not need to explicitly construct the PDF: the only information necessary for the sampling operation is the set of points and the bandwidth. – Pierre Lison Apr 09 '13 at 06:28
Just to add to Pierre Lison's answer: in step 2, when sampling from a Gaussian kernel, the bandwidth $h$ should be taken as the standard deviation of the Gaussian distribution around the point $x_i$, not the variance. – Dec 22 '15 at 18:52
Wouldn't you want to sample using standard deviation 1/h or something? As written, the less likely x_i is, the more likely you are to sample another unlikely point nearby because the standard deviation of N is low. – chris Jul 03 '19 at 21:24
