
I have successfully generated samples from the 1D Epanechnikov kernel, following the routine described on page 236 of "Nonparametric Density Estimation" by Devroye and Gyorfi (also described in this stats post). The routine is:

  1. Generate three independent samples $V_1, V_2, V_3$, each uniform on $[-1, 1]$.
  2. If $|V_3| > |V_1|$ and $|V_3| > |V_2|$, then return $W = V_2$; otherwise return $W = V_3$.

This results in the following samples: [image: epa samples]
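For reference, the two steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the code used for the plot; the function name `sample_epanechnikov_1d` and the `rng` argument are made up for this example.

```python
import numpy as np

def sample_epanechnikov_1d(n, rng=None):
    """Draw n samples from the 1D Epanechnikov kernel on [-1, 1].

    Hypothetical helper: implements the three-uniforms rule quoted above.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: three independent uniforms on [-1, 1] per sample.
    v = rng.uniform(-1.0, 1.0, size=(n, 3))
    a = np.abs(v)
    # Step 2: if |V3| is the largest magnitude, return V2, else return V3.
    v3_is_largest = (a[:, 2] > a[:, 0]) & (a[:, 2] > a[:, 1])
    return np.where(v3_is_largest, v[:, 1], v[:, 2])
```

A quick sanity check is that the sample variance should be close to $1/5$, the variance of the density $\tfrac{3}{4}(1 - x^2)$ on $[-1, 1]$.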

I naively assumed that, in order to sample in higher dimensions, one could simply sum two independent samples from the kernel. However, that results in the following: [image]

The outer ring shows the $\|H\|_2$ radius (with $H$ the bandwidth matrix). Pages 236-237 of "Nonparametric Density Estimation" by Devroye and Gyorfi also discuss the requirements for the higher-dimensional kernel. However, I am unsure how to sample in these higher dimensions. Can the above algorithm be extended to $n$ dimensions?

Thomas
  • Your question is fully answered in the few lines spanning pp 236-7 of the reference. It explains how to scale an "independent random vector uniformly distributed on the unit sphere of $R^d.$" That question is addressed here on CV at https://stats.stackexchange.com/questions/7977. That thread describes in detail the "polar method" explained in the next paragraph of your reference. – whuber Sep 08 '21 at 18:41
  • @whuber Thank you for the swift reply. The link between the approach described with the unit sphere and the Epanechnikov kernel is not entirely clear to me. Is the spherical representation the "optimal" kernel in dimensions greater than 1? In the CV post you mention, should the routine be followed, but without the normalization step? Sorry for my ignorance, I'm relatively new to the field. – Thomas Sep 08 '21 at 19:11
  • The text provides the normalization factor explicitly, but it doesn't matter: the algorithm obtains a unit vector from the standard $d$-variate Normal distribution and rescales it according to a quantity generated from a suitable Beta distribution. It is fast and easy to implement when you have access to a Beta random number generator or an inverse Beta integral. Otherwise, the only challenge is implementing a Beta RNG. – whuber Sep 08 '21 at 19:18
  • Just to understand it clearly: the unit vector of dimension $d$ is generated using $s = \sqrt{d + 4}\sqrt{\text{Beta}(d/2,2)}T_d$, which is then scaled by a sample from the Epanechnikov kernel? Following some advice from the post you linked, I generated said vector using $v \sim \mathcal{N}(0,1)$ and $v = v/\|v\|$. Here's a $d=2$ plot with 5000 samples: https://i.imgur.com/8GSpcBS.png – Thomas Sep 08 '21 at 22:40
  • Essentially, I'm trying to generate samples from the 2D and 3D variants of the box and Epanechnikov kernels, such that they are scaled / bounded correctly by the chosen norm (such as here: https://kdepy.readthedocs.io/en/latest/kernels.html#some-2d-kernels), which in this case is the 2-norm. Generating a 1-D sample $W$ from the Epanechnikov kernel and scaling it by the factor $s = \sqrt{d+4}\sqrt{\text{Beta}(d/2,2)}T_d$ with the n-D vector $T_d = v/\|v\|$: is this correctly understood? Does a similar approach hold for generating samples from an n-D box kernel? @whuber – Thomas Sep 09 '21 at 08:03
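
As a reading of the comments above, the construction they describe (a uniform direction $T_d$ obtained by normalizing a standard Normal vector, multiplied by a radius $\sqrt{\text{Beta}(d/2,2)}$) can be sketched as below. This is a minimal sketch under that interpretation, not code taken from the reference; the function name and the `unit_variance` flag are hypothetical. Note that in this reading no separate 1-D Epanechnikov sample $W$ is needed: the Beta draw alone supplies the radius.

```python
import numpy as np

def sample_epanechnikov_nd(n, d, rng=None, unit_variance=False):
    """Draw n points in R^d with density proportional to 1 - ||x||^2
    on the unit 2-norm ball (hypothetical helper, polar-style method)."""
    rng = np.random.default_rng() if rng is None else rng
    # Direction T_d: normalize a standard Normal vector to the unit sphere.
    g = rng.standard_normal(size=(n, d))
    t = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Radius: the squared radius follows Beta(d/2, 2).
    r = np.sqrt(rng.beta(d / 2.0, 2.0, size=n))
    x = r[:, None] * t
    if unit_variance:
        # Optional factor sqrt(d + 4) from the comments: rescales each
        # coordinate to variance 1 (support becomes radius sqrt(d + 4)).
        x = x * np.sqrt(d + 4.0)
    return x
```

For example, `sample_epanechnikov_nd(5000, 2)` gives 5000 points inside the unit disc. For a box (uniform) kernel on the same 2-norm ball, the analogous sketch would keep the direction step and replace the radius with $U^{1/d}$ for $U$ uniform on $[0, 1]$.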

0 Answers