Adaptive kernel density estimators?

Question

Can anyone report on their experience with an adaptive kernel density estimator?
(There are many synonyms: adaptive | variable | variable-width, KDE | histogram | interpolator ...)

The Wikipedia article Variable kernel density estimation says "we vary the width of the kernel in different regions of the sample space. There are two methods ..."; actually there are more: neighbors within some radius, K nearest neighbors (K usually fixed), k-d trees, multigrid ...
Of course no single method can do everything, but adaptive methods look attractive.
See for example the nice picture of an adaptive 2d mesh in the Finite element method article.
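As a rough illustration of the nearest-neighbor variants listed above, here is a minimal sketch (not taken from the Wikipedia article or from any particular paper) of a sample-point adaptive Gaussian KDE. It assumes numpy and scipy are available. Each data point x_i gets its own bandwidth h_i, chosen here, as one simple assumption, to be the distance to its k-th nearest neighbor, found with scipy's `cKDTree`. The name `sample_point_kde` and the default `k=20` are hypothetical choices for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist


def sample_point_kde(data, query, k=20):
    """Sample-point adaptive Gaussian KDE (hypothetical helper, for illustration).

    Each data point x_i gets its own isotropic bandwidth h_i, taken here as
    the distance from x_i to its k-th nearest neighbour: narrow kernels where
    points are dense, wide kernels where they are sparse.
    Assumes no duplicate points (a zero h_i would divide by zero).
    """
    data = np.asarray(data, dtype=float)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    n, d = data.shape
    tree = cKDTree(data)
    # k + 1 neighbours because each point's nearest neighbour is itself
    dist, _ = tree.query(data, k=k + 1)
    h = dist[:, -1]                              # per-point bandwidths, shape (n,)
    sq = cdist(query, data, "sqeuclidean")       # squared distances, shape (m, n)
    # normalised d-dimensional Gaussian with standard deviation h_i per point
    kern = np.exp(-0.5 * sq / h**2) / (2 * np.pi * h**2) ** (d / 2)
    return kern.mean(axis=1)                     # average of the n kernels


rng = np.random.default_rng(0)
# a "clumpy" toy sample: one tight cluster plus a broad background
pts = np.vstack([rng.normal(0.0, 0.05, (500, 2)),
                 rng.normal(1.0, 0.5, (500, 2))])
print(sample_point_kde(pts, [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]))
```

Each kernel is a normalized Gaussian, so their average integrates to one. The sketch evaluates every kernel at every query point, which costs O(m·n) time and memory; for 100k points one would instead sum only the kernels whose centres lie within a few bandwidths of the query point, which a k-d tree can help find.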

I'd like to hear what worked and what didn't work on real data, especially with 100k or more scattered data points in 2d or 3d.

Added 2 Nov: here's a plot of a "clumpy" density (piecewise x^2 * y^2), a nearest-neighbor estimate, and a Gaussian KDE with Scott's factor. While one (1) example doesn't prove anything, it does show that NN can fit sharp hills reasonably well (and, using KD trees, is fast in 2d, 3d ...).
[Plot: clumpy density | nearest-neighbor estimate | Gaussian KDE with Scott's factor]
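The post does not say exactly which nearest-neighbor estimate produced the plot. As an assumption, here is a sketch of the classic k-nearest-neighbor density estimate, f(x) = k / (n · V_d · r_k(x)^d), where r_k(x) is the distance from x to its k-th nearest data point and V_d is the volume of the unit ball in d dimensions. A k-d tree (scipy's `cKDTree`) makes the neighbor search fast in 2d or 3d; `knn_density` is a hypothetical helper name.

```python
import math

import numpy as np
from scipy.spatial import cKDTree


def knn_density(data, query, k=10):
    """k-nearest-neighbour density estimate (hypothetical helper).

    f(x) = k / (n * V_d * r_k(x)**d), where r_k(x) is the distance from x to
    its k-th nearest data point and V_d is the volume of the unit d-ball.
    Query points are assumed not to be data points themselves.
    """
    data = np.asarray(data, dtype=float)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    n, d = data.shape
    tree = cKDTree(data)                     # build once, O(n log n)
    dist, _ = tree.query(query, k=k)         # k nearest distances per query
    r_k = dist[:, -1] if k > 1 else dist     # distance to the k-th neighbour
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return k / (n * unit_ball * r_k**d)


rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(20_000, 2))   # true density is 1 on the square
print(knn_density(pts, [[0.5, 0.5], [0.25, 0.75]], k=100))
```

One caveat to weigh against the speed: far from the data, r_k(x) grows roughly like the distance to the sample, so the tails decay only like 1/|x|^d and the estimate does not integrate to one.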

Can you give a little more context as to what you mean by "what works", or about the particular goals of your project? I've used them for visualizing spatial point processes, but I doubt that is what you had in mind when asking this question. — Andy W, Oct 14 '10 at 01:11

score 7 · Answer 1 · answered Oct 14 '10 at 06:52

The article by G. R. Terrell and D. W. Scott (1992), "Variable kernel density estimation", Annals of Statistics 20: 1236–1265, cited at the end of the Wikipedia article you yourself cite, clearly states that unless the observation space is very sparse, the variable kernel method is not recommended on the basis of root mean squared error (both local and global) for Gaussian-distributed random variables. Through theoretical arguments they give $n\leq 450$ ($n$ is the sample size), and through bootstrapping results $p\geq 4$ ($p$ is the number of dimensions), as the settings in which the variable kernel method becomes competitive with fixed-width ones. Judging from your question, you are not in these settings.

The intuition behind these results is that, unless you are in a very sparse setting, the local density simply does not vary enough for the gain in bias to outweigh the loss in efficiency (hence the AMISE of the variable-width kernel increases relative to the AMISE of the fixed-width one). Also, given your large sample size (and low dimension), the fixed-width kernel will already be very local, diminishing any potential gain in bias.
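To put a number on "very local already", here is a small sketch using Scott's rule, the bandwidth factor n^(-1/(d+4)) mentioned with the Gaussian KDE in the question's plot. scipy's `gaussian_kde` uses this factor by default and sets the kernel covariance to the data covariance times the factor squared. The comparison values n = 450 and n = 100 000 are simply the figures from this thread.

```python
def scott_factor(n, d):
    """Scott's rule bandwidth factor n ** (-1 / (d + 4)).

    With scipy's gaussian_kde, the kernel covariance is the data
    covariance multiplied by this factor squared.
    """
    return n ** (-1.0 / (d + 4))


for d in (2, 3):
    for n in (450, 100_000):
        print(f"d={d} n={n:>7}: factor {scott_factor(n, d):.3f}")
```

For d = 2 the factor falls from about 0.36 at n = 450 to about 0.15 at n = 100 000 (for d = 3, from about 0.42 to about 0.19), so with ~100k points each fixed-width kernel already covers only a small fraction of the data's spread.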

Thanks Kwak. "... for Gaussian distributed random variables": would you know of newer work for "clumpy" distributions? — denis, Oct 14 '10 at 14:38
@Denis:> By 'clumpy', do you mean concentrated, i.e. with narrower tails than the Gaussian? — user603, Oct 14 '10 at 18:28
I'm no expert, but I mean something like "data set clumpiness" in the paper by Lang et al., "Insights on fast Kernel Density Estimation algorithms" (2004, 8 pp.). — denis, Oct 15 '10 at 10:15
@Denis:> I would say it makes the problem worse (i.e. the NN kernel should work better on less clumpy data). I have an intuitive explanation, but it won't fit here; also, you may want to ask this on the main board as a separate question (linking to this one) to get additional opinions. — user603, Oct 20 '10 at 13:37

score 0 · Answer 2 · edited Nov 26 '11 at 20:26

The paper

Maxim V. Shapovalov, Roland L. Dunbrack Jr., A Smoothed Backbone-Dependent Rotamer Library for Proteins Derived from Adaptive Kernel Density Estimates and Regressions, Structure, Volume 19, Issue 6, 8 June 2011, Pages 844-858, ISSN 0969-2126, 10.1016/j.str.2011.03.019.

uses adaptive kernel density estimation to keep their density estimates smooth in regions where the data are sparse.

score -1 · Answer 3 · answered Oct 14 '10 at 01:16


Loess/lowess is basically a variable KDE method, with the width of the kernel being set by the nearest-neighbour approach. I've found that it works pretty well, certainly much better than any fixed-width model when the density of data points varies markedly.

One thing to be aware of with KDE and multi-dimensional data is the curse of dimensionality. Other things being equal, there are far fewer points within a set radius when p ~ 10 than when p ~ 2. This may not be a problem for you if you only have 3d data, but it's something to keep in mind.
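A rough way to see this effect, assuming purely for illustration that the points are uniform on the unit cube [0, 1]^p: the expected fraction of points within radius r of the cube's centre equals the volume of a p-ball of radius r, π^(p/2) r^p / Γ(p/2 + 1), valid while r ≤ 0.5 so the ball stays inside the cube. The radius 0.25 and n = 100 000 below are arbitrary choices.

```python
import math


def ball_fraction(r, p):
    """Volume of a p-dimensional ball of radius r.

    For points uniform on the unit cube this is the expected fraction lying
    within r of the cube's centre, provided r <= 0.5.
    """
    return math.pi ** (p / 2) * r**p / math.gamma(p / 2 + 1)


n = 100_000
for p in (2, 3, 10):
    frac = ball_fraction(0.25, p)
    print(f"p={p:2d}: fraction {frac:.2e}, about {n * frac:.1f} of {n} points")
```

With r = 0.25 this gives roughly 19 600 points in 2d, about 6 500 in 3d, and fewer than one point in 10d.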

Hong Ooi


Loess is a variable kernel REGRESSION method. The question asked about variable kernel DENSITY estimation. – Rob Hyndman Oct 14 '10 at 02:04
Oops, you're right. Misread the question. – Hong Ooi Oct 14 '10 at 06:56
@Rob, excuse my naive questions: if varying the kernel width is (sometimes) good for local regression / kernel smoothing, why is it bad for density estimation? Isn't density estimation a case of f() estimation for f() == density()? – denis Oct 14 '10 at 15:13
@Hong Ooi, how many points, in how many dimensions, have you used? Thanks – denis Oct 14 '10 at 15:15
@Denis. Great question. Can you please add it as a proper question on the site and we'll see what answers people can come up with. – Rob Hyndman Oct 14 '10 at 20:58
@Rob, which part? Can you help me formulate it? Something along the lines of "Is adaptive kernel width good for both kernel smoothing and KDE?" – denis Oct 15 '10 at 10:15
@Denis. I meant the question you already asked, viz., "If variable kernel widths are often good for kernel regression, why are they generally not good for kernel density estimation?" I would post it myself, but then you would miss out on the rep points. – Rob Hyndman Oct 16 '10 at 03:56
@Rob:> Given that three days have elapsed, I think for the benefit of the wider community you should consider posting this as a separate question. – user603 Oct 19 '10 at 10:55
@kwak. OK. I've posted something at http://stats.stackexchange.com/questions/3752/ – Rob Hyndman Oct 19 '10 at 11:35
Thanks @Rob @Kwak; I got hung up noodling over KDE variants. (Is 3 days a statisticians' limit? :) – denis Oct 19 '10 at 16:47
