Is there an optimal bandwidth for a kernel density estimator of derivatives?

Question

I need to estimate a density function from a set of observations using a kernel density estimator (KDE). From the same observations, I also need to estimate the first and second derivatives of the density using the derivatives of the kernel density estimator. The bandwidth will certainly have a great effect on the final result.

First, I know there are a couple of R functions that compute a KDE bandwidth, but I am not sure which one is preferred. Can anyone recommend one of them?

Secondly, for the derivative of KDE, should I choose the same bandwidth?

For a density, the choice of bandwidth is always somewhat subjective. Too narrow a bandwidth produces variation in the curve that essentially follows the noise; too wide a bandwidth oversmooths and misses real features. But you estimate the density precisely to find out its shape, so it is not easy to know how smooth the estimate should be. For derivatives, I would think it depends on which feature of the derivative you want to know about. — Michael R. Chernick, Aug 08 '12 at 15:15

Rob Hyndman · Answer 1 · 2012-08-09T02:22:53.320

The optimal bandwidth for derivative estimation will be different from the bandwidth for density estimation. In general, every feature of a density has its own optimal bandwidth selector.

If your objective is to minimize mean integrated squared error (which is the usual criterion) there is nothing subjective about it. It is a matter of deriving the value that minimizes the criterion. The equations are given in Section 2.10 of Hansen (2009).
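To make the criterion concrete, here is the standard asymptotic form for the r-th derivative in the univariate case with a second-order kernel K. The notation is the common textbook one and may differ from Hansen's: R(g) denotes the integral of g squared and μ₂(K) the second moment of the kernel.

```latex
\[
\mathrm{AMISE}(h) \;=\; \frac{R\!\left(K^{(r)}\right)}{n\,h^{2r+1}}
\;+\; \frac{h^{4}}{4}\,\mu_2(K)^{2}\,R\!\left(f^{(r+2)}\right),
\qquad
h_{\mathrm{opt}} \;=\;
\left[\frac{(2r+1)\,R\!\left(K^{(r)}\right)}
{\mu_2(K)^{2}\,R\!\left(f^{(r+2)}\right)\,n}\right]^{1/(2r+5)},
\]
\[
R(g) = \int g(u)^2\,du, \qquad \mu_2(K) = \int u^2 K(u)\,du .
\]
```

Setting the derivative of AMISE(h) with respect to h to zero gives the expression for h_opt. The optimal bandwidth shrinks at rate n^{-1/(2r+5)}, which is slower for higher r, so derivative estimates call for more smoothing than the density itself.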

The tricky part is that the optimal bandwidth is a function of the density itself, so this solution is not directly useful. There are a number of methods around to try to deal with that problem. These usually approximate some functionals of the density using normal approximations. (Note, there is no assumption that the density itself is normal. The assumption is that some functionals of the density can be obtained assuming normality.)

Where the approximations are imposed determines how good the bandwidth selector is. The crudest approach is called the "normal reference rule" which imposes the approximation at a high level. The end of Section 2.10 in Hansen (2009) gives the formula using this approach. This approach is implemented in the hns() function from the ks package on CRAN. That's probably the best you will get if you don't want to write your own code. So you can estimate the derivative of a density as follows (using ks):

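library(ks)
# x is a numeric vector of observations
h <- hns(x, deriv.order = 1)            # normal reference bandwidth for the first derivative
den <- kdde(x, h = h, deriv.order = 1)  # kernel estimate of the first derivative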
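To see how the normal reference bandwidth changes with the derivative order, here is a minimal base-R sketch of the univariate Gaussian-kernel formula, h = sd(x) * (4 / ((2r + 3) n))^(1/(2r + 5)). The helper name nr_bandwidth is hypothetical (it is not part of ks), and I have not checked that it reproduces hns() to the last digit, so treat it as an illustration only.

```r
# Normal reference bandwidth for the r-th density derivative
# (univariate, Gaussian kernel). nr_bandwidth is a hypothetical helper.
nr_bandwidth <- function(x, r = 0) {
  n <- length(x)
  sd(x) * (4 / ((2 * r + 3) * n))^(1 / (2 * r + 5))
}

set.seed(1)
x <- rnorm(500)                 # simulated data, for illustration only

h0 <- nr_bandwidth(x, r = 0)    # density
h1 <- nr_bandwidth(x, r = 1)    # first derivative
h2 <- nr_bandwidth(x, r = 2)    # second derivative
print(c(h0 = h0, h1 = h1, h2 = h2))
```

For r = 0 this reduces to the familiar 1.06 * sd(x) * n^(-1/5) rule, and the bandwidth grows with r, which is why reusing the density bandwidth for derivatives tends to undersmooth.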

A better approach, usually known as a "direct plug in" selector, imposes the approximation at a lower level. For straight density estimation, this is the Sheather-Jones method, implemented in R using density(x,bw="SJ"). However, I don't think there is a similar facility available in any R package for derivative estimation.
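For the plain density case, a small sketch comparing the Sheather-Jones bandwidth with the normal reference rule in base R. The bimodal sample is simulated purely for illustration.

```r
set.seed(1)
# Simulated bimodal sample: a case where a normal reference rule tends to oversmooth
x <- c(rnorm(300, mean = -2, sd = 0.5), rnorm(300, mean = 2, sd = 1))

bw_sj <- bw.SJ(x)    # Sheather-Jones bandwidth
bw_nr <- bw.nrd(x)   # normal reference rule
print(c(SJ = bw_sj, normal_reference = bw_nr))

fit <- density(x, bw = "SJ")   # density estimate using the SJ bandwidth
plot(fit, main = "KDE with Sheather-Jones bandwidth")
```

On data like these the SJ bandwidth should come out noticeably smaller than the normal reference one, so both modes stay visible.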

Rather than use straight kernel estimation, you may be better off with a local polynomial estimator. This can be done using the locpoly() function from the KernSmooth package in R. Again, there is no optimal bandwidth selection implemented, but the bias will be smaller than for kernel estimators. For example:

library(KernSmooth)
den2 <- locpoly(x, bandwidth=?, drv=1) # Replace ? with a sensible guess at the bandwidth
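A runnable version of the local polynomial idea, as a sketch only. locpoly() is provided by the KernSmooth package; the bandwidth of 0.5 is an arbitrary guess for simulated standard normal data, not a recommended value, and the true derivative is known here only because the data are simulated.

```r
library(KernSmooth)

set.seed(1)
x <- rnorm(1000)     # simulated data, for illustration only
bw <- 0.5            # assumed bandwidth: a guess, not an optimal choice

# With y omitted, locpoly() does density estimation; drv = 1 asks for the first derivative
fit <- locpoly(x, bandwidth = bw, drv = 1)

# True derivative of the standard normal density, for comparison
truth <- -fit$x * dnorm(fit$x)

plot(fit$x, fit$y, type = "l", xlab = "x", ylab = "estimated f'(x)")
lines(fit$x, truth, lty = 2)
```

Trying a few values of bw and comparing against a known shape like this gives a feel for how sensitive the derivative estimate is to the bandwidth.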

Thanks a million, Rob. I will likely use SJ bandwidth for density estimation. — user13154, Aug 09 '12 at 00:31
I checked the formula given at the end of Section 2.10 in Hansen (2009). It looks like the bandwidth depends on the order of the derivative, say the rth derivative. h — user13154, Aug 09 '12 at 01:02
(1) No. They are not using cross-validation. See the Hansen article. (2) Use approx(). — Rob Hyndman, Aug 09 '12 at 22:20
