Questions tagged [kernel-smoothing]

Kernel smoothing techniques, such as kernel density estimation (KDE) and Nadaraya-Watson kernel regression, estimate functions by local interpolation from data points. Not to be confused with [kernel-trick], for the kernels used e.g. in SVMs.

A kernel in the context of kernel smoothing is a local similarity function $K$, typically symmetric and nonnegative; for density estimation it must also integrate to 1 so that the estimate is itself a density. Kernel smoothing uses these functions to interpolate observed data points into a smooth function.

For example, Nadaraya-Watson kernel regression estimates a function $f : \mathcal X \to \mathbb R$ based on observations $\{ (x_i, y_i) \}_{i=1}^n$ by $$ \hat{f}(x) = \frac{\sum_{i=1}^n K(x, x_i) \, y_i}{\sum_{i=1}^n K(x, x_i)} ,$$ i.e. a mean of the observed data points weighted by their similarity to the test point.
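
A minimal sketch of the weighted mean above for one-dimensional inputs. The Gaussian kernel, the explicit bandwidth argument `h`, and the toy data are assumptions made for illustration; the formula itself leaves $K$ unspecified.

```python
import math

def gaussian_kernel(x, xi, h):
    """Gaussian similarity between x and xi at bandwidth h (an assumed kernel choice)."""
    u = (x - xi) / h
    return math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))

def nadaraya_watson(x, xs, ys, h):
    """Mean of ys weighted by the kernel similarity of each xs[i] to the test point x."""
    weights = [gaussian_kernel(x, xi, h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # Far from all data with a tiny h, every weight can underflow to zero.
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Toy data (hypothetical): noiseless observations of y = x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
estimate = nadaraya_watson(1.5, xs, ys, h=0.5)
print(estimate)
```

Because the weights are normalized, the estimate always lies between the smallest and largest observed $y_i$, and any constant factor in the kernel cancels.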

Kernel density estimation estimates a density function $\hat{p}$ from samples $\{ x_i \}_{i=1}^n$ by $$ \hat{p}(x) = \frac{1}{n} \sum_{i=1}^n K(x, x_i) ,$$ essentially placing density "bumps" at each observed data point.
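
As a rough illustration of the "bumps" picture, the sketch below evaluates the estimator with a Gaussian kernel and checks numerically that the result integrates to approximately 1. The kernel choice, the bandwidth `h`, the sample values, and the integration grid are all assumptions, not part of the definition above.

```python
import math

def gaussian_kernel(x, xi, h):
    """Gaussian bump centred at xi with scale h; integrates to 1 in x."""
    u = (x - xi) / h
    return math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))

def kde(x, samples, h):
    """Average of one kernel bump per observed sample, evaluated at x."""
    return sum(gaussian_kernel(x, xi, h) for xi in samples) / len(samples)

# Hypothetical one-dimensional samples and bandwidth.
samples = [-1.2, -0.4, 0.1, 0.3, 1.5]
h = 0.5

# Riemann sum over [-6, 6], which extends well beyond the data.
step = 0.01
grid = [-6.0 + step * k for k in range(1201)]
area = sum(kde(x, samples, h) for x in grid) * step
print(area)
```

Since each bump integrates to 1 and the estimator averages $n$ of them, the area should come out very close to 1.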

The choice of kernel function is of theoretical importance but typically does not matter much in practice for estimation quality. (Wikipedia has a table of the most common choices.) Rather, the important practical problem for kernel smoothing methods is that of bandwidth selection: choosing the scale of the kernel function. Undersmoothing or oversmoothing can result in extremely poor estimates, and so care must be taken to choose an appropriate bandwidth, often via cross-validation.
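
One common way to do the cross-validation mentioned above for density estimation is leave-one-out likelihood: score each candidate bandwidth by how well the estimate built from the other points predicts each held-out point. The sketch below assumes a Gaussian kernel, a small hand-picked candidate grid, and made-up data; other criteria (for example least-squares cross-validation) are equally possible.

```python
import math

def gaussian_kernel(x, xi, h):
    """Gaussian kernel with bandwidth h (an assumed kernel choice)."""
    u = (x - xi) / h
    return math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))

def loo_log_likelihood(samples, h):
    """Sum of log densities at each point, estimated from all the other points."""
    n = len(samples)
    total = 0.0
    for i, x in enumerate(samples):
        dens = sum(
            gaussian_kernel(x, xj, h) for j, xj in enumerate(samples) if j != i
        ) / (n - 1)
        if dens <= 0.0:
            # The held-out point is unexplained at this bandwidth (underflow).
            return float("-inf")
        total += math.log(dens)
    return total

def select_bandwidth(samples, candidates):
    """Return the candidate bandwidth with the highest leave-one-out score."""
    return max(candidates, key=lambda h: loo_log_likelihood(samples, h))

# Hypothetical data and candidate grid.
samples = [-1.9, -1.1, -0.6, -0.2, 0.0, 0.3, 0.7, 1.2, 1.8, 2.5]
candidates = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
best_h = select_bandwidth(samples, candidates)
print(best_h)
```

Very small bandwidths are penalized because held-out points fall between the narrow bumps, and very large ones because the estimate is too flat, so the selected value tends to land in between.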


Note that the word "kernel" is also used to refer to the kernel of a reproducing kernel Hilbert space, as in the "kernel trick" common in support vector machines and other kernel methods. See [kernel-trick] for this usage.

575 questions
80
votes
2 answers

What is a "kernel" in plain English?

There are several distinct usages: kernel density estimation, kernel trick, kernel smoothing. Please explain what the "kernel" in them means, in plain English, in your own words.
Neil McGuigan
  • 9,292
  • 13
  • 54
  • 62
41
votes
4 answers

Good methods for density plots of non-negative variables in R?

plot(density(rexp(100))) Obviously all density to the left of zero represents bias. I'm looking to summarize some data for non-statisticians, and I want to avoid questions about why non-negative data has density to the left of zero. The plots are…
generic_user
  • 11,981
  • 8
  • 40
  • 63
36
votes
2 answers

Can you explain Parzen window (kernel) density estimation in layman's terms?

Parzen window density estimation is described as $$ p(x)=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{h^2} \phi \left(\frac{x_i - x}{h} \right) $$ where $n$ is number of elements in the vector, $x$ is a vector, $p(x)$ is a probability density of $x$, $h$ is…
31
votes
1 answer

"Kernel density estimation" is a convolution of what?

I am trying to get a better understanding of kernel density estimation. Using the definition from Wikipedia: https://en.wikipedia.org/wiki/Kernel_density_estimation#Definition $ \hat{f_h}(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) \quad =…
Tal Galili
  • 19,935
  • 32
  • 133
  • 195
29
votes
2 answers

Choosing a bandwidth for kernel density estimators

For univariate kernel density estimators (KDE), I use Silverman's rule for calculating $h$: \begin{equation} 0.9 \min(sd, IQR/1.34)\times n^{-0.2} \end{equation} What are the standard rules for multivariate KDE (assuming a Normal kernel)?
csgillespie
  • 11,849
  • 9
  • 56
  • 85
27
votes
2 answers

If the Epanechnikov kernel is theoretically optimal when doing Kernel Density Estimation, why isn't it more commonly used?

I have read (for example, here) that the Epanechnikov kernel is optimal, at least in a theoretical sense, when doing kernel density estimation. If this is true, then why does the Gaussian show up so frequently as the default kernel, or in many…
John Rauser
  • 371
  • 1
  • 3
  • 5
26
votes
1 answer

What does the y axis in a kernel density plot mean?

Possible Duplicate: Probability distribution value exceeding 1 is OK? I thought the area under the curve of a density function represents the probability of getting an x value between a range of x values, but then how can the y-axis be greater…
nachocab
  • 505
  • 1
  • 4
  • 10
19
votes
2 answers

If variable kernel widths are often good for kernel regression, why are they generally not good for kernel density estimation?

This question is prompted by discussion elsewhere. Variable kernels are often used in local regression. For example, loess is widely used and works well as a regression smoother, and is based on a kernel of variable width that adapts to data…
Rob Hyndman
  • 51,928
  • 23
  • 126
  • 178
18
votes
4 answers

How to calculate overlap between empirical probability densities?

I'm looking for a method to calculate the area of overlap between two kernel density estimates in R, as a measure of similarity between two samples. To clarify, in the following example, I would need to quantify the area of the purplish overlapping…
mmk
  • 455
  • 1
  • 3
  • 11
18
votes
1 answer

Kernel Bandwidth: Scott's vs. Silverman's rules

Could anyone explain in plain English what the difference is between Scott's and Silverman's rules of thumb for bandwidth selection? Specifically, when is one better than the other? Is it related to the underlying distribution? Number of…
xrfang
  • 293
  • 1
  • 2
  • 9
17
votes
1 answer

What is the long run variance?

How is long run variance in the realm of time series analysis defined? I understand it is utilized in the case there is a correlation structure in the data. So our stochastic process would not be a family of $X_1, X_2 \dots$ i.i.d. random variables…
Monolite
  • 1,141
  • 3
  • 13
  • 24
16
votes
1 answer

How to draw random samples from a non-parametric estimated distribution?

I have a sample of 100 points which are continuous and one-dimensional. I estimated its non-parametric density using kernel methods. How can I draw random samples from this estimated distribution?
lovekesh
  • 459
  • 5
  • 16
15
votes
3 answers

Where is density estimation useful?

After going through some slightly terse mathematics, I think I have a slight intuition of kernel density estimation. But I am also aware that estimating multivariate density for more than three variables might not be a good idea, in terms of the…
15
votes
2 answers

Area under the "pdf" in kernel density estimation in R

I am trying to use the 'density' function in R to do kernel density estimates. I am having some difficulty interpreting the results and comparing various datasets as it seems the area under the curve is not necessarily 1. For any probability density…
highBandWidth
  • 2,092
  • 2
  • 21
  • 34
14
votes
1 answer

Is there an optimal bandwidth for a kernel density estimator of derivatives?

I need to estimate the density function based on a set of observations using the kernel density estimator. Based on the same set of observations, I also need to estimate the first and second derivatives of the density using the derivatives of the…
user13154
  • 793
  • 1
  • 5
  • 15