
There are several distinct usages:

  • kernel density estimation
  • kernel trick
  • kernel smoothing

Please explain what the "kernel" in them means, in plain English, in your own words.

– Neil McGuigan (edited by Danica)

  • Not to be rude, but isn't this a question that is already answered ad nauseam on Wikipedia and the like? Google gave me the answer within 15 seconds... – Joris Meys Sep 09 '10 at 08:32
  • I absolutely hate Wikipedia answers for stats. They are rambling, symbolic messes. I am looking for a gem of an answer that can explain this in plain English, as I believe that shows a deeper level of understanding than a math equation. There are many popular "plain English" questions on here, and for good reason. – Neil McGuigan Sep 09 '10 at 18:04

2 Answers

46 votes

There appear to be at least two different meanings of "kernel": one more commonly used in statistics; the other in machine learning.

In statistics "kernel" is most commonly used to refer to kernel density estimation and kernel smoothing.

A straightforward explanation of kernels in density estimation can be found (here).

In machine learning "kernel" is usually used to refer to the kernel trick, a method of using a linear classifier to solve a non-linear problem "by mapping the original non-linear observations into a higher-dimensional space".

A simple visualisation might be to imagine that all of class $0$ are within radius $r$ of the origin in an x, y plane (class $0$: $x^2 + y^2 < r^2$), and all of class $1$ are beyond radius $r$ in that plane (class $1$: $x^2 + y^2 > r^2$). No linear separator is possible, but clearly a circle of radius $r$ will perfectly separate the data. We can transform the data into three-dimensional space by calculating three new variables $z_1 = x^2$, $z_2 = y^2$ and $z_3 = \sqrt{2}xy$. The two classes are now separable by a plane in this 3-dimensional space: the optimal separating hyperplane is $z_1 + z_2 = r^2$, which in this case omits $z_3$. (If the circle is offset from the origin, the optimal separating hyperplane will vary in $z_3$ as well.) The kernel function corresponds to this mapping: it returns the inner product of two points' images in the 3-dimensional space (here $k(\mathbf{u}, \mathbf{v}) = (\mathbf{u} \cdot \mathbf{v})^2$) without ever computing the mapping explicitly.
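The circle example above can be checked numerically. This is an illustrative sketch (the radii, sample sizes, and variable names are my own choices, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
r = 1.0

# Class 0: 100 points strictly inside the circle of radius r;
# class 1: 100 points strictly outside it.
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0.0, 0.8 * r, 100),      # class 0
                        rng.uniform(1.2 * r, 2.0 * r, 100)])  # class 1
x, y = radii * np.cos(angles), radii * np.sin(angles)
labels = np.array([0] * 100 + [1] * 100)

# The feature map: (x, y) -> (z1, z2, z3) = (x^2, y^2, sqrt(2)*x*y).
z1, z2, z3 = x**2, y**2, np.sqrt(2) * x * y

# In the mapped space the plane z1 + z2 = r^2 separates the classes,
# since z1 + z2 is exactly the squared distance from the origin.
predicted = (z1 + z2 > r**2).astype(int)
print((predicted == labels).all())  # prints True

# The kernel trick: inner products in the mapped space can be computed
# directly from the original coordinates, k(u, v) = (u . v)^2.
phi = lambda p: np.array([p[0]**2, p[1]**2, np.sqrt(2) * p[0] * p[1]])
u, v = np.array([x[0], y[0]]), np.array([x[1], y[1]])
print(np.isclose(phi(u) @ phi(v), (u @ v) ** 2))  # prints True
```

The last two lines show why the trick is useful: a classifier that only needs inner products never has to construct the 3-dimensional coordinates at all.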

In mathematics, there are other uses of "kernels", but these seem to be the main ones in statistics.

– Thylacoleo (edited by mkt)

  • Very nice! I'm going to use your example with the circle to explain kernel methods, as it is the best visualisation I've met so far. Thanks! – Joris Meys Sep 10 '10 at 13:47
  • The following video was proposed by an anonymous potential editor as "a great visualisation of what Thylacoleo explained:" http://www.youtube.com/watch?v=3liCbRZPrZA – gung - Reinstate Monica Jun 06 '13 at 22:56
  • Following up Thylacoleo's example using the circle to explain the kernel trick (I don't have enough reputation to add a comment directly to his answer): was there a simple typo in the equation for the separating hyperplane? Should it be z1 + z2 = r^2 instead of z1 + z2 = 1, or do I misunderstand? I agree it's a nice simple example to illustrate the concept. Thanks. The definition of z3 still seems a bit of a mystery, but apparently it doesn't matter for the example centred at the origin. – Alex Blakemore Sep 11 '10 at 01:37
  • Yes, there was a typo. Thanks for that, Alex. I don't always proofread :-) – Thylacoleo Sep 13 '10 at 07:26
  • Do we use inner products to map 2-dimensional data to 3-dimensional? – SmallChess Nov 09 '15 at 03:45
41 votes

In both the statistics (kernel density estimation or kernel smoothing) and machine learning (kernel methods) literatures, a kernel is used as a measure of similarity. In particular, the kernel function $k(x, \cdot)$ defines the distribution of similarities of points around a given point $x$; $k(x, y)$ denotes the similarity of point $x$ with another given point $y$.
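The similarity reading can be made concrete with a Gaussian kernel, the most common choice in kernel density estimation (a minimal sketch; the function name and bandwidth value are illustrative):

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y): similarity of x and y, largest when they coincide."""
    return np.exp(-0.5 * ((x - y) / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))

# Similarity falls off smoothly as y moves away from x:
x = 0.0
print(gaussian_kernel(x, 0.0))   # highest similarity
print(gaussian_kernel(x, 0.5))
print(gaussian_kernel(x, 2.0))   # lowest of the three
```

The bandwidth controls how quickly similarity decays with distance, which is why it is the key tuning parameter in kernel density estimation and kernel smoothing.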

– ebony1 (edited by Tim)
  • This is a nice way of putting it. I am wondering if you can generalize this description to also apply to the kernel of 'kernel density estimation'. – shabbychef Sep 09 '10 at 16:18
  • In a way, yes. One way to understand kernel density estimation is that you approximate the density at a point as a weighted average of its similarities with a set of points drawn from the distribution. So the notion of similarity does play a role here as well. – ebony1 Sep 09 '10 at 17:14
  • I understand "kernel" in statistics to be borrowed originally from jargon used in the discussion of integral equations. – Nick Cox Sep 16 '14 at 14:47
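The weighted-average reading of kernel density estimation in ebony1's comment can be sketched as follows (an illustration with made-up sample values and an assumed Gaussian kernel and bandwidth):

```python
import numpy as np

def gaussian_kernel(u, bandwidth):
    return np.exp(-0.5 * (u / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))

def kde(x, sample, bandwidth=0.5):
    """Estimated density at x: the average similarity of x to the sample points."""
    return gaussian_kernel(x - sample, bandwidth).mean()

sample = np.array([-1.0, 0.0, 0.2, 1.5])  # made-up observations
print(kde(0.1, sample))  # larger: 0.1 sits among the observations
print(kde(5.0, sample))  # near zero: 5.0 is far from every observation
```

Each observation "votes" for density near itself, and the kernel decides how far that vote reaches.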