The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all $\mathbf{x}$ and $\mathbf{x}'$ in the input space $\mathcal{X}$, certain functions $k(\mathbf{x}, \mathbf{x}')$ can be expressed as an inner product in another space $\mathcal{V}$. The function $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is often referred to as a kernel or a kernel function.
To be quite honest, I have no idea what this means. But this is what I think it means:
A kernel function is a pre-processing/reducing/refining step that takes in vector representations of objects and spits out an easier-to-deal-with representation of them, which we can then feed into general learning algorithms.
For example (following this analogy):
Suppose my task were to develop a "learning algorithm" that ranks people by height based on photos. A good "kernel function" would be one that takes in all the photos and draws a red line from each person's head to their toes (indicating their height). Now the learning algorithm no longer needs to do any fancy geometric acrobatics to extract the height; it merely measures the length of each line (making life simpler).
Is this the gist of what kernel functions do (act as mathematical vector pre-processors)? I don't really understand the whole deal with:
certain functions $k(\mathbf{x}, \mathbf{x}')$ can be expressed as an inner product in another space $\mathcal{V}$
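Here is my attempt at turning that sentence into something concrete (the degree-2 polynomial kernel and NumPy are just my own choices for illustration, so this may not capture the general idea):

```python
import numpy as np

# Explicit feature map into "another space": phi maps a 2-D input vector
# to a 3-D vector whose ordinary dot products reproduce the kernel below.
def phi(x):
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

# Degree-2 polynomial kernel, computed entirely in the original input space.
def k(x, y):
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(y)))  # inner product in the mapped space -> 16.0
print(k(x, y))                 # same value, without ever computing phi -> 16.0
```

If I'm reading it right, the "trick" is that `k` gives you the inner products you would have gotten after the mapping, without ever producing the mapped vectors - which sounds like my "pre-processor" picture, except no pre-processed representation is ever actually built. Is that the point?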