The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all $\mathbf{x}$ and $\mathbf{x}'$ in the input space $\mathcal{X}$, certain functions $k(\mathbf{x}, \mathbf{x}')$ can be expressed as an inner product in another space $\mathcal{V}$. The function $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is often referred to as a kernel or a kernel function.
To be quite honest, I have no idea what this means. But this is what I think it means:
A kernel function is a pre-processing/reducing/refining step that takes in vector representations of objects and spits out an easier-to-deal-with representation of them, which we can then feed into general learning algorithms.
For example (following this analogy):
Suppose my task were to develop a "learning algorithm" that ranks people by height based on photos. A good "kernel function" would be one that takes in all the photos and draws a red line from each person's head to their toes (indicating their height). Now the learning algorithm no longer needs to do any fancy geometric acrobatics to extract the height; it merely measures the length of each line (making life simpler).
Is this the gist of what kernel functions do (act as mathematical vector pre-processors)? I don't really understand the whole deal with:
certain functions $k(\mathbf{x}, \mathbf{x}')$ can be expressed as an inner product in another space $\mathcal{V}$
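Here is my attempt at turning that sentence into something concrete (the degree-2 polynomial kernel and NumPy are just my own choices for illustration, so this may not capture the general idea):

```python
import numpy as np

# Explicit feature map into "another space": phi maps a 2-D input vector
# to a 3-D vector whose ordinary dot products reproduce the kernel below.
def phi(x):
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

# Degree-2 polynomial kernel, computed entirely in the original input space.
def k(x, y):
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(y)))  # inner product in the mapped space -> 16.0
print(k(x, y))                 # same value, without ever computing phi -> 16.0
```

If I'm reading it right, the "trick" is that `k` gives you the inner products you would have gotten after the mapping, without ever producing the mapped vectors - which sounds like my "pre-processor" picture, except no pre-processed representation is ever actually built. Is that the point?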