Why must kernel functions be scalar products

Question

I'm currently reading Bishop's Pattern Recognition and Machine Learning. In the chapter on kernel methods, he's very clear that kernels must be "valid", that is: be representable as scalar products in some feature space (no matter what that might actually be).

Why is this scalar product criterion so important? Why is it invalid to just define the kernel as some arbitrary, non-product distance function of its arguments?

If the suggested duplicate indeed answers my question, I don't understand it. Why **must** kernel functions be inner products? Carlosdc's answer takes that as a precondition and does not motivate it further, AIUI. "You can compute scalar products more easily than doing explicit feature mapping" certainly has appeal, but I can compute many other functions efficiently as well, scalar product or not. – Christian Aichinger, Jun 20 '16 at 19:48
I don't think that's exactly the same question. The possible duplicate asks for the definition of a (valid) kernel, while this question asks why kernels are defined this way rather than some other way. – dontloo, Jun 21 '16 at 02:58

dontloo · Answer 1 · 2016-06-21T02:44:57.997

4

AFAIK the kernel trick can only be applied when the data appear solely in the form of scalar products such as $x_1'x_2$. For many problems we derive the dual representation precisely to bring the problem into this form, so that the kernel trick can be applied.

If the kernel can be written as a scalar product in some feature space, $k(x_1, x_2)=\phi(x_1)'\phi(x_2)$, then by applying the kernel trick we are actually solving the same problem, just in another feature space.
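To make "written as a scalar product in some feature space" concrete, here is a minimal sketch, assuming the homogeneous degree-2 polynomial kernel $k(x, z) = (x'z)^2$ on 2-D inputs. The function names `poly2_kernel` and `phi` are made up for this illustration; the point is only that the kernel value and the explicit feature-space scalar product agree.

```python
import math

def poly2_kernel(x, z):
    """Kernel evaluated directly in input space: k(x, z) = (x'z)^2."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot ** 2

def phi(x):
    """Explicit feature map for 2-D inputs (illustrative choice):
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so phi(x)'phi(z) = (x'z)^2."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2.0) * x1 * x2, x2 ** 2)

x = (1.0, 2.0)
z = (3.0, -1.0)

# Kernel trick: never build phi, just evaluate k.
k_value = poly2_kernel(x, z)

# Same quantity computed as a scalar product in feature space.
phi_value = sum(a * b for a, b in zip(phi(x), phi(z)))

print(k_value, phi_value)  # the two agree up to floating-point rounding
```

Any algorithm that only ever touches the data through $x_1'x_2$ therefore behaves as if it had been run on $\phi(x_1), \phi(x_2)$ when the scalar product is replaced by this $k$.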

But if the kernel cannot be written as a scalar product, we can't be sure that the problem we get after substituting the kernel is still the problem we wanted to solve in the first place.
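One concrete symptom of this, as far as I can tell: if $k(x_i, x_j) = \phi(x_i)'\phi(x_j)$, then for any coefficients $c$ the quadratic form $\sum_{i,j} c_i c_j k(x_i, x_j)$ equals $\|\sum_i c_i \phi(x_i)\|^2$ and so can never be negative. The sketch below checks that on two 1-D points, comparing a Gaussian kernel with an arbitrary "negative distance" similarity $-|x - z|$; the helper names and the choice of points and coefficients are assumptions made purely for illustration.

```python
import math

def gram(kernel, points):
    """Matrix of pairwise kernel values K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(a, b) for b in points] for a in points]

def quadratic_form(K, c):
    """Compute c'Kc, which equals ||sum_i c_i phi(x_i)||^2 for a valid kernel."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

def gaussian(x, z):
    """Gaussian kernel with unit width, a known valid kernel."""
    return math.exp(-0.5 * (x - z) ** 2)

def neg_distance(x, z):
    """An arbitrary similarity that is not a scalar product (illustrative)."""
    return -abs(x - z)

points = [0.0, 1.0]
c = [1.0, 1.0]

q_gauss = quadratic_form(gram(gaussian, points), c)
q_neg = quadratic_form(gram(neg_distance, points), c)

# A single c cannot prove validity, but one negative value disproves it:
# a squared norm in feature space cannot be below zero.
print(q_gauss, q_neg)
```

With the negative-distance function the "squared norm" comes out negative, so no feature space can reproduce it, and quantities in the dual problem that are supposed to be squared lengths stop meaning what they meant in the original problem.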

edited Jun 21 '16 at 02:44

answered Jun 20 '16 at 14:45

dontloo


Can we be sure that changing the scalar product to be over some arbitrary feature space still solves the problem we wanted to solve in the first place? I understand the kernel trick (inner product -> kernel), but I'm grappling with the inverse: why are valid kernels constrained to be inner products? – Christian Aichinger Jun 20 '16 at 19:42
@ChristianAichinger I think so, provided the data only appear in scalar products, which is what is often called the dual problem. Since the dual problem only works with inner products, I guess it's important that kernels can be decomposed into inner products, so that using them is equivalent to solving the same problem in another feature space. – dontloo Jun 21 '16 at 02:51
