
How can I prove that a linear combination of two kernel functions is also a kernel function?

\begin{align} k_p(x, y) = a_1 k_1(x, y) + a_2 k_2(x, y) \end{align}

given that $k_1(\cdot,\cdot)$ and $k_2(\cdot,\cdot)$ are valid kernel functions.

In general, to prove any such result involving dot products, composition, etc., what methodology can be followed to show that the RHS is a kernel function, given that all the $k$'s on the LHS are kernels?

tusharfloyd
  • Only linear combinations with positive coefficients are always valid. – Marc Claesen Oct 15 '15 at 17:34
  • @MarcClaesen Just to clarify a minor point: only positive linear combinations are *necessarily* valid. It's possible for one with some negative coefficients to still be psd; this arises in multiple kernel learning problems, where the optimization becomes a semidefinite program. – Danica Oct 15 '15 at 17:36
  • @Dougal point taken! – Marc Claesen Oct 15 '15 at 17:37
  • I recommend these lecture notes http://alex.smola.org/teaching/cmu2013-10-701x/slides/recitation_3_kernel_properties_convexity.pdf and the book http://www.kernel-methods.net/ for more detailed explanations. – JStrahl Sep 20 '16 at 12:59

2 Answers


A necessary and sufficient condition for a function $\kappa(\cdot,\cdot)$ to be expressible as an inner product in some feature space $\mathcal{F}$ is a weak form of Mercer's condition, namely that:

$$ \int_\mathbf{x} \int_\mathbf{y} \kappa(\mathbf{x},\mathbf{y})g(\mathbf{x})g(\mathbf{y})\,d\mathbf{x}\,d\mathbf{y} \geq 0, $$ for all square-integrable functions $g(\cdot)$ [1,2].

In your case, this reduces to the following: $$ \begin{align} &\int_\mathbf{x} \int_\mathbf{y} \big(a_1\kappa_1(\mathbf{x},\mathbf{y}) + a_2 \kappa_2(\mathbf{x},\mathbf{y})\big)g(\mathbf{x})g(\mathbf{y})\,d\mathbf{x}\,d\mathbf{y} \\ &= a_1 \underbrace{\int_\mathbf{x} \int_\mathbf{y} \kappa_1(\mathbf{x},\mathbf{y})g(\mathbf{x})g(\mathbf{y})\,d\mathbf{x}\,d\mathbf{y}}_{\geq 0} + a_2 \underbrace{\int_\mathbf{x} \int_\mathbf{y}\kappa_2(\mathbf{x},\mathbf{y})g(\mathbf{x})g(\mathbf{y})\,d\mathbf{x}\,d\mathbf{y}}_{\geq 0} \geq 0. \end{align} $$ Since $\kappa_1(\cdot,\cdot)$ and $\kappa_2(\cdot,\cdot)$ are given to be kernel functions, both underbraced integrals are nonnegative by Mercer's condition. Hence, if $a_1 \geq 0$ and $a_2 \geq 0$, the overall integral is nonnegative as well, so $k_p$ satisfies the condition too. $\blacksquare$

Note that, as @Dougal correctly pointed out, it is still possible to obtain a valid kernel function with a negative $a_1$ or $a_2$ (though not both), but whether the combination remains positive semidefinite then depends on the particular kernels and coefficients.
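For a quick numerical illustration of both points (a sanity check, not a proof), here is a minimal sketch assuming NumPy; the kernels, sample points, and coefficients are arbitrary choices. With nonnegative coefficients the combined Gram matrix stays positive semidefinite, while a negative coefficient can break that:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 arbitrary sample points in R^3

K1 = X @ X.T                          # linear kernel Gram matrix
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K2 = np.exp(-0.5 * sq_dists)          # Gaussian (RBF) kernel Gram matrix

def min_eig(K):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(K).min()

print(min_eig(2.0 * K1 + 3.0 * K2))   # nonnegative (up to rounding): still a valid kernel
print(min_eig(-2.0 * K1 + 3.0 * K2))  # can be negative: PSD property lost
```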

[1] Vladimir N. Vapnik. Statistical learning theory. Wiley, 1 edition, September 1998.

[2] Richard Courant and David Hilbert. Methods of Mathematical Physics, volume 1. Interscience Publishers, Inc., New York, NY, 1953.

Marc Claesen

As an alternative approach to Marc's:

A symmetric function $k : \mathcal X \times \mathcal X \to \mathbb R$ is a kernel function iff there is some "feature map" $\varphi : \mathcal X \to \mathcal H$ such that $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal H}$, where $\mathcal H$ is a Hilbert space.

Let $\varphi_i$ be the feature map for $k_i$, and $\mathcal H_i$ its Hilbert space.

Now, $\mathcal H_p := \mathcal H_1 \oplus \mathcal H_2$ is a Hilbert space, and $\varphi_p := \sqrt{a_1} \varphi_1 \oplus \sqrt{a_2} \varphi_2$ is a feature map from $\mathcal X$ to it as long as $a_1, a_2 \ge 0$. (For finite-dimensional feature spaces, this is just concatenating the feature maps together.) Note that \begin{align} \langle \varphi_p(x), \varphi_p(y) \rangle_{\mathcal H_p} &= \langle \sqrt{a_1} \varphi_1(x) \oplus \sqrt{a_2} \varphi_2(x), \sqrt{a_1} \varphi_1(y) \oplus \sqrt{a_2} \varphi_2(y) \rangle_{\mathcal H_1 \oplus \mathcal H_2} \\&= a_1 \langle \varphi_1(x), \varphi_1(y) \rangle_{\mathcal H_1} + a_2 \langle \varphi_2(x), \varphi_2(y) \rangle_{\mathcal H_2} \\&= k_p(x, y) ,\end{align} so $k_p$ has feature map $\varphi_p$, and is therefore a valid kernel.
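For intuition, here is a minimal finite-dimensional sketch of this construction, assuming NumPy; the particular feature maps are illustrative choices:

```python
import numpy as np

a1, a2 = 2.0, 3.0                     # any nonnegative coefficients

def phi1(x):
    """Feature map of the linear kernel k1(x, y) = <x, y>."""
    return x

def phi2(x):
    """Feature map of the quadratic kernel k2(x, y) = <x, y>**2."""
    return np.outer(x, x).ravel()

def phi_p(x):
    """Concatenation sqrt(a1)*phi1 (+) sqrt(a2)*phi2."""
    return np.concatenate([np.sqrt(a1) * phi1(x), np.sqrt(a2) * phi2(x)])

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])

lhs = phi_p(x) @ phi_p(y)              # <phi_p(x), phi_p(y)>
rhs = a1 * (x @ y) + a2 * (x @ y) ** 2 # a1*k1(x, y) + a2*k2(x, y)
print(np.isclose(lhs, rhs))            # True
```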


To your "in general" question at the end: if you want to prove them for arbitrary kernels, the two main techniques are the one Marc used and the one I used. Often, though, for Marc's approach we directly use the Mercer condition rather than the integral form, which can be easier to reason about:

A symmetric function $k : \mathcal X \times \mathcal X \to \mathbb R$ is positive semidefinite if and only if for all $M$, all $x_1, \dots, x_M \in \mathcal X$, and all $c_1, \dots, c_M \in \mathbb R$, $\sum_{i=1}^M \sum_{j=1}^M c_i k(x_i, x_j) c_j \ge 0$.

We can also use the following equivalent form:

A symmetric function $k : \mathcal X \times \mathcal X \to \mathbb R$ is positive semidefinite if and only if for all $M$ and all $x_1, \dots, x_M \in \mathcal X$, the matrix $K$ with $K_{ij} = k(x_i, x_j)$ is positive semidefinite.
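This last form also gives a cheap numerical way to *refute* a candidate kernel: evaluate the Gram matrix on a sample of points and test the quadratic form. A passing check is not a proof, of course, since it only probes finitely many points. A minimal sketch, assuming NumPy and using a Gaussian kernel as the example:

```python
import numpy as np

def k(x, y):
    """Gaussian kernel, as an example of a function to test."""
    return np.exp(-0.5 * np.sum((x - y) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))                         # arbitrary sample points
K = np.array([[k(xi, xj) for xj in X] for xi in X])  # K_ij = k(x_i, x_j)

# Check c^T K c >= 0 for many random coefficient vectors c.
for _ in range(1000):
    c = rng.normal(size=len(X))
    assert c @ K @ c >= -1e-10                       # nonnegative up to rounding
print("quadratic form nonnegative for all sampled c")
```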

I previously gave brief proofs for several such properties in this answer.

Danica
  • Is having a feature mapping of the form $a(x)^T a(y)$ a sufficient condition for it to be a valid kernel? – tusharfloyd Oct 15 '15 at 20:08
  • @tusharfloyd Yes; the properties of inner products guarantee Mercer's condition. – Danica Oct 15 '15 at 20:11
  • Is it true because an inner product $k(x_i, x_j)$ will always produce a positive semidefinite $K$ (Gram matrix)? – tusharfloyd Oct 15 '15 at 20:15
  • @tusharfloyd Yep, exactly. – Danica Oct 15 '15 at 20:15
  • Thanks. Just one last question: isn't symmetry also a required condition for a valid kernel, or is it trivial to prove and therefore ignored? Also, can you direct me to a link or reference with a proof or validation of my second comment, i.e. why an inner product gives a positive semidefinite matrix? I am still not able to wrap my head around all these axioms. – tusharfloyd Oct 15 '15 at 20:28
  • Whoops, you're right about the symmetry; that's implied by the complex version of this condition, but not the real one (see e.g. [this discussion](https://en.m.wikipedia.org/wiki/Positive-definite_matrix#Consistency_between_real_and_complex_definitions)). Will fix. – Danica Oct 15 '15 at 20:35
  • For inner product being psd: $\sum_{i,j} c_i \langle x_i, x_j \rangle c_j = \langle \sum_i c_i x_i, \sum_i c_i x_i \rangle \ge 0$ by the properties of inner products. – Danica Oct 15 '15 at 20:38
  • My understanding of Mercer's conditions for a function to be a kernel is that (1) the kernel function is symmetric and (2) the kernel matrix is p.s.d. If $k_1$ and $k_2$ are p.s.d., isn't it sufficient to observe that the sum of two p.s.d. matrices is itself p.s.d.? – Sycorax Oct 22 '15 at 19:09
  • @user777 Yep, that's effectively using the last condition I cite here, and amounts to the same thing. – Danica Oct 22 '15 at 19:11