
For what values of $\beta \in \mathbb{R}$ is $t(x-x')=-||x-x'||^\beta$ a kernel?

I know that kernels of the form $t(x-x')$ use a function $t$ that turns the dissimilarity $x-x'$ into a similarity measure; here this is done by putting a negative sign in front.

I have tried plotting several cases in Python, and I observed that with $\beta>0$ I get a sort of "concave" function, if you can call it that:

[plot of $t(x-x') = -\lVert x-x'\rVert^\beta$ for several values of $\beta>0$]

With $\beta=0$ it is the constant $-1$, and with $\beta<0$ I obtain a "convex" function with a division by zero at the origin. But all of that does not answer the question: I can see the plots, but I do not know how to prove for which values of $\beta$ the function $t$ is a valid kernel. I would really appreciate your help!
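A minimal sketch of the kind of check I ran (the function name `t` is mine); it just tabulates $t(d) = -d^\beta$ on a grid of distances for the three regimes of $\beta$:

```python
import numpy as np

def t(d, beta):
    """t(d) = -d**beta, evaluated on distances d > 0 (d = 0 divides by zero when beta < 0)."""
    return -np.power(d, beta)

d = np.linspace(0.1, 3.0, 30)  # stay away from 0 to avoid the beta < 0 singularity

print(t(d, 2.0)[:3])   # beta > 0: decreasing, "concave"-looking curve
print(t(d, 0.0)[:3])   # beta = 0: constant -1
print(t(d, -1.0)[:3])  # beta < 0: "convex" curve blowing up near the origin
```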

alienflow

2 Answers


A related kernel that is valid is $$ k(x, y) = \lVert x \rVert^\beta + \lVert y \rVert^\beta - \lVert x - y \rVert^\beta $$ for $0 < \beta \le 2$; see Example 15 of Sejdinovic, Sriperumbudur, Gretton, and Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, Annals of Statistics 2013.

You can also use $$ \lVert x - z \rVert^\beta + \lVert y - z \rVert^\beta - \lVert x - y \rVert^\beta $$ for any fixed $z$ if you'd like.
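As a quick numerical sanity check (not a proof), one can verify that this construction yields positive semidefinite kernel matrices; the sketch below assumes $\beta = 1.5$ and $z = 0$:

```python
import numpy as np

rng = np.random.default_rng(0)
beta, n, dim = 1.5, 50, 3
X = rng.normal(size=(n, dim))  # random inputs in R^3

# k(x, y) = ||x - z||^beta + ||y - z||^beta - ||x - y||^beta, with z = 0
norms = np.linalg.norm(X, axis=1) ** beta
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2) ** beta
K = norms[:, None] + norms[None, :] - D

eigvals = np.linalg.eigvalsh(K)  # K is symmetric, so eigvalsh is appropriate
print(eigvals.min())  # should be >= 0 up to numerical error
```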

In general, if you want a quick "could this possibly be a kernel" numerical check, it's far more productive to construct a kernel matrix for some random inputs and check its eigenvalues: if any are negative, then it can't be a valid kernel. If you do this a few times and don't get any negatives, then maybe it is valid. (You might get some -1e-7 values out due to numerical error, but any actually negative values would disqualify it.)
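A minimal sketch of that check, applied to the question's $t(x - x') = -\lVert x - x'\rVert^\beta$ with $\beta = 1$ (the helper name `min_eig` is mine):

```python
import numpy as np

def min_eig(kernel, X):
    """Smallest eigenvalue of the kernel matrix K[i, j] = kernel(X[i], X[j])."""
    K = np.array([[kernel(x, y) for y in X] for x in X])
    return np.linalg.eigvalsh(K).min()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

beta = 1.0
lam = min_eig(lambda x, y: -np.linalg.norm(x - y) ** beta, X)
print(lam)  # clearly negative: the matrix has zero trace but nonzero entries,
            # so at least one eigenvalue must be negative
```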

Danica
  • While your general approach is certainly pragmatic (full disclosure: I did this myself), I think it is much less frustrating to learn some theory and try to design kernels from sound principles than to do this by trial and error. Even more so since translation-invariant kernels and their associated RKHS seem to be pretty well covered. – g g Sep 16 '19 at 20:54
  • Yeah, the numerical check is definitely for a hmm-I'd-really-like-this-particular-function-to-be-a-kernel situation, and isn't particularly useful for finding a kernel that actually *is* valid. But it's also really easy to get into a place where theory is just hard to come by, eg $\exp(- \mathrm{EMD}(\mathbb P, \mathbb Q))$ for the earth mover's/Wasserstein/Mallows metric on distributions, which I think has been an open question for decades despite always giving psd matrices in practice. – Danica Sep 16 '19 at 21:02

This will never work! No matter what $\beta$ you choose.

Take three distinct points in any $\mathbb{R}^n$: the determinant of your kernel matrix $\big(t(x_i-x_j)\big)$ is negative for every $\beta > 0$, so the matrix cannot be positive semidefinite. (The cases $\beta \le 0$ fail too: $\beta = 0$ gives the all-$(-1)$ matrix, which has a negative eigenvalue, and for $\beta < 0$ the kernel is not even defined at $x_i = x_j$.)
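A sketch of that computation for the points $x_1=0$, $x_2=1$, $x_3=2$ on the real line:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # three distinct points on the real line

dets = {}
for beta in (0.5, 1.0, 2.0, 3.0):
    K = -np.abs(x[:, None] - x[None, :]) ** beta  # K[i, j] = -|x_i - x_j|^beta
    dets[beta] = np.linalg.det(K)
    print(beta, dets[beta])  # negative for every beta > 0
```

For three points with off-diagonal entries $-a$, $-b$, $-c$ (all positive distances raised to $\beta$), the determinant works out to $-2abc < 0$, whereas a Gram matrix of a valid kernel must have nonnegative determinant.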

Background: a function $t(\lVert x-y\rVert)$ defines a positive definite kernel on every $\mathbb{R}^n$ if and only if $s \mapsto t(\sqrt{s})$ is completely ("totally") monotone on $[0,\infty)$; in particular, such a $t$ is necessarily nonnegative, which $-\lVert x-x'\rVert^\beta$ is not.

For proofs see Chapters 5 and 6 of Wendland, "Scattered Data Approximation".

g g