Consider kernel methods in machine learning, as used in support vector machines, Gaussian processes, etc. We need to define a function $k(x, y)$ that measures the similarity between two data points $x$ and $y$. A common choice is the radial basis function (RBF) kernel: $k(x, y) = \exp(-\lVert x - y \rVert^2)$.

In most cases, $k(x, x)$ is constant and does not depend on $x$. I am wondering whether there is any kernel function for which $k(x, x)$ does depend on $x$. If there is, what is it called? Any reference would be helpful.

Carl

2 Answers

One easy example is the linear kernel $k(x, y) = x^T y$, so that $k(x, x) = \lVert x \rVert^2$, or the polynomial kernel $k(x, y) = (x^T y + c)^d$.
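As a quick numerical check of the point above, here is a minimal sketch that evaluates $k(x, x)$ at a few points for the RBF, linear, and polynomial kernels. The sample points and the values `c=1.0`, `d=2` are arbitrary choices made for illustration, not anything prescribed by the question.

```python
import numpy as np

def rbf_kernel(x, y):
    # k(x, y) = exp(-||x - y||^2), as written in the question
    return np.exp(-np.sum((x - y) ** 2))

def linear_kernel(x, y):
    # k(x, y) = x^T y
    return x @ y

def polynomial_kernel(x, y, c=1.0, d=2):
    # k(x, y) = (x^T y + c)^d; c and d are illustrative values
    return (x @ y + c) ** d

# Arbitrary sample points chosen for illustration
points = [np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([3.0, -1.0])]

rbf_diag = [rbf_kernel(p, p) for p in points]           # same value at every point
linear_diag = [linear_kernel(p, p) for p in points]     # equals ||x||^2
poly_diag = [polynomial_kernel(p, p) for p in points]   # equals (||x||^2 + c)^d

print("RBF:       ", rbf_diag)
print("linear:    ", linear_diag)
print("polynomial:", poly_diag)
```

The RBF diagonal is the same at every point, while the other two grow with $\lVert x \rVert^2$.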

In general, this kind of kernel is known as nonstationary: its values vary over the input space. There are infinitely many of them, but I'm not aware of any in common use other than the two I just listed.

There's an interesting subclass of these kernels discussed by:

Genton (2001). Classes of Kernels for Machine Learning: A Statistics Perspective. Journal of Machine Learning Research 2 299–312. (official pdf)

These are the locally stationary kernels, perhaps originated by Silverman, of the form $$ K(x, y) = K_1\left( \frac{x + y}{2} \right) K_2\left( x - y \right) ,$$ so that the usual stationary kernel $K_2$ is scaled by $K_1$, a function of the average location.
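To make the form above concrete, here is a small one-dimensional sketch. The particular factors `k1` and `k2` are hypothetical choices for illustration (both Gaussian-shaped) and are not taken from Genton's paper. On the diagonal the kernel reduces to $K(x, x) = K_1(x) K_2(0)$, so it depends on $x$ whenever $K_1$ is not constant.

```python
import numpy as np

def k1(u):
    # Hypothetical nonnegative factor evaluated at the average location
    return np.exp(-u ** 2)

def k2(h):
    # Stationary RBF factor, depends only on the difference h = x - y
    return np.exp(-h ** 2)

def locally_stationary_kernel(x, y):
    # K(x, y) = K1((x + y) / 2) * K2(x - y)
    return k1((x + y) / 2.0) * k2(x - y)

# Gram matrix on an arbitrary 1-D grid, built via broadcasting
xs = np.linspace(-2.0, 2.0, 9)
gram = locally_stationary_kernel(xs[:, None], xs[None, :])

diagonal = np.diag(gram)                    # equals k1(xs) * k2(0) = exp(-xs^2)
min_eig = np.linalg.eigvalsh(gram).min()    # numerical check of positive semidefiniteness

print("diagonal:", diagonal)
print("smallest eigenvalue:", min_eig)
```

For this particular pair the product can be rewritten as $e^{-1.25 x^2} e^{-1.25 y^2} e^{1.5 x y}$, which is positive semidefinite, so the smallest eigenvalue should be nonnegative up to rounding error. Other choices of $K_1$ would need their own check.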

The following paper, which I haven't read, discusses some other options:

Paciorek and Schervish. Nonstationary Covariance Functions for Gaussian Process Regression. NIPS 2004. (official pdf)

Danica
  • Hi @Dougal, I asked a question [here](http://stats.stackexchange.com/questions/253333/reducible-nonstationary-kernels) about the Genton paper you cited. I would like to hear your opinion on it if you have some time. Thanks. – jkt Dec 30 '16 at 08:24

It is not a true (positive semidefinite) kernel in general, but the sigmoid kernel has this behavior. It is the hyperbolic tangent of the dot product of two vectors, combined with a slope term $\gamma$ and a constant term $r$. From the link: $K(X, Y) = \tanh(\gamma \cdot X^T Y + r)$
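A minimal sketch of both claims, assuming the illustrative values `gamma=1.0` and `r=-1.0` and three arbitrary points: the diagonal $K(X, X) = \tanh(\gamma \lVert X \rVert^2 + r)$ changes with $X$, and with a negative $r$ the entry at the origin is negative, which is enough to rule out positive semidefiniteness for this parameter choice.

```python
import numpy as np

def sigmoid_kernel(x, y, gamma=1.0, r=-1.0):
    # K(x, y) = tanh(gamma * x^T y + r); gamma and r are illustrative values
    return np.tanh(gamma * (x @ y) + r)

# Arbitrary sample points, including the origin
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])

# Gram matrix of all pairwise kernel values
gram = np.array([[sigmoid_kernel(a, b) for b in points] for a in points])

diagonal = np.diag(gram)                    # tanh(gamma * ||x||^2 + r) per point
min_eig = np.linalg.eigvalsh(gram).min()    # negative here, so the matrix is not PSD

print("diagonal:", diagonal)
print("smallest eigenvalue:", min_eig)
```

A negative diagonal entry forces a negative eigenvalue, so this Gram matrix cannot come from a valid kernel. Other values of `gamma` and `r` may behave differently.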

www3