Questions tagged [rbf-kernel]

The RBF kernel, i.e., the radial-basis-function kernel (also called the Gaussian or squared-exponential kernel), occurs in the context of kernel methods in machine learning.

A radial basis function (RBF) is a function whose value depends only on the distance from a fixed center point. RBF kernels are widely used in kernel machines such as support vector machines and Gaussian processes via the kernel trick, and as activation units in RBF networks.
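Concretely, for inputs $x, y$ and bandwidth parameter $\gamma > 0$, the kernel is $k(x, y) = \exp(-\gamma\,\|x - y\|^2)$. A minimal NumPy sketch:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical inputs -> exactly 1.0
far = rbf_kernel([0.0, 0.0], [3.0, 4.0])    # distance 5 -> exp(-25), essentially 0
```

The kernel equals 1 only for identical inputs and decays monotonically with distance, at a rate set by gamma.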

81 questions
13 votes, 1 answer

Why are random Fourier features efficient?

I am trying to understand Random Features for Large-Scale Kernel Machines. In particular, I don't follow the following logic: kernel methods can be viewed as optimizing the coefficients in a weighted sum, $$ f(\mathbf{x}, \boldsymbol{\alpha}) =…
jds
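The core idea of the Rahimi & Recht construction asked about here can be sketched in a few lines. This is an illustrative sketch, not the paper's pseudocode: the frequency matrix `W` is sampled from the kernel's spectral density (Gaussian, for the RBF kernel) and the phases `b` uniformly, so that the inner product of the random feature vectors is a Monte Carlo estimate of the kernel value.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, D, d = 0.5, 5000, 3        # kernel bandwidth, #random features, input dim

# For k(x, y) = exp(-gamma * ||x - y||^2), sample frequencies with
# standard deviation sqrt(2 * gamma) and phases uniform on [0, 2*pi).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def z(x):
    """Random feature map: z(x) @ z(y) approximates k(x, y)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = z(x) @ z(y)
```

The efficiency argument is that a linear model on `z(x)` costs O(D) per prediction, independent of the number of training points, whereas an exact kernel machine pays per support vector.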
12 votes, 2 answers

Does Mercer's theorem work in reverse?

A colleague has a function $s$ and for our purposes it is a black-box. The function measures the similarity $s(a,b)$ of two objects. We know for sure that $s$ has these properties: The similarity scores are real numbers between 0 and 1,…
Sycorax
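A practical, if inconclusive, companion to this question: Mercer's theorem requires every finite Gram matrix of the function to be positive semi-definite, so a single point set whose Gram matrix has a negative eigenvalue disproves kernel-ness. A sketch of that necessary-condition check (passing it proves nothing):

```python
import numpy as np

def looks_psd(s, points, tol=1e-9):
    """Check whether the Gram matrix of s on `points` is numerically PSD.
    One negative eigenvalue disproves that s is a valid (Mercer) kernel;
    passing the check for many point sets is only circumstantial evidence."""
    G = np.array([[s(a, b) for b in points] for a in points])
    return np.linalg.eigvalsh((G + G.T) / 2).min() >= -tol

pts = np.linspace(0.1, 1.0, 20)
ok_min = looks_psd(np.minimum, pts)  # min(a, b): Brownian-motion covariance, PSD
ok_max = looks_psd(np.maximum, pts)  # max(a, b): already fails on {0.1, 0.2}
```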
9 votes, 1 answer

Regularized linear vs. RKHS-regression

I'm studying the difference between regularization in RKHS regression and linear regression, but I have a hard time grasping the crucial difference between the two. Given input-output pairs $(x_i,y_i)$, I want to estimate a function $f(\cdot)$ as…
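A compact numerical contrast (a sketch, with an RBF kernel and made-up data): both estimators solve a penalized least-squares problem, but linear ridge penalizes the Euclidean norm of a weight vector while RKHS regression penalizes the RKHS norm of $f$; by the representer theorem, the RKHS solution is a kernel expansion over the training points.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=60))
y = np.sin(X) + 0.1 * rng.normal(size=60)

lam, gamma = 0.1, 1.0
K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)   # RBF Gram matrix

# Representer theorem: f(x) = sum_i alpha_i k(x, x_i), with the
# coefficients solving the n x n system (K + lam * I) alpha = y.
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
rkhs_fit = K @ alpha                 # fitted values at the training points

# Ordinary ridge on the raw 1-D input can only produce affine fits.
A = np.column_stack([X, np.ones_like(X)])
w = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ y)
linear_fit = A @ w
```

On this nonlinear target, the RKHS fit tracks the sine while the ridge fit is stuck with a straight line, which is the crucial practical difference between the two penalties.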
8 votes, 1 answer

Convergence of the Matérn covariance function to the squared exponential

The Matérn covariance function converges to the squared exponential covariance function. Many sources, amongst them the GPML book and Wikipedia, state this result. None of them provide details. I am looking for references that provide details…
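For reference, the statement being asked about, in the GPML parametrization with lengthscale $\ell$ and smoothness $\nu$ (where $K_\nu$ is the modified Bessel function of the second kind), is:

```latex
k_\nu(r) = \sigma^2 \,\frac{2^{1-\nu}}{\Gamma(\nu)}
           \left(\frac{\sqrt{2\nu}\, r}{\ell}\right)^{\!\nu}
           K_\nu\!\left(\frac{\sqrt{2\nu}\, r}{\ell}\right),
\qquad
\lim_{\nu \to \infty} k_\nu(r) = \sigma^2 \exp\!\left(-\frac{r^2}{2\ell^2}\right).
```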
5 votes, 2 answers

Why use RBF kernel if less is needed?

I have seen online theorems such as Cover's theorem (Wikipedia) which prove how, given $p$ points in $\mathbb{R}^N$, linear separability is almost certain as long as the fraction $\dfrac{p}{N}$ is kept close to $1$ (and actually a little beyond that). It…
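The result referenced here can be checked numerically. By Cover's function-counting theorem, $p$ randomly labeled points in general position in $\mathbb{R}^N$ are linearly separable with probability $2^{1-p}\sum_{k=0}^{N-1}\binom{p-1}{k}$; a minimal sketch:

```python
from math import comb

def p_separable(p, N):
    """Probability that p randomly labeled points in general position
    in R^N are linearly separable (Cover's function-counting theorem)."""
    if p <= N:
        return 1.0    # the formula equals 1 exactly in this regime
    return 2 * sum(comb(p - 1, k) for k in range(N)) / 2 ** p
```

Separability is certain for $p \le N$, drops to exactly $1/2$ at $p = 2N$, and collapses toward 0 beyond that, which is the phase transition the question alludes to.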
5 votes, 1 answer

Using Gaussian Processes to learn a function online

I would like to approximate a function $f:\mathbb{R} \to \mathbb{R}_+$ based on a set of samples. I obtain these samples online (i.e. sequentially in time). That is, at time $t$ I receive $(x_t, f(x_t))$ and I would like to update my approximation…
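A minimal sketch of the naive approach (with a hypothetical positive target $f(x) = e^x$ and an assumed RBF kernel): keep all samples and recompute the GP posterior mean at each step. This costs $O(t^3)$ per step; rank-one Cholesky updates or sparse/streaming GP approximations are the usual fixes, and note that a plain GP does not enforce the constraint $f:\mathbb{R}\to\mathbb{R}_+$.

```python
import numpy as np

gamma, noise = 1.0, 1e-2              # assumed kernel width and noise variance

def k(A, B):
    A, B = np.asarray(A, float), np.asarray(B, float)
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

X_obs, y_obs = [], []

def update(x_t, y_t):
    """Online step: absorb one new sample (naively, by storing it)."""
    X_obs.append(x_t)
    y_obs.append(y_t)

def predict(x_star):
    """GP posterior mean at x_star given all samples seen so far."""
    K = k(X_obs, X_obs) + noise * np.eye(len(X_obs))
    return float(k([x_star], X_obs) @ np.linalg.solve(K, y_obs))

for x_t in np.linspace(-2.0, 2.0, 30):   # samples arriving sequentially in time
    update(x_t, np.exp(x_t))             # hypothetical target f(x) = e^x
```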
5 votes, 1 answer

Why does a large gamma in the RBF kernel of an SVM lead to a wiggly decision boundary and cause over-fitting?

The hyperparameter $\gamma$ of the Gaussian/RBF kernel controls the tradeoff between error due to bias and error due to variance in your model. If you have a very large value of gamma, then even if your two inputs are quite “similar”, the value of the …
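The quantitative version of this excerpt's point, as a two-line sketch: for two inputs a fixed distance apart, the kernel value collapses from near 1 to essentially 0 as $\gamma$ grows, so each training point only influences its immediate neighborhood and the decision boundary can bend around individual points.

```python
import numpy as np

def rbf(x, y, gamma):
    return np.exp(-gamma * (x - y) ** 2)

# Two 1-D inputs at distance 0.5:
small_gamma = rbf(0.0, 0.5, gamma=0.1)    # exp(-0.025) ~ 0.975: "similar"
large_gamma = rbf(0.0, 0.5, gamma=100.0)  # exp(-25) ~ 1.4e-11: "unrelated"
```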
3 votes, 0 answers

How to obtain the inverse of the Gram (kernel) matrix?

We're working with a similar dual SVR problem that involves the inversion of a Gram (kernel) matrix: $\boldsymbol{S}_{i,j} = e^{ -\gamma ||\vec{x_i} - \vec{x_j}||_2^2}$ With some datasets (e.g., UCI ForestFires) the inversion of $\boldsymbol{S}$ is…
Filippo Portera
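Not a full answer to the question above, but the standard numerical workaround is worth recording: duplicated or near-duplicate samples make the RBF Gram matrix (numerically) singular, and adding a small "jitter" to the diagonal, equivalent to assuming a little observation noise, bounds the condition number and stabilizes the solve. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[1] = X[0]                                # a duplicated sample: exact rank deficiency
gamma = 0.5

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
S = np.exp(-gamma * sq)                    # RBF Gram matrix, singular here

y = rng.normal(size=50)
jitter = 1e-6
# Stable even though S itself is singular: S + jitter * I is positive definite.
coef = np.linalg.solve(S + jitter * np.eye(50), y)
```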
3 votes, 0 answers

In the broadest sense, what is a "kernel"?

In MCMC sampling methods, a transition kernel, as found in the Metropolis(–Hastings) algorithm, governs the comparison of the likelihood of the current position with the likelihood of the proposed position. However, in support vector machines and Gaussian…
jbuddy_13
3 votes, 0 answers

Why does an SVM with gamma='scale' for the RBF kernel work so well?

The intuitive explanation for the gamma parameter of the RBF kernel in SVMs is the following: Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning…
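Part of the answer is simply what gamma='scale' computes. In scikit-learn it is defined as 1 / (n_features * X.var()), which makes the resulting kernel invariant to a uniform rescaling of the features; a quick check of that invariance (with made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = 10.0 * rng.normal(size=(200, 3))          # features on an arbitrary scale

def gamma_scale(X):
    """scikit-learn's gamma='scale' heuristic: 1 / (n_features * X.var())."""
    return 1.0 / (X.shape[1] * X.var())

# Rescaling X by c multiplies gamma by 1/c^2, so gamma * ||x - x'||^2,
# and hence the RBF kernel value, is unchanged by the overall feature scale.
g1 = gamma_scale(X)
g2 = gamma_scale(X / 10.0)
```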
3 votes, 1 answer

The inner product properties seem to clash with the RKHS property for RBF kernels. What is off?

By the reproducing kernel Hilbert space (RKHS) property, given a P.S.D. kernel function $\kappa:X\times X \rightarrow \mathbb R$, there exists a Hilbert space $H$ and a map $\phi:X\rightarrow H$ such that $$ \kappa(x,y) = \langle \phi(x), \phi(y)…
cangrejo
3 votes, 0 answers

Kernel and regularization parameter of James–Stein estimator

Consider a FIR model of the form $y= Ug_0+e$ with $e$ white noise with variance $\sigma^2$. We assume that we have collected N input-output measurements $y$ and $U$. The James–Stein estimator is defined as $$\hat{g}_{JS}=…
Betelgeuse
3 votes, 1 answer

Calculation of nu and gamma in one-class SVM with rbf kernel

I am using python sklearn's one-class SVM classifier for anomaly detection. I would like to know whether I can accurately calculate the required values of nu and gamma for the RBF kernel. Is there any equation or theory to calculate nu and gamma according to…
nzck
2 votes, 1 answer

Prove that the following matrix is positive definite

We define $K_{\mathbf{a}, \mathbf{b}}$ as the $n \times m$ matrix whose $ij^{th}$ entry is $\kappa(a_{i}, b_{j})$, where $\kappa$ is a (positive definite) kernel function. Here, $\mathbf{a}_{i}, \mathbf{b}_{j} \in \mathbb{R}^{D} \hspace{10pt}…
2 votes, 1 answer

Kernel approximation with Nystroem method and usage in scikit-learn

I am planning to use the Nystroem method to approximate a Gram matrix induced by any kernel function. I found the Nystroem implementation in scikit-learn. As far as I understand, the full Gram matrix should be estimated. Let $x_1, \ldots, x_n$…