
I'm trying to use a Matérn 5/2 kernel for GP regression, so my kernel function is $$K(x,x')\triangleq\theta_0\left(1+\sqrt{5r(x,x')}+\tfrac{5}{3}r(x,x')\right)\exp\left(-\sqrt{5r(x,x')}\right),$$ where $r(x,x')\triangleq\sum_{d=1}^D (x_d-x'_d)^2/\theta_d^2$.

I want to optimize the marginal likelihood, the gradient of which involves calculating $\frac{\partial K}{\partial\theta_i}$. The problem, though, is that for $x=x'$, i.e., the diagonal entries of $K$, $r(x,x')=0$, which appears to make $\frac{\partial K}{\partial\theta_i}$ undefined on the diagonal, since $\sqrt{\,\cdot\,}$ is not differentiable at 0.

This seems like a very obvious problem, but Googling hasn't turned up anything, so maybe I'm missing something.

Glen_b
Mike Adriano
    The lack of differentiability of the square root is not relevant, because what matters is the differentiability of $K$ (*qua* function of $\theta_d$). Any plot of $K$ as a function of $\theta_d$ will show how beautifully smooth it is as $\theta_d$ approaches zero. – whuber Dec 28 '15 at 14:44

1 Answer


I think the worry dissolves once you apply the chain rule: $\frac{\partial K}{\partial\theta_d}=\frac{\partial K}{\partial r}\,\frac{\partial r}{\partial\theta_d}$. Although $\sqrt{r}$ is not differentiable at $r=0$, the square-root terms cancel when you differentiate $K$ with respect to $r$: $$\frac{\partial K}{\partial r}=-\frac{5\theta_0}{6}\left(1+\sqrt{5r}\right)\exp\left(-\sqrt{5r}\right),$$ which stays finite as $r\to 0$. Meanwhile $\frac{\partial r}{\partial\theta_d}=-2(x_d-x'_d)^2/\theta_d^3$, which is exactly zero at $x=x'$. So the diagonal entries of the derivative matrix of $K$ with respect to each length-scale $\theta_d$ are zero.
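You can check this numerically. Below is a minimal sketch (function names `matern52` and `dK_dtheta` are my own, and the test points are arbitrary) comparing a central finite difference of $K$ with respect to $\theta_d$ against the chain-rule expression above, and confirming the derivative vanishes on the diagonal:

```python
import numpy as np

def matern52(x, xp, theta0, theta):
    """Matern 5/2 kernel as defined in the question."""
    r = np.sum((x - xp) ** 2 / theta ** 2)
    s = np.sqrt(5.0 * r)
    return theta0 * (1.0 + s + 5.0 * r / 3.0) * np.exp(-s)

def dK_dtheta(x, xp, theta0, theta, d, h=1e-6):
    """Central finite difference of K w.r.t. the length-scale theta[d]."""
    tp, tm = theta.copy(), theta.copy()
    tp[d] += h
    tm[d] -= h
    return (matern52(x, xp, theta0, tp) - matern52(x, xp, theta0, tm)) / (2 * h)

theta0 = 2.0
theta = np.array([1.5, 0.7])
x = np.array([0.3, -1.2])
xp = np.array([1.0, 0.5])

# On the diagonal, K(x, x) = theta0 regardless of theta, so the
# finite difference is exactly zero.
diag_grad = dK_dtheta(x, x, theta0, theta, d=0)

# Off the diagonal, compare against the chain rule:
# dK/dtheta_d = (dK/dr) * (dr/dtheta_d).
r = np.sum((x - xp) ** 2 / theta ** 2)
s = np.sqrt(5.0 * r)
dK_dr = -theta0 * (5.0 / 6.0) * (1.0 + s) * np.exp(-s)
dr_dt = -2.0 * (x[0] - xp[0]) ** 2 / theta[0] ** 3
analytic = dK_dr * dr_dt
fd = dK_dtheta(x, xp, theta0, theta, d=0)

print(diag_grad)      # 0.0
print(fd, analytic)   # the two values agree
```

Note that while the diagonal gradient is zero, the off-diagonal entries are not, so the gradient of the marginal likelihood is still informative; only the diagonal contributes nothing.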