I'm using the negative log marginal likelihood for hyperparameter selection in a Gaussian Process regression model. However, I'm running into a lot of cases where the negative log marginal likelihood is less than 0, which would imply that the marginal likelihood is greater than 1, and I'm not sure where I'm going wrong.
According to Chapter 5 of the Rasmussen and Williams GPML text:
$\log p(y \mid X, \theta) = -\frac{1}{2}y^T K_y^{-1} y - \frac{1}{2}\log|K_y| - \frac{n}{2}\log 2\pi$
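For reference, here is a direct NumPy transcription of that equation (just a sketch for comparison; the function name is mine, and it uses slogdet for the determinant rather than a Cholesky factor):

import numpy as np

def log_marginal_likelihood(y, Ky):
    """log p(y | X, theta) for a zero-mean GP with covariance Ky."""
    n = len(y)
    data_fit = -0.5 * y @ np.linalg.solve(Ky, y)
    # slogdet avoids the overflow that log(det(Ky)) can hit for larger n
    complexity = -0.5 * np.linalg.slogdet(Ky)[1]
    return data_fit + complexity - 0.5 * n * np.log(2 * np.pi)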
I'm using the following code in Python to calculate the negative log marginal likelihood:
import numpy as np

var_n, var_p = variances
K_mat = np.asarray(self.K)
y = np.asarray(self.Y).ravel()
n = len(y)
Ky = var_p * K_mat + var_n * np.identity(len(K_mat))  # K_y = signal-scaled kernel plus noise
L = np.linalg.cholesky(Ky)
# Two triangular solves give alpha = Ky^{-1} y
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
# log|Ky| = 2 * sum(log(diag(L))), so the penalty term is sum(log(diag(L)))
ML = 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)
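As a sanity check I can compare this against SciPy's multivariate normal log-density on a toy problem (the squared-exponential kernel and sizes here are invented for the check, not from my actual model), and the two agree:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)   # toy squared-exponential kernel
y = rng.standard_normal(20)
var_n, var_p = 0.1, 1.0
Ky = var_p * K + var_n * np.identity(20)

L = np.linalg.cholesky(Ky)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
nlml = 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * 20 * np.log(2 * np.pi)

# The GP marginal likelihood is a zero-mean multivariate normal density
print(np.isclose(nlml, -multivariate_normal(mean=np.zeros(20), cov=Ky).logpdf(y)))  # True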
If it matters, it's the complexity penalty (the $\frac{1}{2}\log|K_y|$ term) that's driving the value negative.
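To make the symptom concrete, here is a minimal case (the variance values are arbitrary) where that term alone is negative:

import numpy as np

# With small variances every diagonal entry of L is below 1, so
# sum(log(diag(L))) = 0.5 * log|Ky| comes out negative.
Ky = 0.1 * np.identity(5) + 1e-3 * np.identity(5)
L = np.linalg.cholesky(Ky)
print(np.log(np.diag(L)).sum())   # about -5.7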