What is the difference between a non-zero nugget and a noise term in Kriging/GPR?

Question

With some Gaussian Process Regression/Kriging models, it's possible to specify both a non-zero nugget, and a noise term. For example, in Scikit-learn's GPR model, there is an alpha parameter, which I think represents the nugget, and a WhiteKernel that represents noise and can be added to any other kernel.

These two components have very similar effects on the results, as far as I can see (although counter-examples could be very instructive here).

I'm wondering what the two represent. I think (after some discussion on chat) that the nugget basically represents low-distance spatial variability (e.g. variability on scales greater than zero, but smaller than the smallest distance in the dataset), where a noise term would represent uncertainty in the sampled values of each data point (so basically measurement error). Is this a correct interpretation? Can the noise term also represent other things?

Ah... I just realised what else the Noise term could represent: If you're working with averages (e.g. a gridded dataset), and there is some variance on those data points (e.g. sub-grid variance), then it would make sense to include that as noise, I think. In that case, there would be no nugget, I think. — naught101, Nov 08 '21 at 04:34
If it helps, my understanding is the same as yours. The nugget term is added to the diagonal of the cov. matrix to make sure the matrix is always positive definite *in practice* (i.e. accounting for floating point precision in the calculation), and always permits stable inversion. While the noise term behaves in a very similar way (and could accomplish the above, too), I *think* a key difference is that the noise term is usually added back on to the conditional (predictive) covariance matrix, while the nugget isn't (but should be small enough not to make an appreciable difference anyway). — rxFt20, Dec 02 '21 at 14:08
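The point about the predictive covariance can be checked in scikit-learn. The following is a minimal sketch, not a definitive test: the toy data, the RBF length scale and the value `variance = 0.1` are arbitrary assumptions, and `optimizer=None` is set so that both models keep exactly the hyperparameters given. With those assumptions, `alpha` and `WhiteKernel` give the same posterior mean, but only the `WhiteKernel` variance is included in the predictive variance.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D data (assumed for illustration only).
rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 5.0, 8).reshape(-1, 1)
y_train = np.sin(X_train).ravel() + 0.3 * rng.standard_normal(8)
X_new = np.linspace(0.0, 5.0, 25).reshape(-1, 1)

variance = 0.1  # same value used for alpha and for the WhiteKernel noise level

# Model A: the variance is passed through alpha (added to the diagonal at fit time).
gpr_alpha = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0), alpha=variance, optimizer=None
).fit(X_train, y_train)

# Model B: the variance is part of the kernel; alpha keeps its tiny default jitter.
gpr_white = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=variance),
    optimizer=None,
).fit(X_train, y_train)

mean_alpha, std_alpha = gpr_alpha.predict(X_new, return_std=True)
mean_white, std_white = gpr_white.predict(X_new, return_std=True)

# Largest difference between the two posterior means.
print(np.max(np.abs(mean_alpha - mean_white)))
# Difference between the two predictive variances at each prediction point.
print(std_white**2 - std_alpha**2)
```

If the hyperparameters are optimised instead (the default), the two models are no longer equivalent in this sense, because `noise_level` is learned while `alpha` stays fixed.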

Julien Bect · Accepted Answer · 2022-02-02T08:49:35.867

Random noise and the nugget effect are indeed quite similar. The difference between the two appears:

- when there are repeated observations (i.e., several observations at the same location), and
- when you compute the predicted value at an observation point.

The random noise model assumes that observations are corrupted by additive, IID Gaussian noise. In practice, this means that repeated observations at a single location produce different outcomes. In this case the posterior mean of the GP is not equal to the observed value (even if there is only one observation at a particular location). This is GP regression, with a (usually) smooth regression function.

The nugget model, on the other hand, assumes a deterministic observation model (repeated observations should give the same value) but a very rough underlying function. In this case the posterior mean of the GP is equal to the observed value at each observation point, but is discontinuous at these points. This is in fact a form of GP interpolation, with a discontinuous interpolant.
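As a sketch, the two models can be written side by side. The notation is an assumption made here for illustration: a zero-mean prior, a continuous kernel $k$, distinct observation points $X = (x_1, \dots, x_n)$ with values $y$, $K = k(X, X)$, noise variance $\sigma^2$ and nugget variance $\tau^2$.

```latex
% Random noise model: smooth latent function, noisy observations.
\begin{align*}
y_i &= f(x_i) + \varepsilon_i, \qquad
  f \sim \mathcal{GP}(0, k), \quad
  \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \\
m_{\text{noise}}(x) &= k(x, X)\,(K + \sigma^2 I)^{-1} y .
\end{align*}

% Nugget model: exact observations of a rough function.
\begin{align*}
y_i &= g(x_i), \qquad
  g \sim \mathcal{GP}(0, k_\tau), \quad
  k_\tau(x, x') = k(x, x') + \tau^2\,\mathbf{1}\{x = x'\}, \\
m_{\text{nugget}}(x) &= k_\tau(x, X)\,(K + \tau^2 I)^{-1} y .
\end{align*}
```

For $x$ away from the data, $k_\tau(x, X) = k(x, X)$, so with $\tau^2 = \sigma^2$ both posterior means coincide. At $x = x_i$, the vector $k_\tau(x_i, X)$ is the $i$-th row of $K + \tau^2 I$, which gives $m_{\text{nugget}}(x_i) = y_i$, whereas $m_{\text{noise}}(x_i)$ generally differs from $y_i$.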

Remark: in the first case (random noise), the individual values of the repeated observations do not matter. The posterior distribution of the GP depends only on the number of observations at each location and on their average.
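Both points can be checked numerically. The following NumPy sketch assumes a zero-mean GP with an RBF kernel and arbitrary toy values; the helper names `rbf` and `posterior_mean_noise` are hypothetical and not taken from any library.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of locations."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def posterior_mean_noise(x_train, y_train, x_new, noise_var):
    """Posterior mean under the random noise model (zero prior mean)."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    return rbf(x_new, x_train) @ np.linalg.solve(K, y_train)

# --- Prediction at the observation points -------------------------------
x_train = np.array([0.0, 1.0, 2.5])
y_train = np.array([0.3, -0.4, 0.8])
variance = 0.1  # read as noise variance or as nugget variance

K = rbf(x_train, x_train)
K_reg = K + variance * np.eye(len(x_train))
weights = np.linalg.solve(K_reg, y_train)

# Noise model: the cross-covariance has no extra diagonal term,
# so the posterior mean smooths the data.
mean_noise_at_obs = K @ weights

# Nugget model: the nugget belongs to the process itself, so at an
# observation point the cross-covariance includes it, and the
# posterior mean reproduces the data.
mean_nugget_at_obs = K_reg @ weights

print(mean_noise_at_obs, mean_nugget_at_obs, y_train)

# --- Repeated observations under the noise model ------------------------
# Two data sets with the same locations and the same average at x = 1.0.
x_rep = np.array([0.0, 1.0, 1.0, 2.5])
y_a = np.array([0.3, 0.1, 0.5, 0.8])
y_b = np.array([0.3, 0.3, 0.3, 0.8])
x_new = np.array([0.5, 1.0, 1.7])

mean_a = posterior_mean_noise(x_rep, y_a, x_new, variance)
mean_b = posterior_mean_noise(x_rep, y_b, x_new, variance)
print(mean_a, mean_b)
```

The two posterior means in the second part agree because swapping the repeated values for their average only changes `y` along a direction that the kernel matrix maps to zero. The posterior covariance does not depend on the observed values at all, so it is identical as well.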

Excellent distinction, thank you! I'm actually wanting to use GPR for time-mean estimates of climate data - very noisy temporally. — naught101, Feb 03 '22 at 02:57
