Why do you have to provide a variogram model when you are kriging?

Question

I am very new to spatial statistics and have been watching lots of tutorials, but I don't really understand why you have to provide a variogram model when you krige.

I am using the gstat package in R, and this is the example they give:

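library(sp)
library(gstat)                  # provides vgm() and krige()
data(meuse)                     # observations, including zinc concentrations
coordinates(meuse) = ~x+y       # turn into a SpatialPointsDataFrame
data(meuse.grid)                # locations where predictions are wanted
str(meuse.grid)
gridded(meuse.grid) = ~x+y      # turn into a gridded (SpatialPixelsDataFrame) object
m <- vgm(.59, "Sph", 874, .04)  # partial sill, model type, range, nugget
print(m)
# ordinary kriging:
x <- krige(log(zinc)~1, meuse, meuse.grid, model = m)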

Can anybody explain in a couple of lines why you first have to supply `vgm`? And how do you choose its parameters?

Thank you in advance! Kasper

For *simple kriging* the estimator is BLUE only if the mean and spatial covariance are known ahead of time. In *Ordinary kriging* one estimates the variogram from the data and then does the interpolation. See the [vignette from the `gstat` R package](http://cran.r-project.org/web/packages/gstat/vignettes/gstat.pdf) of the same meuse data. — Andy W, Aug 13 '14 at 12:17
Hey Andy, thanks for your comment. I found out from the vignette that you can also krige without a variogram model. I did the following: `krige(residuals~1, temp_plot_spatial, y, nmin=5, nmax=10)`, i.e. krige using only a minimum of 5 and a maximum of 10 neighbours. Does this make any sense at all? The result looked quite nice: https://www.dropbox.com/s/7lxvfiyfl7ekhb4/Screenshot%202014-08-13%2015.42.56.png — Kasper, Aug 13 '14 at 14:00
I think I have a problem with modelling the variogram: what if you assume the correlation has nothing to do with distance but with nearest neighbours? — Kasper, Aug 13 '14 at 14:01
"what if you assume the correlation has nothing to do with distance but with nearest neighbours?": that's not kriging, then; it is more in line with kNN classification. The code `krige(residuals~1, temp_plot_spatial, y, nmin=5, nmax=10)` estimates local variograms. That is, you don't have a variogram over the entire study space, but estimate a new model for every location you are trying to predict. The local model then only grabs the nearest 10 values (since you don't specify a max distance, it should always grab 10 values, so `nmin` should be superfluous). — Andy W, Aug 13 '14 at 14:57
Thanks for your comment, but what if I don't want to make the assumption that the spatial correlation is the same for all locations? — Kasper, Aug 14 '14 at 08:48
Then estimating local variograms is a logical thing to do. If they vary according to certain features, including other predictors in the model is an option as well. IDW might be considered the simplest type of kriging model, so IDW should be no better than actually estimating the variogram from the data. — Andy W, Aug 14 '14 at 11:44

score 9 · Accepted Answer · edited Jun 11 '20 at 14:32

Introduction and Summary

Tobler's Law of Geography asserts

Everything is related to everything else, but near things are more related than distant things.

Kriging adopts a model of those relationships in which

- "Things" are numerical values at locations on the earth's surface (or in space), usually represented as a Euclidean plane.
- These numerical values are assumed to be realizations of random variables.
- "Related" is expressed in terms of the means and covariances of these random variables.

(A collection of random variables associated with points in space is called a "stochastic process.") The variogram provides the information needed to compute those covariances.
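For a second-order stationary process (constant mean and variance, covariance depending only on the separation h), the model variogram gamma(h) and the covariance C(h) carry the same information: C(h) = C(0) - gamma(h), where C(0) is the sill. Below is a minimal NumPy sketch that assumes the textbook spherical formula denoted by "Sph" and takes its parameters from the question's `vgm(.59, "Sph", 874, .04)` call (partial sill 0.59, range 874, nugget 0.04). It is a from-scratch illustration, not gstat's code.

```python
import numpy as np

# Parameters from the question's call vgm(.59, "Sph", 874, .04):
# partial sill, model type, range, nugget
PSILL, RANGE, NUGGET = 0.59, 874.0, 0.04
SILL = PSILL + NUGGET  # total variance C(0) of the process

def spherical_variogram(h):
    """Textbook spherical semivariance with a nugget; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / RANGE, 1.0)  # the model levels off at the range
    gamma = NUGGET + PSILL * (1.5 * r - 0.5 * r**3)
    return np.where(h == 0, 0.0, gamma)  # the nugget jump applies only for h > 0

def covariance(h):
    """Covariance implied by the variogram: C(h) = C(0) - gamma(h)."""
    return SILL - spherical_variogram(h)

for h in [0, 100, 437, 874, 2000]:
    print(f"h = {h:4d}   gamma(h) = {float(spherical_variogram(h)):.3f}   "
          f"C(h) = {float(covariance(h)):.3f}")
```

Pairs of points closer together get larger covariances. At and beyond the range of 874 (in the units of the coordinates), the covariance is zero. The nugget makes it drop from 0.63 to just under 0.59 as soon as h > 0.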

What Kriging Is

Kriging specifically is the prediction of things at places where they have not been observed. To make the prediction process mathematically tractable, Kriging restricts the possible formulas to linear functions of the observed values. That reduces the problem to a finite one: determining what the coefficients should be. These can be found by requiring that the prediction procedure have certain properties. Intuitively, an excellent property is that the difference between the predictor and the true (but unknown) value should tend to be small: that is, the predictor should be precise. Another property, highly touted but more questionable, is that on average the predictor should equal the true value: it should be accurate.

(The reason that insisting on perfect accuracy is questionable, but not necessarily bad, is that it usually makes any statistical procedure less precise: that is, more variable. When shooting at a target, would you prefer to scatter the hits evenly around the rim, rarely hitting the center, or would you accept results that are clustered just next to, but not exactly on, the center? The former is accurate but imprecise, while the latter is inaccurate but precise.)
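The shooting analogy corresponds to the decomposition: mean squared error = bias^2 + variance. The toy example below is not kriging. It compares the unbiased mean of n hypothetical measurements with a deliberately biased estimator that shrinks that mean halfway toward zero. All numbers are made up for illustration.

```python
import numpy as np

def mse_parts(c, mu, sigma, n):
    """Bias^2, variance and MSE of c * (mean of n draws from N(mu, sigma^2))."""
    bias = (c - 1.0) * mu
    variance = c**2 * sigma**2 / n
    return bias**2, variance, bias**2 + variance

mu, sigma, n = 1.0, 2.0, 4  # hypothetical true value, noise SD and sample size
for c in (1.0, 0.5):        # c = 1: unbiased ("accurate"); c = 0.5: biased but less variable
    b2, var, mse = mse_parts(c, mu, sigma, n)
    print(f"c = {c}: bias^2 = {b2:.3f}, variance = {var:.3f}, MSE = {mse:.3f}")

# Monte Carlo check of the same comparison
rng = np.random.default_rng(0)
means = rng.normal(mu, sigma, size=(200_000, n)).mean(axis=1)
sim_unbiased = np.mean((means - mu) ** 2)
sim_shrunk = np.mean((0.5 * means - mu) ** 2)
print("simulated MSE:", round(sim_unbiased, 3), round(sim_shrunk, 3))
```

With these numbers, the biased estimator has half the mean squared error of the unbiased one. With a larger true value the comparison can go the other way, which is why insisting on unbiasedness is questionable rather than simply wrong.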

These assumptions and criteria (that means and covariances are appropriate ways to quantify relatedness, that a linear prediction will work, and that the predictor should be as precise as possible subject to being perfectly accurate) lead to a system of equations that has a unique solution, provided the covariances have been specified in a consistent manner. The resulting predictor is therefore called a "BLUP": Best Linear Unbiased Predictor.
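Below is a minimal NumPy sketch of that system for ordinary kriging (constant but unknown mean), written in covariance form. It assumes the covariances come from the spherical model in the question's `vgm(.59, "Sph", 874, .04)` call. The four data points are hypothetical, and this is the textbook formulation rather than gstat's own implementation. The extra row and column of the matrix enforce that the weights sum to 1, which is what unbiasedness requires when the mean is unknown.

```python
import numpy as np

# Covariance implied by the question's vgm(.59, "Sph", 874, .04)
PSILL, RANGE, NUGGET = 0.59, 874.0, 0.04

def cov(h):
    """C(h) = C(0) - gamma(h) for a spherical variogram with nugget."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / RANGE, 1.0)
    gamma = np.where(h == 0, 0.0, NUGGET + PSILL * (1.5 * r - 0.5 * r**3))
    return PSILL + NUGGET - gamma

def ordinary_kriging(xy, z, x0):
    """Ordinary kriging at one target point x0; returns (prediction, variance, weights)."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # data-to-data distances
    d0 = np.linalg.norm(xy - x0, axis=1)                          # data-to-target distances
    # Bordered system [C 1; 1' 0] [w; lam] = [c0; 1]; the last row forces sum(w) == 1
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(d0), 1.0)
    sol = np.linalg.solve(A, b)
    w, lam = sol[:n], sol[n]
    prediction = float(w @ z)
    variance = float(cov(0.0) - w @ cov(d0) - lam)
    return prediction, variance, w

# Hypothetical data: four locations and their values (e.g. log(zinc))
xy = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 400.0], [500.0, 500.0]])
z = np.array([5.0, 6.0, 5.5, 7.0])

pred, var, w = ordinary_kriging(xy, z, np.array([150.0, 150.0]))
print("weights:", np.round(w, 3), " sum:", round(w.sum(), 6))
print("prediction:", round(pred, 3), " kriging variance:", round(var, 3))
```

A handy sanity check: predicting at one of the observed locations returns the observed value with zero kriging variance, because in this formulation (with the nugget included in C(0)) kriging is an exact interpolator.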

Where the Variogram Comes In

Finding these equations requires operationalizing the program just described. This is done by writing down the covariances between the predictor and the observations thought of as random variables. The algebra of covariances causes the covariances among the observed values to enter into the Kriging equations, too.
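Written out for ordinary kriging, the predictor Z0_hat = sum of w_i Z(x_i) targets Z0 = Z(x0). Here m is the unknown constant mean and C is the covariance function. The expected error must vanish whatever m is. The error variance is then minimized under that constraint with a Lagrange multiplier lambda (scaled by 2 for convenience). This is the standard textbook derivation, sketched here for illustration. Every quantity on the right-hand sides is a covariance, which is exactly what the variogram has to supply.

```latex
\begin{aligned}
\mathbb{E}[\hat Z_0 - Z_0] &= m\Big(\sum_{i=1}^n w_i - 1\Big) = 0 \ \text{for every } m
  \iff \sum_{i=1}^n w_i = 1,\\
\operatorname{Var}(\hat Z_0 - Z_0) &= C(0) - 2\sum_{i=1}^n w_i\,C(x_i - x_0)
  + \sum_{i=1}^n\sum_{j=1}^n w_i w_j\,C(x_i - x_j),\\
\sum_{j=1}^n w_j\,C(x_i - x_j) + \lambda &= C(x_i - x_0), \qquad i = 1,\dots,n,\\
\sigma^2_{\mathrm{OK}} &= C(0) - \sum_{i=1}^n w_i\,C(x_i - x_0) - \lambda .
\end{aligned}
```

The double sum in the second line is where the covariances among the observed values enter. The third line, together with the constraint, is the linear system whose solution gives the weights.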

At this point we reach a dead end, because those covariances are almost always unknown. After all, in most applications we have observed only one realization of each of the random variables: namely, our dataset, which constitutes just one number at each distinct location. Enter the variogram: this mathematical function tells us what the covariance between any two values ought to be. It is constrained to ensure that these covariances are "consistent" (in the sense that it will never give a set of covariances that are mathematically impossible: not all collections of numerical measures of "relatedness" will form actual covariance matrices). That is why a variogram is essential to Kriging.
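The consistency requirement can be checked numerically: a valid covariance matrix has no negative eigenvalues. Below is a minimal NumPy sketch that assumes 50 random hypothetical sample locations and the question's spherical model. It contrasts the model-based matrix with an ad hoc 3 x 3 "relatedness" matrix made up for illustration, in which every pairwise value looks plausible on its own.

```python
import numpy as np

# Covariance implied by the question's vgm(.59, "Sph", 874, .04)
PSILL, RANGE, NUGGET = 0.59, 874.0, 0.04

def spherical_cov(h):
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / RANGE, 1.0)
    gamma = np.where(h == 0, 0.0, NUGGET + PSILL * (1.5 * r - 0.5 * r**3))
    return PSILL + NUGGET - gamma

# 50 hypothetical sample locations in a 2000 x 2000 square
rng = np.random.default_rng(42)
xy = rng.uniform(0, 2000, size=(50, 2))
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
min_eig_model = np.linalg.eigvalsh(spherical_cov(d)).min()
print("smallest eigenvalue from the spherical model:", min_eig_model)  # positive

# Ad hoc values: 1~2 and 2~3 strongly positive, yet 1~3 strongly negative
R = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
min_eig_adhoc = np.linalg.eigvalsh(R).min()
print("smallest eigenvalue of the ad hoc matrix:", min_eig_adhoc)      # negative

# A negative eigenvalue means some combination would have a negative "variance"
x = np.array([1.0, -1.0, 1.0])
print("implied Var(Z1 - Z2 + Z3):", x @ R @ x)
```

Kriging equations built from the ad hoc matrix would be meaningless, because they would imply that some linear combination of the observations has a negative variance. A valid variogram model rules this out by construction.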

References

Because the immediate question has been answered, I will stop here. Interested readers can learn how variograms are estimated and interpreted by consulting good texts such as Journel & Huijbregts' Mining Geostatistics (1978) or Isaaks & Srivastava's An Introduction to Applied Geostatistics (1989). (Note that the estimation process introduces two objects called "variograms": an empirical variogram derived from data and a model variogram that is fitted to it. All references to "variogram" in this answer are to the model. The call to vgm in the question returns a computer representation of a model variogram.) For a more modern approach in which variogram estimation and Kriging are appropriately combined, see Diggle & Ribeiro Jr.'s Model-based Geostatistics (2007) (which is also an extended manual for the R packages geoR and geoRglm).
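The empirical variogram is usually computed with the classical (Matheron) estimator: half the average squared difference between values at pairs of points, grouped into distance bins. Below is a minimal from-scratch NumPy sketch on a tiny hypothetical transect. In gstat the corresponding steps are usually `variogram()` for the empirical variogram and `fit.variogram()` for fitting a model of the kind `vgm()` describes. This code does not reproduce their binning or weighting conventions, and it omits the fitting step.

```python
import numpy as np

def empirical_variogram(xy, z, bin_edges):
    """Classical (Matheron) estimator: half the mean squared difference per distance bin.

    Returns a list of (mean pair distance, semivariance, number of pairs),
    one entry per non-empty bin (lo, hi]."""
    i, j = np.triu_indices(len(z), k=1)                 # every unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    result = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        if in_bin.any():
            result.append((float(dist[in_bin].mean()),
                           float(half_sq[in_bin].mean()),
                           int(in_bin.sum())))
    return result

# Tiny hypothetical transect: four equally spaced points with alternating values
xy = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
z = np.array([0.0, 1.0, 0.0, 1.0])
for h, g, npairs in empirical_variogram(xy, z, np.array([0.0, 1.5, 2.5, 3.5])):
    print(f"h = {h:.1f}  gamma_hat = {g:.2f}  pairs = {npairs}")
```

The alternating values make the empirical semivariance go up and down with distance (0.5, 0.0, 0.5). An empirical variogram is generally not itself a valid model, which is why a model such as the spherical one is fitted to it before kriging.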

Comments

Incidentally, whether you are using Kriging for prediction or some other algorithm, the quantitative characterization of relatedness afforded by the variogram is useful for assessing any prediction procedure. Notice that all spatial interpolation methods are predictors from this point of view, and many of them are linear predictors, such as IDW (Inverse Distance Weighting). The variogram can be used to assess the average value and dispersion (standard deviation) of the prediction errors of any of these interpolation methods. Thus it has applicability far beyond its use in Kriging.
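Below is a minimal NumPy sketch of that assessment. It assumes that the question's spherical model is the true one, and it uses 20 random hypothetical sample locations and one hypothetical target. The same error-variance formula, C(0) - 2 sum w_i C(x_i - x0) + sum sum w_i w_j C(x_i - x_j), scores both inverse distance weights (power 2 assumed) and ordinary kriging weights. Both sets of weights sum to 1, so both predictors have mean error zero under a constant mean. Because kriging minimizes this variance among all such weights, IDW can at best tie it when the model is right. Relatedly, when gstat's `krige` is called without a `model` argument, as in the call quoted in the comments under the question, it performs inverse distance weighted interpolation rather than kriging.

```python
import numpy as np

# Covariance implied by the question's vgm(.59, "Sph", 874, .04)
PSILL, RANGE, NUGGET = 0.59, 874.0, 0.04

def cov(h):
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / RANGE, 1.0)
    gamma = np.where(h == 0, 0.0, NUGGET + PSILL * (1.5 * r - 0.5 * r**3))
    return PSILL + NUGGET - gamma

def error_variance(w, C, c0, c00):
    """Var(sum_i w_i Z_i - Z_0) = C(0) - 2 w'c0 + w'Cw, for ANY weight vector w."""
    return float(c00 - 2 * w @ c0 + w @ C @ w)

def idw_weights(d0, power=2.0):
    """Inverse distance weights, normalized to sum to 1."""
    inv = 1.0 / d0**power
    return inv / inv.sum()

def ok_weights(C, c0):
    """Ordinary kriging weights from [C 1; 1' 0] [w; lam] = [c0; 1]."""
    n = len(c0)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = C
    A[n, n] = 0.0
    return np.linalg.solve(A, np.append(c0, 1.0))[:n]

rng = np.random.default_rng(1)
xy = rng.uniform(0, 1000, size=(20, 2))   # hypothetical sample locations
x0 = np.array([480.0, 520.0])             # hypothetical prediction location
d0 = np.linalg.norm(xy - x0, axis=1)
C = cov(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1))
c0 = cov(d0)
c00 = float(cov(0.0))

w_idw, w_ok = idw_weights(d0), ok_weights(C, c0)
v_idw = error_variance(w_idw, C, c0, c00)
v_ok = error_variance(w_ok, C, c0, c00)
print(f"error SD, IDW: {np.sqrt(v_idw):.4f}   ordinary kriging: {np.sqrt(v_ok):.4f}")
```

The same function could score any other linear interpolator's weights, which is the sense in which the variogram is useful beyond Kriging itself.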

Thank you for this detailed answer. I ask the same question as above, what if I can't make the assumption that the spatial correlation is independent of the location? Is it correct that modelling the variogram is then not useful, as I would have to make a model of the variogram for all locations? Is it then better to use IDW? — Kasper, Aug 14 '14 at 08:50
When you cannot assume *second-order stationarity* of the process, then several options include (1) collecting multiple realizations of the process (when it varies with time); (2) estimating variograms over local subregions (when there is a lot of data); and (3) assuming a parametric model for how the variogram changes with location (as in GARCH models for 1D processes). My last comments directly address the inadvisability of falling back on something like IDW: whether or not you can *estimate* the variogram, in principle it *exists* and therefore IDW is usually suboptimal. — whuber, Aug 14 '14 at 14:14