9

I was reading this Wikipedia article on kriging. I didn't understand the part where it says that

Kriging computes the best linear unbiased estimator, $\hat Z(x_0)$, of $Z(x_0)$ such that the kriging variance is minimized subject to the unbiasedness condition. I don't follow the derivation, nor how the variance is minimized. Any suggestions?

Specifically, I didn't get the part where the variance is minimized subject to the unbiasedness condition.

I think it should have been

$E[Z'(x_0)-Z(x_0)]$ instead of $E[Z'(x)-Z(x)]$, shouldn't it? ($'$ is equivalent to the hat in the wiki article.) Also, I didn't get how the kriging error is derived.

user31820
  • Where do you get hung up in the derivation? – whuber Jun 18 '12 at 21:23
  • The part where it calculates the kriging error and imposes the unbiasedness condition. It is fine to say that the unbiasedness condition means the expectation of the estimator equals that of the true value. I have edited the post to include the details. – user31820 Jun 19 '12 at 16:23
  • I think you are correct that the Wikipedia expression should read $E[Z'(x_0)-Z(x_0)]$. – whuber Jun 19 '12 at 17:56

2 Answers

14

Suppose $\left(Z_0, Z_1, \ldots, Z_n\right)$ is a vector assumed to have a multivariate distribution of unknown mean $(\mu, \mu, \ldots, \mu)$ and known variance-covariance matrix $\Sigma$. We observe $\left(z_1, z_2, \ldots, z_n\right)$ from this distribution and wish to predict $z_0$ from this information using an unbiased linear predictor:

  • Linear means the prediction must take the form $\hat{z_0} = \lambda_1 z_1 + \lambda_2 z_2 + \cdots + \lambda_n z_n$ for coefficients $\lambda_i$ to be determined. These coefficients can depend at most on what is known in advance: namely, the entries of $\Sigma$.

This predictor can also be considered a random variable $\hat{Z_0} = \lambda_1 Z_1 + \lambda_2 Z_2 + \cdots + \lambda_n Z_n$.

  • Unbiased means the expectation of $\hat{Z_0}$ equals its (unknown) mean $\mu$.

Writing things out gives some information about the coefficients:

$$\eqalign{ \mu &= E[\hat{Z_0}] = E[\lambda_1 Z_1 + \lambda_2 Z_2 + \cdots + \lambda_n Z_n] \\ &= \lambda_1 E[Z_1] + \lambda_2 E[Z_2] + \cdots + \lambda_n E[Z_n] \\ &= \lambda_1 \mu + \cdots + \lambda_n \mu \\ &= \left(\lambda_1 + \cdots + \lambda_n\right) \mu. \\ }$$

The second line is due to linearity of expectation and all the rest is simple algebra. Because this procedure is supposed to work regardless of the value of $\mu$, evidently the coefficients have to sum to unity. Writing the coefficients in vector notation $\lambda = (\lambda_i)'$, this can be neatly written $\mathbf{1}\lambda=1$.

Among the set of all such unbiased linear predictors, we seek one that deviates as little from the real value as possible, measured in root mean square. This, again, is a computation. It relies on the bilinearity and symmetry of covariance, whose application is responsible for the summations in the second line:

$$\eqalign{ E[(\hat{Z_0} - Z_0)^2] &= E[(\lambda_1 Z_1 + \lambda_2 Z_2 + \cdots + \lambda_n Z_n - Z_0)^2] \\ &= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \text{cov}[Z_i, Z_j]-2\sum_{i=1}^n\lambda_i \text{cov}[Z_i, Z_0] + \text{cov}[Z_0, Z_0] \\ &= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \Sigma_{i,j} - 2\sum_{i=1}^n\lambda_i\Sigma_{0,i} + \Sigma_{0,0}. }$$
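
To spell out where the second line comes from: the unbiasedness constraint makes $E[\hat{Z_0} - Z_0] = 0$, so $E[(\hat{Z_0} - Z_0)^2] = \text{var}[\hat{Z_0} - Z_0]$, and expanding that variance by the bilinearity and symmetry of covariance gives

$$\text{var}\left[\sum_{i=1}^n \lambda_i Z_i - Z_0\right] = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j\, \text{cov}[Z_i, Z_j] - 2\sum_{i=1}^n \lambda_i\, \text{cov}[Z_i, Z_0] + \text{cov}[Z_0, Z_0],$$

which is exactly the middle line above.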

Whence the coefficients can be obtained by minimizing this quadratic form subject to the (linear) constraint $\mathbf{1}\lambda=1$. This is readily solved using the method of Lagrange multipliers, yielding a linear system of equations, the "Kriging equations."
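
As a sketch of that step (writing $\nu$ for the Lagrange multiplier, to keep it distinct from the mean $\mu$), the Lagrangian is

$$\sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \Sigma_{i,j} - 2\sum_{i=1}^n \lambda_i \Sigma_{0,i} + \Sigma_{0,0} + 2\nu\left(\sum_{i=1}^n \lambda_i - 1\right),$$

and setting its derivatives with respect to each $\lambda_i$ and to $\nu$ equal to zero gives

$$\sum_{j=1}^n \Sigma_{i,j}\lambda_j + \nu = \Sigma_{0,i} \quad (i = 1, \ldots, n), \qquad \sum_{i=1}^n \lambda_i = 1,$$

a system of $n+1$ linear equations in the $n+1$ unknowns $\lambda_1, \ldots, \lambda_n, \nu$. Substituting the solution back into the quadratic form shows the minimized value is $\Sigma_{0,0} - \sum_{i=1}^n \lambda_i \Sigma_{0,i} - \nu$, the variance discussed next.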

In the application, $Z$ is a spatial stochastic process ("random field"). This means that for any given set of fixed (not random) locations $\mathbf{x_0}, \ldots, \mathbf{x_n}$, the vector of values of $Z$ at those locations, $\left(Z(\mathbf{x_0}), \ldots, Z(\mathbf{x_n})\right)$ is random with some kind of a multivariate distribution. Write $Z_i = Z(\mathbf{x_i})$ and apply the foregoing analysis, assuming the means of the process at all $n+1$ locations $\mathbf{x_i}$ are the same and assuming the covariance matrix of the process values at these $n+1$ locations is known with certainty.

Let's interpret this. Under the assumptions (including constant mean and known covariance), the coefficients determine the minimum variance attainable by any unbiased linear estimator. Let's call this variance $\sigma_{OK}^2$ ("OK" is for "ordinary kriging"). It depends solely on the matrix $\Sigma$. It tells us that if we were to repeatedly sample from $\left(Z_0, \ldots, Z_n\right)$ and use these coefficients to predict the $z_0$ values from the remaining values each time, then

  1. On the average our predictions would be correct.

  2. Typically, our predictions of the $z_0$ would deviate about $\sigma_{OK}$ from the actual values of the $z_0$.
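
For readers who want to see the whole procedure end to end, here is a minimal numerical sketch in Python. The one-dimensional locations, the observed values, and the exponential covariance model (with its range parameter) are all assumptions of mine, chosen only to make the system above concrete; none of them come from the Wikipedia article.

```python
# A minimal sketch of ordinary kriging in one dimension.
# Assumptions (for illustration only): four made-up observation
# locations/values and an exponential covariance model C(h) = exp(-|h|/range_).
import numpy as np

def exp_cov(h, range_=2.0):
    """Exponential covariance as a function of separation distance h (assumed model)."""
    return np.exp(-np.abs(h) / range_)

x = np.array([0.0, 1.0, 2.5, 4.0])   # observation locations x_1..x_n (assumed)
z = np.array([1.2, 0.7, 0.9, 1.5])   # observed values z_1..z_n (assumed)
x0 = 1.8                              # prediction location x_0 (assumed)

n = len(x)
Sigma = exp_cov(x[:, None] - x[None, :])  # n x n covariance matrix of (Z_1, ..., Z_n)
sigma0 = exp_cov(x - x0)                  # covariances cov[Z_i, Z_0]
Sigma00 = exp_cov(0.0)                    # var[Z_0]

# Ordinary kriging system from the Lagrange-multiplier step:
#   [ Sigma  1 ] [ lambda ]   [ sigma0 ]
#   [  1'    0 ] [  nu    ] = [   1    ]
A = np.zeros((n + 1, n + 1))
A[:n, :n] = Sigma
A[:n, n] = 1.0
A[n, :n] = 1.0
b = np.append(sigma0, 1.0)

sol = np.linalg.solve(A, b)
lam, nu = sol[:n], sol[n]

z0_hat = lam @ z                          # the BLUP of z_0
ok_var = Sigma00 - lam @ sigma0 - nu      # kriging variance sigma_OK^2

print("weights:", lam, "sum =", lam.sum())  # weights sum to 1 (unbiasedness)
print("prediction:", z0_hat, "kriging variance:", ok_var)
```

The printed weights sum to one, reproducing the unbiasedness constraint, and `ok_var` is the $\sigma_{OK}^2$ referred to in point 2.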

Much more needs to be said before this can be applied to practical situations like estimating a surface from punctual data: we need additional assumptions about how the statistical characteristics of the spatial process vary from one location to another and from one realization to another (even though, in practice, usually only one realization will ever be available). But this exposition should be enough to follow how the search for a "Best" Unbiased Linear Predictor ("BLUP") leads straightforwardly to a system of linear equations.


By the way, kriging as usually practiced is not quite the same as least squares estimation, because $\Sigma$ is estimated in a preliminary procedure (known as "variography") using the same data. That is contrary to the assumptions of this derivation, which assumed $\Sigma$ was known (and a fortiori independent of the data). Thus, at the very outset, kriging has some conceptual and statistical flaws built into it. Thoughtful practitioners have always been aware of this and found various creative ways to (try to) justify the inconsistencies. (Having lots of data can really help.) Procedures now exist for simultaneously estimating $\Sigma$ and predicting a collection of values at unknown locations. They require slightly stronger assumptions (multivariate normality) in order to accomplish this feat.

whuber
  • There's a website out there where the guy rants against kriging and it seems like he has some valid points. I think your final paragraph here is very illuminating. – Wayne Jun 19 '12 at 18:15
  • @Wayne Yes, you can tell what I'm reacting to. But although kriging has been used as "snake oil" by consultants, it has a lot going for it, including a theory of "change of support" to compare data obtained from (say) tiny samples of a medium to data obtained from much larger portions of that medium. Kriging ultimately is at the bottom of the most sophisticated spatio-temporal modeling today. It is also a useful way to evaluate alternative proposals: *e.g.,* many spatial interpolators are linear (or can be linearized), so it's fair to compare their estimation variance to that of kriging. – whuber Jun 19 '12 at 18:20
1

Kriging is simply least squares estimation for spatial data. As such it provides a linear unbiased estimator that minimizes the sum of squared errors. Since it is unbiased, the MSE equals the estimator variance, which is therefore also a minimum.

Michael R. Chernick
  • I didn't get the part calculating the kriging error. Also, I am confused about the kriging variance versus the variance. What is the difference, and what is their significance? – user31820 Jun 19 '12 at 16:54
  • @whuber Thanks for the explanation, but I didn't get the derivation of the equation where you calculate the MSE between the value predicted by the unbiased estimator and the true value. The second line of that equation, to be specific. – user31820 Jun 19 '12 at 21:50
  • @whuber Also, I didn't get the wiki part where it calculates the kriging variance, which is similar to the one in your answer. They have the same results, but the initial terms are different. How come? – user31820 Jun 19 '12 at 22:41