Help understanding a kriging variation for bare earth extraction

Question

Problem:

The authors of a paper (http://www.isprs.org/proceedings/XXXIV/part3/papers/paper106.pdf) develop a bare earth extraction algorithm for LiDAR that is based on kriging. What I don't understand is: they add measurement accuracy to the diagonals of their covariance matrix. The step is succinctly outlined in A.1 (so not a big time investment for anyone kind enough to follow the link).

Here is essentially what they do:

\begin{equation} z = \mathbf{c}^T \mathbf{C}^{-1} \mathbf{z} \end{equation} where \begin{equation} \mathbf{c} = (C(PP_1),C(PP_2),...,C(PP_n))^T \end{equation} \begin{equation} \mathbf{z} = (z_1, z_2, ..., z_n)^T \end{equation} and \begin{equation} \mathbf{C} = \left( \begin{array}{cccc} C(0)+\color{red}{\sigma_0/p_1} & C(P_1P_2) & \dots & C(P_1P_n) \\ & C(0)+\color{red}{\sigma_0/p_2} & \dots & C(P_2P_n)\\ & & & \vdots \\ & & & C(0)+\color{red}{\sigma_0/p_n}\end{array} \right) \end{equation}

where

$z$ is residual elevation after detrending the points.
$C(P_iP_j)$ is the covariance model they got from fitting their empirical semivariogram (they use a Gaussian variogram model).
$P_iP_j$ is the distance between points $P_i$ and $P_j$.

I think this is all pretty standard, except for the $\color{red}{\sigma_0/p_i}$:

$\sigma_0$ is an "a priori" measurement variance, which is equal for all data points. I suppose it's based on the sensor if it's "known" ahead of time.
$p_i$ is a "goodness" factor that ranges from zero (not a good point) to one (a good point). It's not critical, but $p_i$ is a function of the residual of the de-trended height, so large positive residuals have $p=0$ because they are probably trees or buildings and we're looking for points on the ground.

Questions:

I would like to understand why they do this. Has anyone seen a kriging variation like this elsewhere in which something was added to the diagonals of the covaraince matrix? Or is this approach of adding to the diagonals of the covariance matrix common in another context? Even if someone just had search terms, that might help me out a lot.
I would like to understand how this works. How does this approach act to decrease the influence of bad points (i.e, $p_i \rightarrow 0$)? It looks to me like $\sigma_0/p_i$ would give $z_i$ a large coefficient for $p_i \rightarrow 0$, giving bad points more influence.

What I've tried so far:

Reading their other papers, but I don't see an explanation or justification of why they do this. Their method does work though.
Googling to try to find out generically what it means to add a term like this to the diagonals of a covariance matrix. I've also been looking around for other sources on kriging to see if I can find something similar.

Extra:

I have a pretty good quantitative background, but don't have the strongest background in stats (just fyi). I have take taken a geospatial stats course though, so I've had coursework on kriging. I suspect that $\sigma$ could really be written as a matrix, but for some reason the off-diagonals are zero. It would seem less hokie to add matrices than to add to specific terms in a matrix. I wonder if the off-diagonal terms are zero because the measurement error between different points are uncorrelated. Even in that case, I thought the measurement error was wrapped up in the covariance function $C(P_iP_j)$ and would be represented by the nugget term, but they use a Gaussian model with no nugget.

Thanks in advance.

Edit: I'm a little more than half way there. I understand how this works, but still not why it's justified.

What I've come to understand:

I get how the weights diminish the influence of large-residual points. I thought the weight would act in the wrong direction, but I failed to recognize that the covariance matrix gets inverted. Interestingly (to me, maybe obvious to others here), the inverse of the covariance matrix can be thought of as a coupling matrix, which determines the strength of interaction between data points -- almost like masses attached to springs (e.g., How to interpret an inverse covariance or precision matrix?).

Below is some toy data with interpolations with the weights and without the weights (i.e., with all the weights set to one). The interpolation without the weights follows more closely the large residual points (which would be trees, buildings, etc).

below are the no-weight covariance matrix and inverted covariance matrix. The points are in order (so the left most toy point corresponds to the left most column etc.), hence the diagonal patter. You just get the expected pattern of coupling to close points.

below are the weighted covariance and precision matrix. You can see huge matrix elements in the covariance matrix, which is where the weights were nearly zero ($\sigma_0 / p_i get large$). This basically results in a being whole punched in the precision matrix, which can be thought of as decoupling those points from the others.

That's how the weights decrease the influence of large-residual elevation points.

What I have yet to understand:

I still don't understand why the authors would have thought to do this, or why it's valid to add a term to the diagonals of the covariance matrix (although I trust it is valid). @user30490, you said this was just a nugget term. There's two things I really don't understand about this. First, I thought the nugget term was wrapped up in the covariance model, which would put it in every term of the covariance matrix -- not just the diagonals. Second, the covariance model was fit from the empirical semivariogram. So, I would think that adding a term wholesale would throw off that fit, e.g., the sill would get raised above the data.

So I have two answers for why they add a bit to the diagonal. Search for the term the nugget in spatial statistics and you will see that if you believe your spatial process has small process error than you can add a small noise term to the model and it will appear on the diagonal of the covariance matrix in your kriging equations. Secondly, having a nugget term (also sometimes called jitter) allows for more stable estimates of the covariance matrix since often times the covariance become numerically singular/unstable. — , Aug 29 '15 at 13:37
Also, when you have a term being added on the diagonal you also no longer get interpolation from your kriging predictions. — , Aug 29 '15 at 13:38
Thanks for the response. $\sigma$ seems to similar in function to a nugget term, but I'm having trouble seeing it as just that simple. If $\sigma$ is the nugget term, why is it not in off-diagonal elements too, i.e., wrapped up in $C(P_iP_j)$, which is fit from the empirical semivariogram? Also, I really want to know how the $p_i$ weight acts to control the influence of points in the kriging method? Finally, could you clarify what you mean by your second comment? I don't understand what you meant when you said, "you also no longer get interpolation." Thanks again for the help. — user55937, Aug 30 '15 at 16:26
I can be wrong, but I see that as a theoretically sound version of the old trick (often considered dirty) to add a value to the main diagonal of the cov matrix to avoid ill-conditioned (near-singular) matrices. Either this or going into more complicated solutions such as proposed by Mohammadi et al (2016): https://arxiv.org/pdf/1602.00853.pdf which I used in my project (https://github.com/PauloCarvalhoRJ/gammaray) to fill the "holes" in regular grids with kriging. — Paulo Carvalho, Aug 28 '18 at 18:56

Help understanding a kriging variation for bare earth extraction

0 Answers0