Creating a variogram for geographical data

Question

I want to create a covariogram for a certain field R distributed on a geographical area. Which sort of packages are available to calculate covariogram. Also the distance metric is geographical distance rather than simple euclidean as in the case of R2 coordinate system.

I want to know how much it will affect the result if I use euclidean instead of geographic distance. I think the packages that are available will use euclidean rather than actual geographic distance when I pass them the values.

See [this answer](http://stats.stackexchange.com/a/12686/1036) for references for a few R resources. This is probably more suited for stackoverflow, although a few questions probably come close to answering it. See the search on SO `[r] variogram` ([link](http://stackoverflow.com/search?q=%5Br%5D+variogram)). — Andy W, Jun 26 '12 at 13:39
The statistical part of this question is the last: how would "geographical distance" change the variogram? To answer that, please tell us how extensive the field is: does it cover the entire globe, a hemisphere, or less than a hemisphere? — whuber, Jun 26 '12 at 14:43

score 6 · Answer 1 · edited Jun 11 '20 at 14:32

Almost all variogram estimators use Euclidean distances; that is, they apply the Pythagorean formula to the coordinates.

Issues

For study areas covering most of the earth, this is a challenging problem. One issue is that distortions in the metric arising from using projected coordinates will differentially affect distances at different locations. This results in a kind of horizontal "smearing" of the experimental variogram as distances (the x-coordinate in the variogram) are shifted higher or lower than they truly are. It is important to keep the range of relative metric distortions reasonably small, perhaps no greater than 10-20% across the whole area.

For places of limited north-south extent and well away from the poles, the variation in metric distortion from using latitudes and longitudes (as if they were Euclidean coordinates) is relatively small and harmless. However, the relative lengths of one degree of longitude (which are scaled by the cosine of the latitude) are large enough to introduce noticeable north-south, east-west anisotropy in the structure (except at points close to the Equator). This is a geometric anisotropy that would be applied to any other correlation structures in the data and therefore would usually introduce unwanted complications, unless the phenomenon under study is expected to have an anisotropy with this orientation (which might be the case for certain long-range weather patterns, for instance).

Solutions

There are two relatively simple solutions. One, which is in common use, is to use a suitable projected coordinate system. For study areas up to the size of a small country, a national coordinate system will have so little distortion that the Euclidean calculations are more than accurate enough. For instance, in Canada south of the Arctic Circle, use a Lambert Conformal Conic projection.

A work-around to cope with the inevitable distortions on a world-scale study is to tile the study area into overlapping areas, each of which can be projected with suitable accuracy. Perform the kriging or other estimation separately in each area, then mosaic the results. They will differ a little on the overlap sections, but any differences should be relatively minor. If not, you can use the kriging estimation variances to form inverse-variance weighted averages of the predicted values wherever they overlap. (This use of inverse variances for weighting is just a heuristic, because the overlapping predicted values typically are strongly correlated, being based on the same sets of neighbors.)

For larger study areas, extending over more than a hemisphere, you can convert the (longitude, latitude) coordinates $(\lambda,\phi)$ into 3D geocentric Cartesian coordinates $(x,y,z)$ = $R(\cos{\lambda}\cos{\phi},\sin{\lambda}\cos{\phi},\sin{\phi})$ and compute distances with them. This will not introduce any artificial anisotropy. In effect it changes the true variogram $\gamma(h)$, where $h$ is geodesic distance on a sphere of radius $R$, to $\gamma(2R\arcsin\frac{h}{2R})$. This is just a mild re-expression of the lag distance. Because $\frac{d}{d\theta}\sin{\theta}|_0=1$, this does not change the slope at the origin--which is crucial for correct modeling of the random field--and because the change is solely a re-expression of the distance $h$, neither the sill nor the nugget will be altered.

Complications and pitfalls

It is not difficult to do variographic modeling in (lon, lat) coordinates if you really want to, but remember that (a) if the variogram is to be used for estimating integrals over extensive areas (as in reserve estimation), the usual formulas will not apply and (b) if it will be used for interpolation and mapping--which can involve calculations of dozens of distances at millions of points--specialized kriging software would be needed and it's going to take a performance hit when computing all those spherical distances.

Also bear in mind that when variograms are used for such estimation and interpolation, as in kriging, usually only short distances are involved. Thus, the long-range distance distortion from using the $(x,y,z)$ solution is of little practical concern. However, somehow you have to persuade the kriging program to use all three coordinates. If your program will handle 3D coordinates and let you specify the prediction points (instead of assuming they lie on a rectangular grid), then you can generate a mesh of points on the sphere in $(x,y,z)$ coordinates, pass them as input to the kriging program, and then map the kriging output in any way you see fit--including with (lon, lat) coordinates.

Well my another region is USA. So what should I do in this case. I mean you gave me an example of Lambert Conformal Conic projection for canada. That's fine but what about USA — user31820, Jul 02 '12 at 15:53
The USA has *loads* of projections. You can get good advice about which one(s) to use and how to carry out those projections at http://gis.stackexchange.com/. — whuber, Jul 02 '12 at 16:42
You might be interested in the work of Sampson and Guttorp who developed 3D estimators of semi-variograms for IPCC work. http://www.nrcse.washington.edu/pdf/trs06_interface.pdf — AdamO, Jul 02 '12 at 17:02
Thanks @AdamO. There is a subtle issue here, though: the data in question are not 3D. They represent a random field on a 2D manifold. There are subtle theoretical distinctions and an important practical one that I neglected to discuss: the treatment of anisotropy is quite different in 2 and 3 dimensions. Moreover, the sphere's topology prevents us from defining a globally continuous field of directions, which raises significant problems in identifying and characterizing anisotropy. These problems are not automatically solved using the geocentric $(x,y,z)$ coordinates. — whuber, Jul 02 '12 at 17:11
I found this link where they have used the geoR package of R to calculate the variogram. It seems they have used the lat and lon for the coordinates. They haven't mentioned anything about the projection. Do you think I can directly use this package. Here is the link http://www.ats.ucla.edu/stat/r/faq/variogram.htm — user31820, Jul 02 '12 at 17:11
@user There is a brief discussion of your question in the Diggle & Ribeiro book (which is effectively the geoR manual): see p. 39. It concludes, "Our **geoR** software implementation only calculates planar Euclidean distances." — whuber, Jul 02 '12 at 17:14
so it means they have also used just euclidean distances? To conclude I should just settle for euclidean distance of geoR package or use the projection method you have suggested? — user31820, Jul 02 '12 at 17:24
By far your best approach is to use a suitable projection. Using (lon, lat) as if they were Euclidean is tantamount to using the [Equirectangular projection](http://en.wikipedia.org/wiki/Equirectangular_projection), which introduces terrible amounts of distortion. That's going to be a serious issue in Canada, which gets uncomfortably close to a pole. — whuber, Jul 02 '12 at 17:27
let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/3961/discussion-between-whuber-and-user31820) — whuber, Jul 02 '12 at 17:31

Creating a variogram for geographical data

1 Answers1

Issues

Solutions

Complications and pitfalls