Creating a variogram for a data set of 10,000 points

Question

I am trying to create a variogram for a data set with 10,000 points. If I compute the distance between every pair of points, I will have 10,000*9,999/2 pairs (about 50 million). I could round these distances to, let's say, 2 decimal places, group the pairs whose rounded distances are equal, and average the variances within each group to get the variance for that particular distance. That would give me the experimental variogram.

This process will definitely be very slow. Is there a more efficient way? For example, instead of building the variogram from all the observation points, I could take only a subset of them surrounding the point where I want to interpolate the value and build a variogram from that subset. I could then take the k nearest neighbors of the destination point and use this subset variogram to interpolate. Would this be more efficient, and would it be correct?

score 4 · Accepted Answer · answered Jun 06 '12 at 21:20

The geoR package will do this efficiently:

library(geoR)                                                      # Provides variog()

n <- 10^4                                                          # Number of points
v <- list(coords = matrix(runif(2*n), ncol = 2), data = rnorm(n)) # Random data
system.time(v.vario <- variog(v))                                  # Compute a variogram object

Elapsed time on this machine: 5.21 seconds.

For more points, you can subsample the data. (A stratified procedure that obtains collections of close-by points is better than a simple random sample, because accurately characterizing the variogram near the origin is important.) It's better, though, to partition the study area into "tiles" or subregions and evaluate variograms within those subregions: this is a great way to assess the stationarity hypothesis.
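As a rough sketch of the tiling idea, the snippet below splits the same kind of random data into a 2 x 2 grid and computes one variogram per tile with a shared maximum distance, so the bins line up across tiles. The grid size (`k <- 2`), the cutoff of 0.5, and the names `tile` and `tile.varios` are arbitrary choices made for illustration, not part of the method itself.

```r
library(geoR)

set.seed(1)
n <- 10^4
v <- list(coords = matrix(runif(2 * n), ncol = 2), data = rnorm(n))

# Assign each point to a cell of a k x k grid over the unit square
# (k = 2 is an arbitrary, illustrative choice)
k  <- 2
ix <- pmin(floor(v$coords[, 1] * k), k - 1)
iy <- pmin(floor(v$coords[, 2] * k), k - 1)
tile <- ix + k * iy + 1                      # Tile index in 1..k^2

# One empirical variogram per tile; the common max.dist gives every
# tile the same default bins, so the results can be compared directly
tile.varios <- lapply(split(seq_len(n), tile), function(idx) {
  variog(list(coords = v$coords[idx, ], data = v$data[idx]),
         max.dist = 0.5, messages = FALSE)
})

# Semivariance estimates by bin, one entry per tile
tile.v <- lapply(tile.varios, function(x) x$v)
```

If the per-tile semivariances in `tile.v` differ markedly from one another, that is a hint that a single stationary variogram may not describe the whole study area.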

Note that for extremely large data sets variog has the option (max.dist). This will exclude pairs beyond a particular distance from inclusion in the empirical variogram calculations. This is a fairly reasonable approach since points sufficiently far out should be uncorrelated. — Jonathan Lisic, Jun 07 '12 at 01:46
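A minimal illustration of the max.dist argument mentioned in the comment above, again on random data. The cutoff of 0.3 and the name `v.short` are assumptions chosen only for the example; a sensible cutoff depends on the range of spatial correlation in the actual data.

```r
library(geoR)

set.seed(1)
n <- 10^4
v <- list(coords = matrix(runif(2 * n), ncol = 2), data = rnorm(n))

# Restrict the empirical variogram to pairs no more than 0.3 apart
# (0.3 is an arbitrary, illustrative cutoff)
v.short <- variog(v, max.dist = 0.3, messages = FALSE)

# Bin centres (u), semivariances (v), and pair counts (n) for the retained bins
data.frame(u = v.short$u, v = v.short$v, n = v.short$n)
```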
