76

I have produced generalized additive models for deforestation. To account for spatial autocorrelation, I have included latitude and longitude as a smoothed interaction term (i.e. s(x,y)).

I've based this on reading many papers in which the authors say 'to account for spatial autocorrelation, coordinates of points were included as smoothed terms', but none of them explains why this actually accounts for it. It's quite frustrating. I've read all the books I can find on GAMs in the hope of finding an answer, but most (e.g. Generalized Additive Models: An Introduction with R, S.N. Wood) just touch on the subject without explaining.

I'd really appreciate it if someone could explain WHY the inclusion of latitude and longitude accounts for spatial autocorrelation, and what 'accounting' for it really means - is it simply enough to include it in the model, or should you compare a model with s(x,y) in and a model without? And does the deviance explained by the term indicate the extent of spatial autocorrelation?

Michael R. Chernick
gisol
  • If it is relevant, I used the 'bam' function from the 'mgcv' package in R. – gisol Sep 01 '12 at 14:05
  • Also, I've tested for spatial autocorrelation using Moran's I. – gisol Sep 01 '12 at 14:19
  • possible duplicate of [Can you use a spline function of the spatial coordinates to control for spatial autocorrelation?](http://stats.stackexchange.com/questions/33812/can-you-use-a-spline-function-of-the-spatial-coordinates-to-control-for-spatial) – Macro Sep 01 '12 at 15:29
  • 4
    Given the answers here, we might flag the other Q @Macro links to as a duplicate of this one so people coming across that one see the Answers here, especially that of whuber. – Gavin Simpson Sep 01 '12 at 17:20
  • +1 @GavinSimpson - by the way, note that you do have the power to cast close votes, enough of which will lead to the two questions being merged. – Macro Sep 01 '12 at 20:11
  • @Macro Thanks; I voted on the other Q but on my own that would achieve little (other than possible duplicate link), hence the comment here. – Gavin Simpson Sep 02 '12 at 10:40
  • 1
    [Dealing with spatial and temporal autocorrelation](http://www.flutterbys.com.au/stats/tut/tut8.3a.html) is quite helpful for me. Some benefits of the model with correlation structure are explained in the beginning of "3.2 Accommodating spatial autocorrelation in the model" – ccshao Aug 01 '17 at 09:25

4 Answers

62

"Spatial autocorrelation" means various things to various people. An overarching concept, though, is that a phenomenon observed at locations $\mathbf{z}$ may depend in some definite way on (a) covariates, (b) location, and (c) its values at nearby locations. (Where the technical definitions vary is in the kind of data being considered, what "definite way" is postulated, and what "nearby" means: all of these have to be made quantitative in order to proceed.)

To see what might be going on, let's consider a simple example of such a spatial model to describe the topography of a region. Let the measured elevation at a point $\mathbf{z}$ be $y(\mathbf{z})$. One possible model is that $y$ depends in some definite mathematical way on the coordinates of $\mathbf{z}$, which I will write $(z_1,z_2)$ in this two-dimensional situation. Letting $\varepsilon$ represent (hypothetically independent) deviations between the observations and the model (which as usual are assumed to have zero expectation), we may write

$$y(\mathbf{z}) = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \varepsilon(\mathbf{z})$$

for a linear trend model. The linear trend (represented by the $\beta_1$ and $\beta_2$ coefficients) is one way to capture the idea that nearby values $y(\mathbf{z})$ and $y(\mathbf{z}')$, for $\mathbf{z}$ close to $\mathbf{z}'$, should tend to be close to one another. We can even calculate this by considering the expected value of the size of the difference between $y(\mathbf{z})$ and $y(\mathbf{z}')$, $E[|y(\mathbf{z}) - y(\mathbf{z}')|]$. It turns out the mathematics is much simpler if we use a slightly different measure of difference: instead, we compute the expected squared difference:

$$\eqalign{ E[\left(y(\mathbf{z}) - y(\mathbf{z}')\right)^2] &= E[\left(\beta_0 + \beta_1 z_1 + \beta_2 z_2 + \varepsilon(\mathbf{z}) - \left(\beta_0 + \beta_1 z_1' + \beta_2 z_2' + \varepsilon(\mathbf{z}')\right)\right)^2] \\ &=E[\left(\beta_1 (z_1-z_1') + \beta_2 (z_2-z_2') + \varepsilon(\mathbf{z}) - \varepsilon(\mathbf{z}')\right)^2] \\ &=E[\left(\beta_1 (z_1-z_1') + \beta_2 (z_2-z_2')\right)^2 \\ &\quad+ 2\left(\beta_1 (z_1-z_1') + \beta_2 (z_2-z_2')\right)\left(\varepsilon(\mathbf{z}) - \varepsilon(\mathbf{z}')\right)\\ &\quad+ \left(\varepsilon(\mathbf{z}) - \varepsilon(\mathbf{z}')\right)^2] \\ &=\left(\beta_1 (z_1-z_1') + \beta_2 (z_2-z_2')\right)^2 + E[\left(\varepsilon(\mathbf{z}) - \varepsilon(\mathbf{z}')\right)^2] }$$

This model is free of any explicit spatial autocorrelation, because there is no term in it directly relating $y(\mathbf{z})$ to nearby values $y(\mathbf{z}')$.
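To make the first model concrete, here is a small Monte Carlo check of the formula just derived. It is a sketch in Python/NumPy rather than the thread's R, and the coefficients, noise level, and the two locations are invented for illustration:

```python
# Check that E[(y(z) - y(z'))^2] in the linear-trend model equals
# (b1*(z1-z1') + b2*(z2-z2'))^2 + 2*sigma^2 (the last term is
# E[(eps - eps')^2] for independent zero-mean errors with sd sigma).
import numpy as np

rng = np.random.default_rng(0)
b0, b1, b2 = 2.0, 0.5, -0.3                        # hypothetical trend coefficients
sigma = 1.0                                        # sd of the independent errors
z, zp = np.array([1.0, 2.0]), np.array([4.0, 6.0])  # two arbitrary locations

n = 200_000
y  = b0 + b1 * z[0]  + b2 * z[1]  + rng.normal(0, sigma, n)
yp = b0 + b1 * zp[0] + b2 * zp[1] + rng.normal(0, sigma, n)

empirical  = np.mean((y - yp) ** 2)
trend_part = (b1 * (z[0] - zp[0]) + b2 * (z[1] - zp[1])) ** 2
theoretical = trend_part + 2 * sigma ** 2
print(empirical, theoretical)   # the two should agree closely
```

The simulated mean squared difference matches the closed-form value, confirming that in this model all of the "nearby values are similar" behaviour comes from the trend term alone.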

An alternative model ignores the linear trend and supposes only that there is autocorrelation. One way to do that is through the structure of the deviations $\varepsilon(\mathbf{z})$. We might posit that

$$y(\mathbf{z}) = \beta_0 + \varepsilon(\mathbf{z})$$

and, to account for our anticipation of correlation, we will assume some kind of "covariance structure" for the $\varepsilon$. For this to be spatially meaningful, we will assume the covariance between $\varepsilon(\mathbf{z})$ and $\varepsilon(\mathbf{z}')$, equal to $E[\varepsilon(\mathbf{z})\varepsilon(\mathbf{z}')]$ because the $\varepsilon$ have zero means, tends to decrease as $\mathbf{z}$ and $\mathbf{z}'$ become more and more distant. Because the details do not matter, let's just call this covariance $C(\mathbf{z}, \mathbf{z}')$. This is spatial autocorrelation. Indeed, the (usual Pearson) correlation between $y(\mathbf{z})$ and $y(\mathbf{z}')$ is

$$\rho(y(\mathbf{z}), y(\mathbf{z}')) = \frac{C(\mathbf{z}, \mathbf{z}')}{\sqrt{C(\mathbf{z}, \mathbf{z})C(\mathbf{z}', \mathbf{z}')}}.$$
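For instance, with an exponential covariance function, a common concrete choice (the sill and range values below are made up, not anything from the thread), the correlation defined above decays with distance exactly as required:

```python
# C(z, z') = sigma2 * exp(-||z - z'|| / rho): covariance falls off with
# distance, so the Pearson correlation rho(y(z), y(z')) does too.
import numpy as np

def C(z, zp, sigma2=1.0, rho=2.0):
    """Hypothetical exponential covariance between two locations."""
    return sigma2 * np.exp(-np.linalg.norm(np.asarray(z, float) - np.asarray(zp, float)) / rho)

def corr(z, zp):
    """Pearson correlation of y(z) and y(z') under this covariance."""
    return C(z, zp) / np.sqrt(C(z, z) * C(zp, zp))

near = corr([0, 0], [0.5, 0])   # a nearby pair
far  = corr([0, 0], [5.0, 0])   # a distant pair
print(near, far)                # correlation shrinks with separation
```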

In this notation, the previous expected squared difference of $y$'s for the first model is

$$\eqalign{ E[\left(y(\mathbf{z}) - y(\mathbf{z}')\right)^2] &= \left(\beta_1 (z_1-z_1') + \beta_2 (z_2-z_2')\right)^2 + E[\left(\varepsilon(\mathbf{z}) - \varepsilon(\mathbf{z}')\right)^2] \\ &=\left(\beta_1 (z_1-z_1') + \beta_2 (z_2-z_2')\right)^2 + C_1(\mathbf{z}, \mathbf{z}) + C_1(\mathbf{z}', \mathbf{z}') }$$

(assuming $\mathbf{z} \ne \mathbf{z}'$) because the $\varepsilon$ at different locations have been assumed to be independent. I have written $C_1$ instead of $C$ to indicate this is the covariance function for the first model.

When the covariances of the $\varepsilon$ do not vary dramatically from one location to another (indeed, they are usually assumed to be constant), this equation shows that the expected squared difference in $y$'s increases quadratically with the separation between $\mathbf{z}$ and $\mathbf{z}'$. The actual amount of increase is determined by the trend coefficients $\beta_1$ and $\beta_2$.

Let's see what the expected squared difference in the $y$'s is for the new model, model 2:

$$\eqalign{ E[\left(y(\mathbf{z}) - y(\mathbf{z}')\right)^2] &= E[\left(\beta_0 + \varepsilon(\mathbf{z}) - \left(\beta_0 + \varepsilon(\mathbf{z}')\right)\right)^2] \\ &=E[\left(\varepsilon(\mathbf{z}) - \varepsilon(\mathbf{z}')\right)^2] \\ &=E[\varepsilon(\mathbf{z})^2 - 2 \varepsilon(\mathbf{z})\varepsilon(\mathbf{z}') + \varepsilon(\mathbf{z}')^2] \\ &=C_2(\mathbf{z}, \mathbf{z}) - 2C_2(\mathbf{z}, \mathbf{z}') + C_2(\mathbf{z}', \mathbf{z}'). }$$

Again this behaves in the right way: because we figured $C_2(\mathbf{z}, \mathbf{z}')$ should decrease as $\mathbf{z}$ and $\mathbf{z}'$ become more separated, the expected squared difference in $y$'s indeed goes up with increasing separation of the locations.

Comparing the two expressions for $E[\left(y(\mathbf{z}) - y(\mathbf{z}')\right)^2]$ in the two models shows us that $\left(\beta_1 (z_1-z_1') + \beta_2 (z_2-z_2')\right)^2$ in the first model is playing a role mathematically identical to $-2C_2(\mathbf{z}, \mathbf{z}')$ in the second model. (There's an additive constant lurking there, buried in the different meanings of the $C_i(\mathbf{z}, \mathbf{z})$, but it doesn't matter in this analysis.) Ergo, depending on the model, spatial correlation is typically represented as some combination of a trend and a stipulated correlation structure on random errors.

We now have, I hope, a clear answer to the question: one can represent the idea behind Tobler's Law of Geography ("everything is related to everything else, but nearer things are more related") in different ways. In some models, Tobler's Law is adequately represented by including trends (or "drift" terms) that are functions of spatial coordinates like longitude and latitude. In others, Tobler's Law is captured by means of a nontrivial covariance structure among additive random terms (the $\varepsilon$). In practice, models incorporate both methods. Which one you choose depends on what you want to accomplish with the model and on your view of how spatial autocorrelation arises--whether it is implied by underlying trends or reflects variations you wish to consider random. Neither one is always right and, in any given problem, it's often possible to use both kinds of models to analyze the data, understand the phenomenon, and predict its values at other locations (interpolation).

whuber
  • 2
    +1 - it's nice to see the link between two approaches for handling spatial dependence. Great answer, whuber! – Macro Sep 01 '12 at 16:59
  • Very comprehensive, thank you. It will take me a few moments to think all this through. – gisol Sep 01 '12 at 17:04
  • 9
    If all statistical writing were of this ilk there would be a lot more clear-thinking applied statistical work in the world. Beautifully done. – Ari B. Friedman Mar 11 '13 at 12:21
  • Do I understand this answer correctly when I derive from it that simply adding X/Y-coordinates as independent variables to any(?!) model will account for spatial autocorrelation to some degree? – Funkwecker Apr 18 '18 at 13:50
  • 1
    @Julian: We're talking about constructing different models for the same data. If you include X and Y coordinates as explanatory variables but otherwise do not account for spatial correlation, then "spatial correlation" makes no sense for this model, so we must be careful about what we mean by "account for spatial correlation." But if we understand your question to ask whether including the coordinates as explanatory variables can be as effective as constructing a model in which spatial correlation is explicitly represented, then my answer is "yes, often that is the case." – whuber Apr 18 '18 at 14:03
53

The main issue in any statistical model is the assumptions that underlie any inference procedure. In the sort of model you describe, the residuals are assumed independent. If the data have some spatial dependence and this is not modelled in the systematic part of the model, the residuals from that model will also exhibit spatial dependence; in other words, they will be spatially autocorrelated. Such dependence would invalidate the theory that produces p-values from test statistics in the GAM, for example; you can't trust the p-values because they were computed assuming independence.

You have two main options for handling such data; i) model the spatial dependence in the systematic part of the model, or ii) relax the assumption of independence and estimate the correlation between residuals.

i) is what is being attempted by including a smooth of the spatial locations in the model. ii) requires estimating the correlation matrix of the residuals, often during model fitting, using a procedure like generalised least squares. How well either of these approaches deals with the spatial dependence will depend upon the nature and complexity of the spatial dependence and how easily it can be modelled.
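As a toy illustration of option i), the following Python/NumPy sketch uses an invented data-generating surface and a plain quadratic trend surface fitted by least squares as a crude stand-in for mgcv's spline smooth of the coordinates:

```python
# Fit a trend surface in the coordinates (option i) and check that the
# residuals no longer track location, whereas the raw response does.
# The data-generating surface below is entirely made up.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 400)
yc = rng.uniform(0, 10, 400)
z = 1.0 + 0.4 * x - 0.2 * yc + 0.05 * x * yc + rng.normal(0, 0.3, 400)

# design matrix for a quadratic surface in the two coordinates
X = np.column_stack([np.ones_like(x), x, yc, x * yc, x ** 2, yc ** 2])
coef, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ coef

print(np.corrcoef(z, x)[0, 1])      # raw response is strongly tied to location
print(np.corrcoef(resid, x)[0, 1])  # residuals are not
```

With the spatial trend absorbed by the surface, what is left behaves much more like the independent noise the inference theory assumes, which is the point of including s(x,y).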

In summary, if you can model the spatial dependence between observations then the residuals are more likely to be independent random variables and therefore not violate the assumptions of any inferential procedure.
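Whether residuals are in fact free of spatial structure can be checked with Moran's I, which the question's author mentions using. Here is a plain-NumPy sketch; the inverse-distance weight matrix is just one common, illustrative choice:

```python
# Moran's I: values near zero suggest no remaining spatial autocorrelation;
# clearly positive values indicate nearby points have similar values.
import numpy as np

def morans_i(values, coords):
    """Moran's I with inverse-distance weights (zero diagonal)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.where(d > 0, 1.0 / np.maximum(d, 1e-12), 0.0)
    dev = values - values.mean()
    num = (w * np.outer(dev, dev)).sum()
    return (n / w.sum()) * num / (dev ** 2).sum()

# a value that simply follows the x-coordinate is strongly autocorrelated
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
print(morans_i(np.arange(10.0), coords))   # clearly positive
```

Applying such a statistic to the residuals of a model with and without s(x,y) is one practical way to see whether the smooth has soaked up the spatial dependence.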

Gavin Simpson
  • Thanks for your clear answer Gavin. What makes spatial autocorrelation fundamentally different from any gradient not included in the model? Say your study area was on a sloping hill, and the species of interest preferred lower habitat to higher habitat. Failing to include the elevation in the model would leave a structure in the residuals, would it not? Is it simply that spatial autocorrelation is (or was) forgotten about or not considered? (P.S. perhaps this is a poor example as inclusion of lat,long would account for this effect too). – gisol Sep 01 '12 at 16:43
  • 5
Yes. I suspect that in the examples you have looked at, either the spatial component was of interest and so was modelled explicitly via a smooth of lat/lon, or the spatial component was a nuisance term but needed to be modelled to leave the residuals i.i.d. If the "spatial" component is better modelled via a different variable (e.g. elevation in your comment) then a smooth of that variable would be used instead of the spatial locations. – Gavin Simpson Sep 01 '12 at 17:03
  • 1
    Why smoothed? What is exactly meant by "smoothed"? – Funkwecker Mar 02 '18 at 15:12
  • 2
    @Julian Values of the response are smoothed with respect to the 2 spatial coordinates. Or put another way, the spatial *effect* is estimated as a smooth 2-d function. By smooth we mean has some wiggliness measured by the integrated squared second derivative of the spline. The wiggliness is chosen to balance the fit and the complexity of the model. If you want to know how the smooth functions (splines) are formed then it might be worth asking a specific question. – Gavin Simpson Mar 02 '18 at 15:58
2

The other answers are good I just wanted to add something about 'accounting for' spatial autocorrelation. Sometimes this claim is made more strongly along the lines of "accounting for spatial autocorrelation not explained by the covariates".

This can present a misleading picture of what the spatial smooth does. It is not as if there is some orderly queue in the likelihood where the smooth patiently waits for the covariates to go first and then mops up the 'unexplained' parts. In reality they all get a chance to explain the data.

This paper with an aptly named title presents the issue really clearly; although it is written from the point of view of a CAR model, the principles apply to GAM smooths.

Adding Spatially-Correlated Errors Can Mess Up the Fixed Effect You Love

The 'solution' in the paper is to smooth the residuals instead of smoothing on space. That would have the effect of allowing your covariates to explain what they can. Of course, there are many applications in which this would not be a desirable solution.

ASeaton
-1

Spatial correlation is simply how the x and y coordinates relate to the magnitude of the resulting surface in the space. So the autocorrelation between the coordinates can be expressed in terms of a functional relationship between the neighboring points.

Michael R. Chernick
  • 1
    Hi Michael, thanks for the response. I think I understand what you've said, but it seems to be a description of spatial autocorrelation rather than how the coordinates inclusion accounts for it - I may be missing your point though. For example, say I have 2 models, the first (A) with a single term - deforestation as a function of the distance to a capital city, and the second (B) with the distance to capital city term but also the lat and long term. Would you mind reiterating your answer in this context? Perhaps I could understand it better. – gisol Sep 01 '12 at 14:53
  • 1
I think that if there is no interaction term in the model the spatial autocorrelation between neighboring points is 0. When you have an interaction term, that term determines the value of the spatial autocorrelations. – Michael R. Chernick Sep 01 '12 at 14:59
  • So if in model B the deviance explained by the x,y term is large, does this mean there's a large amount of spatial autocorrelation, or is the spatial autocorrelation 'accounted for', and the explanatory effect comes from environmental effects that correlate geographically? – gisol Sep 01 '12 at 15:09
  • The terms in the model enter into the calculation of the spatial correlations. The model determines them. – Michael R. Chernick Sep 01 '12 at 15:20
  • 4
    @Michael, spatial autocorrelation means that the correlation between points depends on their spatial locations. I think this answer would be more useful if you could explain why using a smooth function estimate, with the spatial locations as inputs, accounts for this. On the surface, it seems that the smooth function approach models the _mean_ while spatial autocorrelation refers to the _covariance_ structure. I know there is a relationship between the covariance function of a smooth process and smooth function estimation but, without making that connection, this answer seems incomplete. – Macro Sep 01 '12 at 15:35
  • @Macro I don't see what is missing from the answer. I don't think the smoothness of the function has anything to do with it. It is just that a function that relates one point in the x-y plane with others determines the spatial correlations. – Michael R. Chernick Sep 01 '12 at 16:30
  • 1
    @Michael, surely you can see that making the lat/long coordinates affect the mean is different from modeling the correlations between two points in space ... The OP asked how to model _spatial autocorrelation_ and I think part of the argument - the part that explains exactly how fitting a smooth spatial surface (which is what a generalized additive model in the coordinates would do) models the spatial autocorrelation. There is a relationship between gams and covariance functions (I don't know enough to be more precise) but appealing to that relationship seems to be what is required here. – Macro Sep 01 '12 at 16:33
  • @Macro In the GAM, the `s(x,y)` is there to model the spatial dependence and hence, if successful, will leave the residuals i.i.d. – Gavin Simpson Sep 01 '12 at 16:39
  • @GavinSimpson, for the record, I'm pretty certain that what the OP is proposing is a reasonable thing to do but I've never heard a formal argument for why modeling the spatial trend _in the mean_ suffices to model spatial autocorrelation in the same way you can with a proper spatial random effects model. Simon Wood has mentioned on some `R` msg boards how this is equivalent to a random effects model, but no details are given. I think _some_ indication of the connection is necessary to answer the question "Why does including latitude and longitude in a GAM account for spatial autocorrelation?". – Macro Sep 01 '12 at 16:41
@Macro The function s is supposed to model the interaction between the x and y coordinates. It is not simply an average. So it tells how the point (x$_1$,y$_1$) affects the value of the response surface at the point (x$_2$,y$_2$). – Michael R. Chernick Sep 01 '12 at 16:51
  • @Macro If the spatial dependence is systematic in the sense that $E(y_i)$ varies with spatial location then it is perfectly reasonable to do as the OP suggests and model this dependence as a spatial trend surface. I admit I am not familiar with spatial random effects. There is a link between the penalised regression spline representation of `s(x,y)` in **mgcv** & a mixed effects model as the penalties on the smooth can be viewed as random effects. – Gavin Simpson Sep 01 '12 at 17:11
@GavinSimpson, I know your statement ***"There is a link between the penalised regression spline representation of s(x,y) in mgcv & a mixed effects model as the penalties on the smooth can be viewed as random effects."*** is correct but I'm having trouble finding a good reference for this. Do you know of one? – Macro Oct 23 '12 at 16:56
  • @GavinSimpson, nevermind, I found one :) In case you're curious: **B.A. Brumback and J.A. Rice (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (with discussion). Journal of the American Statistical Association 93, 961-994.** – Macro Oct 23 '12 at 18:17
  • 1
@Macro I'd take a look at Simon Wood's [book](http://www.amazon.co.uk/Generalized-Additive-Models-An-Introduction/dp/1584884746/ref=sr_1_1?ie=UTF8&qid=1351017637&sr=8-1) if you can as it has the details and cites the relevant literature on the smooths as random effects bit. – Gavin Simpson Oct 23 '12 at 18:41
  • Thanks for this suggestion @GavinSimpson, I actually came across this book recently. It has been very useful. – Macro Dec 01 '12 at 20:43