
Suppose that there are two two-dimensional maps. For simplicity, let's say one map shows the temperatures in the 48 continental U.S. states, and the other shows the corresponding humidities. The temperature and humidity data are collected at one specific moment at a set of equally spaced locations (e.g., 1000 of them). If I directly calculate the correlation between the two maps, would the coefficient be inflated because the maps are spatially correlated? Supposing that I know the full width at half maximum (FWHM) that characterizes the spatial correlation, is there a way to adjust the correlation coefficient or its degrees of freedom?

chl
bluepole
  • It would be nice to have a clearer idea of what you mean by "inflated." This suggests a comparison, so one is moved to ask, inflated *compared to what*? If you mean the correlation coefficient of the distribution of *all* temperatures and humidities at that moment, we should pause for a moment to consider what relationship the *sample* correlation would have to it and why--if at all--it could be expected to be "inflated," and how--if at all--that would be related to "spatial correlation." I believe that if you clarify these points, you might be able to answer this question yourself. – whuber Dec 05 '12 at 22:35
  • @whuber: Thanks a lot for the comment! The suspicion of inflation is not based on any empirical information (I don't have any), but comes from my intuition. In other words, my concern is that the data for temperature/humidity are not sampled independently, and that's why I thought the spatial correlation might provide a way to correct for any inflation, if it exists. – bluepole Dec 06 '12 at 16:37
  • In what sense are they not sampled "independently"? Consider the non-spatial situation where two related variables are observed: each measurement produces an ordered pair of numbers. How does your situation differ from that? Here's another way to think about it: imagine that $10^{1000}$ pairs of (temp, humidity) values could be recorded, effectively making an exhaustive census of the two fields. Taking a simple random sample of these pairs obviously has no problems with spatial correlation. Your sample does not differ a whole lot from that situation. – whuber Dec 06 '12 at 18:35
  • Have you looked at Spatial Dynamic Factor Analysis? http://faculty.chicagobooth.edu/hedibert.lopes/scientific/pdf/talk-tokyo.pdf It appears to have been designed to solve exactly this problem. – Mimshot Dec 06 '12 at 18:53
  • @Mimshot: Thanks for the suggestion! Spatial Dynamic Factor Analysis seems to be an interesting idea. However, that approach would not work for me. The correlation between temperature and humidity in my case is calculated among all the geographical locations at one time point, and there is no time dimension involved. – bluepole Dec 06 '12 at 19:25
  • @whuber: Suppose $x_i$ and $y_i$ are the temperature and humidity at location $i$. The correlation calculation is essentially a simple regression model: $y_i = \alpha + \beta x_i + \epsilon_i$. If I understand it correctly, the $t$-statistic for $\beta$ (or the correlation between the two variables) would be inflated if the residuals $\epsilon_i$ are serially correlated (e.g., AR(1)). In my case, the complication is that my data are on a two-dimensional space, not a 1D time series. Is my concern still illusory? – bluepole Dec 06 '12 at 19:36
  • Yes, it is an illusion, because although your temperatures $x$ and humidities $y$ likely are spatially correlated, the *residuals* possibly are not. Your regression setup is not ideal for analyzing this situation, since the $x_i$ are themselves random variables. Think instead of the dataset $((x_i,y_i), i=1,\ldots,n)$ as consisting of a sample of some bivariate distribution. Furthermore, although correlated residuals will affect aspects of regression, when you have sufficiently extensive data (compared to the range of correlation) the regression estimates are still accurate. – whuber Dec 06 '12 at 20:39
  • @whuber: I used the temperature/humidity example simply as an analogy. With the data I'm actually dealing with, there is some concern about spatial correlation in the residuals. I understand that the estimates of the regression coefficients are unbiased, but the correlation structure in the residuals would inflate the test statistics, including those for correlation coefficients, right? So my question is, if the residuals are indeed correlated, is there any way to correct for it on a 2D space? – bluepole Dec 06 '12 at 21:45
  • The point of the last half of my previous comment is that your correlation coefficient estimates are unbiased, regardless. There is only the question of adjusting the DF if you want to compute the variance of those estimates. The adjustments require you to quantify the spatial correlation, so a mere analogy won't do you any good: you need to tell us about the data you actually are dealing with rather than some hypothetical situation. – whuber Dec 06 '12 at 21:57
  • @whuber: You're right that the correlation coefficient is unbiased, but it's the significance of the correlation that I'm interested in. And yes, I thought about adjusting the DF, but just couldn't figure out how to make a proper adjustment to the DF based on a measure of spatial correlation such as the FWHM. Sorry I didn't bring up the real data context, but the essential issue in terms of spatial correlation remains the same. I hope that the temperature/humidity analogy captures the problem I'm facing. – bluepole Dec 06 '12 at 22:17
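
The point debated in the comments above (the sample correlation stays unbiased, but its variance, and hence the appropriate DF, changes under spatial correlation) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a method proposed by anyone in the thread: the two fields are independent white noise smoothed by a Gaussian kernel of known FWHM on a periodic grid, the function names `smooth_field` and `simulate_null_correlations` are hypothetical, and the "effective sample size" is a rough moment-matching heuristic that assumes $\mathrm{Var}(r) \approx 1/(n_\text{eff} - 1)$ when the true correlation is zero.

```python
import numpy as np

def smooth_field(rng, shape, fwhm):
    """White noise smoothed by a Gaussian kernel with the given FWHM (in grid cells)."""
    # Standard conversion between FWHM and Gaussian standard deviation.
    sigma = fwhm / np.sqrt(8.0 * np.log(2.0))
    noise = rng.standard_normal(shape)
    # Apply the Gaussian filter in the frequency domain (assumes periodic boundaries).
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * transfer))

def simulate_null_correlations(n_sims=500, shape=(32, 32), fwhm=6.0, seed=0):
    """Sample correlations between pairs of independent, spatially smooth fields."""
    rng = np.random.default_rng(seed)
    r = np.empty(n_sims)
    for k in range(n_sims):
        a = smooth_field(rng, shape, fwhm).ravel()
        b = smooth_field(rng, shape, fwhm).ravel()
        r[k] = np.corrcoef(a, b)[0, 1]
    return r

shape = (32, 32)
n = shape[0] * shape[1]
r = simulate_null_correlations(shape=shape)

# Standard deviation of r if the n grid points were independent (true correlation 0).
naive_sd = 1.0 / np.sqrt(n - 1)
# Spread actually observed across simulated pairs of smooth fields.
empirical_sd = r.std(ddof=1)
# Heuristic effective sample size from matching Var(r) to 1 / (n_eff - 1).
n_eff = 1.0 + 1.0 / empirical_sd ** 2

print(f"mean r       = {r.mean():+.3f}")
print(f"naive sd     = {naive_sd:.3f}")
print(f"empirical sd = {empirical_sd:.3f}")
print(f"n = {n}, heuristic n_eff = {n_eff:.1f}")
```

In runs like this the average of `r` should stay near zero while its spread is much larger than the independent-sample value, which is the sense in which a significance test, rather than the coefficient itself, would need adjusting. For real data the smoothness would have to be estimated rather than assumed, and the fields need not be Gaussian-smoothed noise at all.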

1 Answer


What does "correlation between the maps" mean? If you mean the correlation between the temperature and the humidity, we then have to ask which population you are interested in and what the context of the question is. For example, the correlation between humidity and temperature in one location over time may differ from that in another location. The correlation between humidity and temperature over the entire geography of the US (assuming you can take a random sample from the US) during the summer may differ from that during the winter.

Even taking a stab at answering those questions requires knowing why you are asking them.

Emil Friedman