
It is well known that independence of random variables implies zero correlation, but zero correlation need not imply independence.

I have come across plenty of mathematical examples demonstrating dependence despite zero correlation. Are there any real-life examples to support this fact?

Silverfish
Harry
    Be careful, only zero correlation and **jointly** normal variables imply independence. – Francis Mar 02 '16 at 09:06
  • Length and volume of the cube, they are not independent as volume = (length)^3. But as volume is not linear function of length they are not correlated. – Siddhesh Mar 02 '16 at 09:11
@Siddhesh "But as volume is not linear function of length they are not correlated." Well, not *perfectly* correlated. But they would be positively correlated. – Silverfish Mar 02 '16 at 09:57
  • Of course, in actual data, you rarely find variables are completely uncorrelated. But I wonder if this could be answered in a heuristic kind of way, somewhat analogous to "correlation between ice cream sales and drowning" when discussing correlation vs causation (which isn't usually taken to refer to a real data set, more to make a conceptual point). – Silverfish Mar 02 '16 at 10:02
    @Siddhesh: that will work only if $E[\mathrm{length}^4]-E[\mathrm{length}]E[\mathrm{length}^3]=0$... – Francis Mar 02 '16 at 11:05
  • The phrasing "real life examples that support this claim [that random variables can be dependent but uncorrelated]" is a little puzzling: it's more of an established fact than a claim. – Adrian Mar 02 '16 at 11:12
  • I have removed the (incorrect) claim about normal variables having independence implying non-correlation. As Francis points out, this is really about *joint* normal variables... I'm sure we have a question about this somewhere, though I can't find it, but [Wikipedia has a whole article on the fallacy](https://en.wikipedia.org/wiki/Normally_distributed_and_uncorrelated_does_not_imply_independent). – Silverfish Mar 02 '16 at 15:50
    Feel free to put the comment about the normal distribution back in if you disagree with my edit. But I thought that it would be better removed as (1) it's a distracting side-issue to your main question, (2) it has (I think) already been asked on CV before so would be a duplicate of existing material here, (3) I didn't want it to cause confusion among future readers. I've tried to edit the question in such a way that would increase its chances of being reopened: I think this question is quite distinct from the "mathematical statistics" ones on the same topic. – Silverfish Mar 02 '16 at 15:53
    I still think this question is really nice, and might attract some further interesting answers if it could be reopened (which might involve some editing to clearly distinguish it from the thread it is currently deemed a duplicate of). I have raised a [thread on Meta](http://meta.stats.stackexchange.com/q/3005/22228) about what it would take for this question to be reopened. All comments welcome. – Silverfish Mar 04 '16 at 21:10

2 Answers


Stock returns are a decent real-life example of what you're asking for. The correlation between today's and yesterday's S&P 500 return is very close to zero. However, there is clear dependence: squared returns are positively autocorrelated, and periods of high volatility cluster in time.

R code:

library(ggplot2)
library(quantmod)

symbols   <- new.env()
date_from <- as.Date("1960-01-01")
date_to   <- as.Date("2016-02-01")
getSymbols("^GSPC", env=symbols, src="yahoo", from=date_from, to=date_to)  # S&P500

df <- data.frame(close=as.numeric(symbols$GSPC$GSPC.Close),
                 date=index(symbols$GSPC))
df$log_return     <- c(NA, diff(log(df$close)))                    # daily log return
df$log_return_lag <- c(NA, head(df$log_return, nrow(df) - 1))      # previous day's return

cor(df$log_return,   df$log_return_lag,   use="pairwise.complete.obs")  # 0.02
cor(df$log_return^2, df$log_return_lag^2, use="pairwise.complete.obs")  # 0.14

acf(df$log_return,     na.action=na.pass)  # Basically zero autocorrelation
acf((df$log_return^2), na.action=na.pass)  # Squared returns positively autocorrelated

p <- (ggplot(df, aes(x=date, y=log_return)) +
      geom_point(alpha=0.5) +
      theme_bw() + theme(panel.border=element_blank()))
p
ggsave("log_returns_s&p.png", p, width=10, height=8)

The timeseries of log returns on the S&P 500:

[figure: log return time series]

If returns were independent through time (and stationary), it would be very unlikely to see those patterns of clustered volatility, and the squared log returns would show no autocorrelation.
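To see that "uncorrelated but dependent" is exactly what volatility clustering produces, here is a minimal simulation of an ARCH(1) process; the parameters are illustrative, not fitted to the S&P 500:

```r
# Simulate an ARCH(1) process: returns are uncorrelated,
# but their squares are not -- mimicking volatility clustering.
set.seed(42)
n  <- 1e5
z  <- rnorm(n)                      # i.i.d. standard normal shocks
r  <- numeric(n)                    # simulated returns
s2 <- rep(1, n)                     # conditional variance
for (t in 2:n) {
  s2[t] <- 0.2 + 0.5 * r[t - 1]^2  # today's variance depends on yesterday's return
  r[t]  <- sqrt(s2[t]) * z[t]
}
cor(r[-1], r[-n])       # near zero: the returns themselves are uncorrelated
cor(r[-1]^2, r[-n]^2)   # clearly positive: squared returns are dependent
```

The lag-1 correlation of the returns is indistinguishable from zero, yet the lag-1 correlation of the squared returns is strongly positive, just as in the real S&P 500 data above.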

Adrian

Another example is the relationship between stress and grades on an exam. The relationship is an inverted U: performance is poor at both very low and very high stress and best somewhere in between, so the linear correlation is near zero even though the dependence (and plausibly the causation) is clear.

Peter Flom
That's a neat example. Do you have data, or is this just based on introspection / teaching experience? – Adrian Mar 02 '16 at 11:51
    I saw a study of this, but I saw it many years ago so I don't have the citation or the actual data. – Peter Flom Mar 02 '16 at 11:52