25

What is the easiest way to compute the correlation between two time series that are exactly the same size? I thought of multiplying $(x[t]-\mu_x)$ and $(y[t] - \mu_y)$ and adding up the products. If this single number is positive, can we say the two series are correlated? I can think of examples, however, where one linearly growing and one exponentially growing time series have no relation to each other, yet the computation above would report that they are correlated.

Any thoughts?
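To make the concern concrete, here is a minimal sketch of the computation described above, applied to a linear and an exponential series. The series, their length, and the growth rates are illustrative assumptions, and the Pearson normalisation is added only to put the sum on a scale from -1 to 1.

```python
import math

def centered_cross_product(x, y):
    """Sum over t of (x[t] - mean(x)) * (y[t] - mean(y))."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def pearson(x, y):
    """Centered cross product scaled by both series' spread."""
    sxy = centered_cross_product(x, y)
    sxx = centered_cross_product(x, x)
    syy = centered_cross_product(y, y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative series (assumed values): one linear, one exponential trend.
steps = range(50)
linear = [2.0 * k + 1.0 for k in steps]
exponential = [math.exp(0.1 * k) for k in steps]

raw_sum = centered_cross_product(linear, exponential)
r = pearson(linear, exponential)
print(raw_sum > 0, r)
```

Both series simply increase over time, so the sum is positive and the normalised value is large even though nothing links them except the shared upward trend.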

BBSysDyn
  • 1,002
  • 1
  • 9
  • 17
  • 4
    Have you ever heard of the cross correlation function - http://en.wikipedia.org/wiki/Cross-correlation#Time_series_analysis? – Macro May 24 '12 at 15:00
  • Your two time series are exactly the same size. See http://stats.stackexchange.com/questions/3463/computing-correlation-and-the-significance-of-said-correlation-between-a-pair as it is similar, though not quite identical, to your question: it also involves two series of the same size and frequency, although they are non-stationary. – Ellie Kesselman May 24 '12 at 15:20

3 Answers

11

Macro's point is correct: the proper way to look for relationships between time series is the cross-correlation function (assuming stationarity). Having the same length is not essential. The cross-correlation at lag 0 is just the Pearson correlation estimate obtained by pairing the data at identical time points. If the series do have the same length, as you are assuming, you will have exactly T pairs, where T is the number of time points in each series. The lag 1 cross-correlation matches time t from series 1 with time t+1 in series 2. Note that here, even though the series are the same length, you only have T-1 pairs, because one point in the first series has no match in the second and one point in the second series has no match in the first. Given these two series, you can estimate the cross-correlation at several lags. If any of the cross-correlations is statistically significantly different from 0, it indicates a correlation between the two series.
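The lag pairing can be sketched in a few lines of plain Python. This is only an illustration: the function names and the toy data are assumptions, and the sketch recomputes the means over the overlapping pairs, whereas time series packages typically use full-sample means and variances, so values at nonzero lags may differ slightly from theirs.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] over the overlapping time points.

    Returns the estimate and the number of pairs that were used.
    """
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    n = min(len(xs), len(ys))  # the series need not have equal length
    return pearson(xs[:n], ys[:n]), n

# Toy data (assumed): y is x delayed by one time step.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
y = [0.0] + x[:-1]

r0, n0 = cross_correlation(x, y, 0)  # uses all T = 8 pairs
r1, n1 = cross_correlation(x, y, 1)  # uses T - 1 = 7 pairs
print(n0, n1, r1)
```

At lag 1 the pairs line up exactly, so the estimate is 1 (up to rounding), and the pair count drops from T to T-1.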

Michael R. Chernick
  • 39,640
  • 28
  • 74
  • 143
  • Hi Michael, is it possible to quantify "significanly different" -- can I use 1 or 2 standard deviation away from zero as significant? – BBSysDyn May 24 '12 at 16:36
  • @user423805 I have changed it to read statistically significantly different from 0. Formally that means that you test the null hypothesis that the correlation is zero vs the alternative that it is not 0. Then compute the two-sided p-value for the test statistic. Generally statistical significance means p-value <= 0.05. Sometimes other values are used to define statistical significance (0.01 for example). Most time series software packages that handle multiple time series can do these tests for you. Our friend IrishStat can speak to this regarding Autobox. – Michael R. Chernick May 24 '12 at 17:29
  • Are there cases in which the cross-correlation at lag zero and the Pearson correlation differ? – Bakaburg Apr 09 '15 at 16:42
  • Can you help answer my most recent question? I have two series with several significant correlations at various lags and I’m unsure how to interpret. – user10136297 Dec 16 '21 at 03:08
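On the "1 or 2 standard deviations" question in the comments above: a common rule of thumb is that, if both series are independent white noise, a sample cross-correlation has a standard deviation of roughly 1/sqrt(T), so about ±1.96/sqrt(T) marks the two-sided 5% level. The sketch below applies that rule; the function names and the correlation values are hypothetical, and the rule assumes no autocorrelation within either series, which real series often violate.

```python
import math

def approx_significance_bound(n_obs, z=1.96):
    """Approximate two-sided 5% bound for a cross-correlation,
    assuming both series are independent white noise."""
    return z / math.sqrt(n_obs)

def flag_significant(ccf_by_lag, n_obs):
    """Keep only the lags whose estimate falls outside the bound."""
    bound = approx_significance_bound(n_obs)
    return {lag: r for lag, r in ccf_by_lag.items() if abs(r) > bound}

# Hypothetical cross-correlation estimates for T = 100 observations.
estimates = {-1: 0.05, 0: 0.31, 1: -0.12}
bound = approx_significance_bound(100)
flagged = flag_significant(estimates, 100)
print(bound, flagged)
```

With T = 100 the bound is 0.196, so only the lag 0 value of 0.31 is flagged. When the series are autocorrelated, this bound is too narrow and will flag too many lags.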
4

You might want to look at a similar question and my answer, "Correlating volume timeseries", which suggests that you can compute cross-correlations BUT testing them is a horse of a different color (an equine of a different hue) due to autoregressive or deterministic structure within either series.

IrishStat
  • 27,906
  • 5
  • 29
  • 55
  • if I understand correctly, in that answer you are saying crosscorrelation between timeseries is useless. – BBSysDyn May 24 '12 at 16:40
  • @user423805 It MAY be useless unless the data is suitably pre-filtered to obtain I.I.D. This speaks directly to the OP's real concerns about spurious conclusions like "storks bringing babies" (J. Neyman 1938, http://en.wikipedia.org/wiki/Talk%3ACorrelation_does_not_imply_causation and http://www.amstat.org/about/statisticiansinhistory/index.cfm?fuseaction=biosinfo&BioID=11) etc. ("I can think of some examples however where a linearly another exponentially growing time series would have no relation to eachother, but the computation above would report they were correlated.") – IrishStat May 24 '12 at 17:12
  • I think the point is that the series need to be stationary for cross-correlations to make sense. If filtering is necessary, it is to make the series stationary (like differencing or seasonal differencing). But to call it useless is wrong. – Michael R. Chernick May 25 '12 at 03:05
  • @Michael I said MAY be useless. – IrishStat May 25 '12 at 11:05
  • 1
    @IrishStat It was a good comment and took me back to my training in the 1970s. At that time I was learning about time series/forecasting methods for my civilian work in the US Army. We were using exponential smoothing as a way to forecast based on historical data over subjective estimates that were being used at the supply depots. Someone made the great suggestion to me to look at the more general ARIMA models and the 1970 text by Box and Jenkins and so began my interest in time series that became part of my career. – Michael R. Chernick May 25 '12 at 13:26
  • I think it was in 1974 that I had the great fortune to take an introductory time series course sponsored by the Institute for Professional Education that was held at Carnegie-Mellon University in Pittsburgh. The instructors were George Box, George Tiao and David Pack, and we learned from the first (1970) edition of the Box-Jenkins text. The reason I mention this is because George Box showed us a paper he coauthored with Paul Newbold that criticized a paper by Coen, Gomme and Kendall (the late Sir Maurice Kendall). The discussion is not in the text because the paper didn't come out until 1971. – Michael R. Chernick May 25 '12 at 13:42
  • The point was that the paper used a detrended time series to predict a share commodity index series (detrended) based on a detrended car production series and the Financial Times commodity index. Plotting the cross correlation function between the detrended share commodity index and the detrended car production series showed an apparent large correlation at lags 5 and 6. So they fit a regression model using the lagged series to predict the share index. The model assumed that the error terms were uncorrelated. – Michael R. Chernick May 25 '12 at 13:51
  • The problem was that the three series, even after detrending, had significant autocorrelations. Box and Newbold showed that the variance of the cross-correlation estimates was highly underestimated because these autocorrelations were ignored. Hence the idea that these series would be useful in predicting the share index was false. This made quite an impression on me about the subtleties of time series analysis: even a very famous and seasoned statistician, known for his accomplishments and books in time series and multivariate analysis, could go wrong. – Michael R. Chernick May 25 '12 at 13:56
  • So I hope this example reinforces the importance of IrishStat's comment. – Michael R. Chernick May 25 '12 at 13:57
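The role of differencing mentioned in these comments can be illustrated with a small simulation. This is a sketch under stated assumptions: the two series are independent Gaussian random walks, and the seed, the length, and the function names are arbitrary choices made for the illustration, not anything taken from the papers discussed above.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def random_walk(n, rng):
    """Cumulative sum of independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def difference(series):
    """First difference: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
T = 2000
walk_a = random_walk(T, rng)
walk_b = random_walk(T, rng)

r_levels = pearson(walk_a, walk_b)  # non-stationary levels
r_diffs = pearson(difference(walk_a), difference(walk_b))  # stationary steps
print(r_levels, r_diffs)
```

The correlation between the levels of two unrelated random walks can land almost anywhere between -1 and 1, depending on the seed, while the correlation between their differences stays close to zero. That gap is the spurious-correlation problem the comments describe.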
-1

There is some interesting stuff here

https://stackoverflow.com/questions/3949226/calculating-pearson-correlation-and-significance-in-python

This was actually what I needed. Simple to implement and explain.

BBSysDyn
  • 1,002
  • 1
  • 9
  • 17
  • 3
    -1 From what I can gather these answers are only concerned with the standard Pearson product-moment correlation. When applied to two time series, the standard Pearson correlation gives nonsensical results! If you follow these suggestions, all you do is produce statistical artefacts. See e.g. http://www.math.mcgill.ca/dstephens/OldCourses/204-2007/Handouts/Yule1926.pdf – Momo Jul 12 '16 at 12:29