I have a large number (hundreds to thousands) of noisy time series that represent contemporaneous observations from different subjects.

I hypothesise that there exist lead-lag relationships between observations for different subjects (or groups of subjects).

I would like to explore the potential use of such lead-lag relationships for the purposes of predicting the individual series.

What methods might I consider for this?

Edit: To be clear, I am not looking for pairwise relationships. I am looking for a method that can take the mountain of data at hand and attempt to discover (potentially non-linear) lead-lag relationships between arbitrary groups of series and the individual series to be predicted.

NPE
    Read more about VAR models and Granger causality tests. They are used to estimate multivariate processes. – user158442 Apr 22 '17 at 23:58

3 Answers

You can choose from about 40 years of research and countless books, dissertations, monographs etc.

Given that your question is not all that focussed yet, an introductory time-series book may help. In a nutshell, the autocorrelation function gives clues to lead/lag relationships that may be present in a single time series, or between two series.

Rob Hyndman has done a lot of research into sensibly automating the process of identifying how many and which leads/lags to use, so please look at his forecast package for R and his other research.

Dirk Eddelbuettel

http://en.wikipedia.org/wiki/Granger_causality

Barrett, Barnett & Seth have a paper which extends the idea of Granger causality to the multivariate case.
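To make the basic idea concrete, here is a minimal sketch of the simplest pairwise, one-lag Granger test, written with the Python standard library only. It is not the multivariate extension from the paper. The data are simulated, and the helper names `ols_rss` and `granger_f` are made up for this illustration. The test compares a regression of a series on its own past with one that also includes the past of the candidate "cause"; a large F statistic suggests the extra lags help prediction.

```python
import random

def ols_rss(rows, targets):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    # Build the augmented system [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution for the coefficients.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = aug[i][k] - sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / aug[i][i]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(cause, effect, p=1):
    """F statistic for 'p lags of cause improve the AR(p) forecast of effect'."""
    n = len(effect)
    targets = effect[p:]
    # Restricted model: intercept plus p own lags.
    restricted = [[1.0] + [effect[t - i] for i in range(1, p + 1)]
                  for t in range(p, n)]
    # Unrestricted model: additionally p lags of the candidate cause.
    unrestricted = [row + [cause[t - i] for i in range(1, p + 1)]
                    for row, t in zip(restricted, range(p, n))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Simulated data (an assumption for illustration): x leads y by one step.
random.seed(1)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1)] + [0.8 * x[t - 1] + 0.5 * random.gauss(0, 1)
                            for t in range(1, n)]

f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

With hundreds of series one would not run this over every pair by hand; a VAR fitted with an established library is the more usual route, and the F statistics would need a proper reference distribution and a multiple-testing correction.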

Vishal Belsare

You should consider the cross-correlation function, as it is meant to identify lead/lag relationships. Dirk mentioned the autocorrelation function, but that is meant for a single time series, not for the multivariate case. Have a look at Chapter 10 of the Box-Jenkins textbook, where they introduce the steps to do this.

You say your data are noisy, but if the lead/lag response is strong, the cross-correlations at the relevant lags will still come out significant.
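As a rough illustration of reading a lead/lag off the cross-correlation function, the sketch below uses only the Python standard library. The data are simulated (y is x delayed by three steps plus noise), and the helper name `cross_correlation` is made up for this example. Real series are usually autocorrelated, in which case the raw cross-correlations can mislead and prewhitening is normally applied first; that step is omitted here.

```python
import random
from statistics import mean, pstdev

def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag]; lag may be negative."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))

# Simulated data (an assumption for illustration): x leads y by 3 steps.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [x[t - 3] + 0.5 * random.gauss(0, 1) if t >= 3 else random.gauss(0, 1)
     for t in range(n)]

# Evaluate the cross-correlation over a window of candidate lags.
ccf = {lag: cross_correlation(x, y, lag) for lag in range(-10, 11)}
best_lag = max(ccf, key=lambda k: abs(ccf[k]))

# A positive best_lag means x leads y by that many steps.
print(best_lag, round(ccf[best_lag], 3))
```

For white-noise inputs, cross-correlations beyond roughly ±2/√n are commonly treated as significant, which is one simple way to judge whether a peak stands out from the noise.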

csgillespie
IrishStat