How to determine correlation between stationary and non-stationary time series

Question

I have three time series of economic data based on quarterly observations: A, B and C. I would like to ascertain the correlation (or lack of it) between A and C, as well as between B and C. The first-differenced series of B passes the ADF and KPSS tests for stationarity. Series A and C fail the stationarity tests even after detrending (by linear regression and by centred moving averages of order 4) and after 1st- and 2nd-order differencing.

What is the best way to ascertain the extent of correlation between A & C and between B & C under these circumstances?

Below are the time series data (tab delimited):

A   13,603  15,062  (22,984)    14,704  14,285  15,585  (17,460)    21,145  20,926  28,117  (6,524) 31,190  25,610  34,311  (21,376)    14,140  3,416   26,526  (21,159)    30,874  51,579  50,426  (19,874)    52,980  30,338
B   (92,345)    19,415  13,045  104,693 214,196 (39,180)    (29,979)    112,499 5,914   60,787  92,253  124,716 23,638  362,566 (66,896)    209,127 103,986 (13,418)    389,962 (161,400)   177,945 (36,645)    148,722 189,477
C   819,019 716,641 767,830 1,177,339   1,254,122   1,254,122   985,382 716,641 614,264 806,222 819,019 844,613 1,018,655   1,108,235   1,261,801   1,474,234   1,412,807   1,678,988   1,638,037   1,228,528   1,023,773   1,279,717   1,279,717   1,023,773

What is the significance of the bracketed terms in your input series? — Simon Hayward, Aug 27 '13 at 13:53
@Simon Hayward: Yes; sorry B and S was a typo. That should be B and C. The bracketed terms are negative numbers. — Allan, Aug 27 '13 at 14:18
I'm finding one extra observation on A compared with B and C is that correct? — Simon Hayward, Aug 27 '13 at 14:19
How odd. I'm plotting the ACF and PACF of all three series in R and the output is suggesting that all three are approximately stationary! Even A which has a pronounced 4 unit seasonal pattern. — Simon Hayward, Aug 27 '13 at 14:31
Must have been an issue with the copying and pasting. Please ignore the first term (13,603) in A so as to make the no. of observations equal for all series. — Allan, Aug 27 '13 at 14:31
I hadn't actually tried this in R. I was simply using some "rudimentary" techniques in MS Excel. I will download R and have another go at it. Forgive my ignorance; statistics is not my area but I have a project I am currently working on that has piqued my interest in time series. Thanks for the pointer(s). — Allan, Aug 27 '13 at 14:41
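For anyone loading the posted data into R, here is a minimal sketch of how the bracketed (negative) values could be converted to numbers. The helper name `parse_bracketed` is hypothetical and not from the thread; the sample values are the first five entries of A as posted, with the leading 13,603 dropped as the asker requested.

```r
# Hypothetical helper (not from the thread): convert strings such as
# "13,603" or "(22,984)" to numbers, treating brackets as a minus sign.
parse_bracketed <- function(x) {
  x <- trimws(x)
  negative <- grepl("^\\(.*\\)$", x)          # TRUE where the value is bracketed
  value <- as.numeric(gsub("[(),]", "", x))   # strip brackets and thousands separators
  ifelse(negative, -value, value)
}

# First five values of A as posted; the first one is dropped per the comments.
a_raw <- c("13,603", "15,062", "(22,984)", "14,704", "14,285")
A_head <- parse_bracketed(a_raw)[-1]
print(A_head)
```

The same function can be applied to the full rows of A, B and C after splitting them on tabs.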

score 2 · Answer 1 · answered Aug 28 '13 at 18:00

Before implementing any estimation or inference procedure to detect cross-correlations, you should know that stationary and non-stationary data "do not mix": it is usually misleading to try to measure any kind of association (correlation, co-movement, etc.) between them, even though such an association may well exist.

To look at a concrete example:

Assume the following two AR(1) time series, $z_t$ stationary and $y_t$ containing a unit root and hence non-stationary:

$$z_t = \gamma + \delta z_{t-1} + u_t,\qquad E(u_t)=0, \; Var(u_t) = \sigma^2_u, \; E(u_tu_s) = 0 \;\text{ for } t\neq s,\; |\delta| <1,\; z_0=0$$

$$y_t = \alpha + y_{t-1} + \varepsilon_t,\qquad E(\varepsilon_t)=0, \; Var(\varepsilon_t) = \sigma^2_\varepsilon, \; E(\varepsilon_t\varepsilon_s) = 0 \;\text{ for } t\neq s,\; y_0=0$$

Assume now that their white-noise disturbances are contemporaneously correlated, i.e. that $E(u_t\varepsilon_t) = v_{u\varepsilon} \neq 0$. Hence the series are not independent. What will the attempt to calculate their correlation give us?

By repeated substitution (and assuming that the $z_t$ series is long enough for terms in $\delta^t$ to be negligible), the two series can be written:

$$ z_t = \frac {\gamma}{1-\delta} + \sum_{j=0}^{t-1}\delta^ju_{t-j} \Rightarrow E(z_t) = \frac {\gamma}{1-\delta}, \; Var(z_t) = \frac {\sigma^2_u}{1-\delta^2}$$

$$ y_t = \alpha t + \sum_{j=0}^{t-1}\varepsilon_{t-j} \Rightarrow E(y_t) = \alpha t, \; Var(y_t) = \sigma^2_{\varepsilon}t$$

The contemporaneous correlation coefficient between the two is

$$ \rho(z_t,y_t) = \frac{Cov(z_t,y_t)}{\sigma_{z_t}\sigma_{y_t}} $$

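We have

$$ Cov(z_t,y_t) = E(z_ty_t) - E(z_t)E(y_t) $$

$$= E\Big[\Big(\frac {\gamma}{1-\delta} + \sum_{j=0}^{t-1}\delta^ju_{t-j}\Big)\Big(\alpha t + \sum_{j=0}^{t-1}\varepsilon_{t-j}\Big)\Big] - \frac {\gamma}{1-\delta}\alpha t $$

$$ = E\Big(\sum_{j=0}^{t-1}\delta^ju_{t-j}\sum_{k=0}^{t-1}\varepsilon_{t-k}\Big) = E\Big(\sum_{j=0}^{t-1}\delta^ju_{t-j}\varepsilon_{t-j}\Big) = v_{u\varepsilon}\sum_{j=0}^{t-1}\delta^j \approx \frac {v_{u\varepsilon}}{1-\delta}$$

The cross terms drop out under the additional assumption that the disturbances are correlated only contemporaneously, i.e. $E(u_t\varepsilon_s) = 0$ for $t\neq s$, and the last step again uses the fact that $\delta^t$ is negligible for a long enough series.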

Therefore

$$\rho(z_t,y_t) = \frac{\frac {v_{u\varepsilon}}{1-\delta}}{\Big(\frac {\sigma^2_u}{1-\delta^2}\sigma^2_{\varepsilon}t\Big)^{\frac12}} = \frac{\sqrt{1-\delta^2}}{1-\delta} \frac{v_{u\varepsilon}}{\sigma_u\sigma_{\varepsilon}}\frac{1}{\sqrt{t}}$$

We see that the theoretical correlation coefficient shrinks towards zero at rate $1/\sqrt{t}$ (it is monotonically decreasing in absolute value), i.e. it is itself non-stationary and differs at every point in your sample. Any attempt to estimate it from the available sample will therefore give some "average" correlation coefficient which, moreover, holds only for the specific time period that the sample covers. Being the average of a decreasing non-linear function of time, it will be difficult to interpret meaningfully.
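The $1/\sqrt{t}$ decay can be checked with a small Monte Carlo experiment. The following is only a sketch: every parameter value, the Gaussian disturbances and the seed are illustrative assumptions, not something taken from the question. It simulates many independent paths of $z_t$ and $y_t$ and compares the correlation across paths at a few dates with the formula above.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Illustrative parameter values (assumptions, not taken from the question)
n_rep <- 4000; n_t <- 200
gamma_c <- 1; delta <- 0.5; alpha <- 0.2
sigma_u <- 1; sigma_e <- 1; v_ue <- 0.6

# Disturbances with Cov(u_t, e_t) = v_ue and no correlation across time
u <- matrix(rnorm(n_rep * n_t, sd = sigma_u), n_rep, n_t)
w <- matrix(rnorm(n_rep * n_t, sd = sqrt(sigma_e^2 - v_ue^2 / sigma_u^2)),
            n_rep, n_t)
e <- (v_ue / sigma_u^2) * u + w

# One simulated path per row: z is the stationary AR(1) with z_0 = 0,
# y is the random walk with drift and y_0 = 0
z <- t(apply(gamma_c + u, 1, function(x)
  as.numeric(stats::filter(x, delta, method = "recursive"))))
y <- t(apply(e, 1, cumsum)) +
  matrix(alpha * seq_len(n_t), n_rep, n_t, byrow = TRUE)

# Correlation across paths at selected dates versus the large-t formula
t_check <- c(10, 50, 100, 200)
sim_rho <- sapply(t_check, function(tt) cor(z[, tt], y[, tt]))
theo_rho <- sqrt(1 - delta^2) / (1 - delta) *
  v_ue / (sigma_u * sigma_e) / sqrt(t_check)
print(round(cbind(t = t_check, simulated = sim_rho, formula = theo_rho), 3))
```

With a single observed sample you cannot average across paths like this, which is exactly why a sample correlation computed over time mixes all these different values together.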

The point of all this algebra is that you need to read up on stationary and non-stationary series in order to judge whether studying them together can yield any meaningful conclusion. At the very least, examine whether they are co-integrated.
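As a rough illustration of what a co-integration check involves, here is a sketch of the residual-based (Engle-Granger style) idea in base R, run on simulated data rather than on the series in the question. The helper name `eg_stat`, the simulated series and the seed are all assumptions made for the example. Note that the resulting statistic must be compared with the special Engle-Granger/MacKinnon critical values, not with the usual t tables; those critical values are not computed here.

```r
set.seed(7)  # arbitrary seed, for reproducibility only

# Hypothetical helper: t statistic of a Dickey-Fuller regression on the
# residuals of a levels regression (no lag augmentation, for simplicity).
eg_stat <- function(y, x) {
  e <- residuals(lm(y ~ x))        # step 1: regress y on x in levels
  de <- diff(e)                    # change in the residual
  e_lag <- e[-length(e)]           # lagged residual
  step2 <- lm(de ~ 0 + e_lag)      # step 2: DF regression without constant
  unname(coef(summary(step2))["e_lag", "t value"])
}

n <- 200
x <- cumsum(rnorm(n))              # a random walk, integrated of order 1
y_coint <- 2 * x + rnorm(n)        # shares the stochastic trend of x
y_indep <- cumsum(rnorm(n))        # an unrelated random walk

stat_coint <- eg_stat(y_coint, x)
stat_indep <- eg_stat(y_indep, x)
print(c(cointegrated = stat_coint, independent = stat_indep))
```

A strongly negative statistic points towards stationary residuals, i.e. towards co-integration. With only 24 quarterly observations, as in the question, such a test would have little power, so treat any result with caution.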

Simon Hayward · Answer 2 · 2013-08-27T14:58:03.310

Do you know much about ARIMA modelling? The (fixed) data suggest modelling A as an ARIMA $(0,0,0) \times (1,0,1)_4$ process (i.e. with a seasonal autoregressive parameter and a seasonal moving average parameter at lag 4).

The other two series are approximately stationary. So going by the Box-Jenkins methodology (yes I know there are problems with it), you should be able to fit the above ARIMA model to A and then regress the residuals against the other two series.

Sadly I'm only educated to MSc level, so maybe someone smarter than me might give fuller details.

EDIT: Here is my R code if it helps, since you sound like you're new to R:-

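   # Read the three series; column 1 is A (assumed column order: A, B, C)
   tsData <- read.csv("C:\\simon.hayward\\Documents\\tsData.csv", header = TRUE)

   # ACF and PACF of each series, stacked in one plotting window
   par(mfrow = c(2, 1))
   for (i in 1:3) {
     acf(tsData[, i])
     pacf(tsData[, i])
   }
   par(mfrow = c(1, 1))

   # Pairwise cross-correlations of the raw series
   ccf(tsData[, 1], tsData[, 2])
   ccf(tsData[, 2], tsData[, 3])
   ccf(tsData[, 1], tsData[, 3])

   # Seasonal ARIMA (0,0,0)x(1,0,1)_4 for A
   fit1 <- arima(tsData[, 1], order = c(0, 0, 0),
                 seasonal = list(order = c(1, 0, 1), period = 4))
   fit1.resid <- residuals(fit1)

   # Cross-correlate the residuals of A with the other two series
   ccf(fit1.resid, tsData[, 2])
   ccf(fit1.resid, tsData[, 3])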

although the arima fitting procedure is warning me about a convergence error.
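The code above stops at the cross-correlations. The regression step mentioned earlier could look roughly like the sketch below, which also prints the convergence code that `arima` stores from `optim`. The data are typed in from the question (brackets read as negatives, first value of A dropped as per the comments); the variable names and the `tryCatch` guard are my own additions, and whether the fit converges on this data is not something I am asserting.

```r
# Series as posted in the question (24 quarterly observations each)
series_A <- c(15062, -22984, 14704, 14285, 15585, -17460, 21145, 20926,
              28117, -6524, 31190, 25610, 34311, -21376, 14140, 3416,
              26526, -21159, 30874, 51579, 50426, -19874, 52980, 30338)
series_B <- c(-92345, 19415, 13045, 104693, 214196, -39180, -29979, 112499,
              5914, 60787, 92253, 124716, 23638, 362566, -66896, 209127,
              103986, -13418, 389962, -161400, 177945, -36645, 148722, 189477)
series_C <- c(819019, 716641, 767830, 1177339, 1254122, 1254122, 985382,
              716641, 614264, 806222, 819019, 844613, 1018655, 1108235,
              1261801, 1474234, 1412807, 1678988, 1638037, 1228528,
              1023773, 1279717, 1279717, 1023773)

# Same seasonal specification as above; tryCatch only guards against a
# hard failure of the optimiser so that the script keeps running
fit_A <- tryCatch(
  arima(series_A, order = c(0, 0, 0),
        seasonal = list(order = c(1, 0, 1), period = 4)),
  error = function(err) {
    message("arima failed: ", conditionMessage(err))
    NULL
  }
)

if (!is.null(fit_A)) {
  # 'code' is the convergence value returned by optim; 0 means it
  # reported convergence, anything else deserves a closer look
  cat("optim convergence code:", fit_A$code, "\n")

  # Regress the residuals of A on the other two series
  res_A <- as.numeric(residuals(fit_A))
  reg <- lm(res_A ~ series_B + series_C)
  print(coef(summary(reg)))
}
```

If the convergence code is non-zero, the residuals (and anything regressed on them) should not be trusted without further checks.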
