I have two time series (X1 and X2), each with 900 records. I wanted to establish the relationship between them and express it as an equation. I did the following:
1) I checked the correlation, which came out as 0.80. Because I wanted to build a robust model, I read further and learned that correlation alone is not a reliable way to establish a relationship between time series, because of the risk of spurious regression.
2) Then I tested both time series for stationarity and got the p-values below.
Time series   X1     X2
ADF test      0.28   0.07
KPSS test     0.01   0.01
From this I concluded that there is no unit root and that both series are stationary.
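For reference, here is a minimal sketch of how p-values like these can be produced. It assumes the tseries package is installed and uses simulated stand-in series (rw and ar1 are hypothetical, not the data above). The two tests have opposite null hypotheses: adf.test takes a unit root as its null, while kpss.test takes level stationarity as its null, so the two rows of the table are read in opposite directions.

```r
library(tseries)  # assumed to be installed; provides adf.test() and kpss.test()

set.seed(42)
n   <- 900
rw  <- cumsum(rnorm(n))                              # random walk: has a unit root
ar1 <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # stationary AR(1)

# ADF:  H0 = unit root, so a SMALL p-value is evidence of stationarity.
# KPSS: H0 = level stationarity, so a SMALL p-value is evidence against it.
# tseries clips p-values to its table range and warns when it does so,
# hence the suppressWarnings() wrapper.
p_values <- suppressWarnings(rbind(
  ADF  = c(rw  = adf.test(rw)$p.value,
           ar1 = adf.test(ar1)$p.value),
  KPSS = c(rw  = kpss.test(rw,  null = "Level")$p.value,
           ar1 = kpss.test(ar1, null = "Level")$p.value)
))
print(round(p_values, 3))
```

Comparing the two columns of the printed matrix shows how each test behaves on a series that is known to be stationary and on one that is known not to be.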
3) Then I checked the lag length using VARselect and got the results below:
VARselect(mydata, lag.max=8, type="const")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n)
     4      2      1      4
$criteria
                  1            2            3            4            5
AIC(n)     9.536526     9.518359     9.517730     9.514627     9.515910
HQ(n)      9.547980     9.537448     9.544454     9.548987     9.557905
SC(n)      9.566621     9.568517     9.587951     9.604911     9.626258
FPE(n) 13856.729981 13607.265067 13598.708504 13556.586100 13574.003525
                  6            7            8
AIC(n)     9.518960     9.522675     9.528970
HQ(n)      9.568591     9.579942     9.593872
SC(n)      9.649371     9.673149     9.699507
FPE(n) 13615.481638 13666.186272 13752.515062
I guess that means I should choose 1 as the lag length, since AIC(n) is highest for 1. Please correct me if I am wrong. (The data are daily and cover the last 3 years.)
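For what it is worth, the $selection row can be reproduced from the $criteria table: each information criterion is minimised, not maximised, so the reported lag is the one where the value is smallest. A minimal sketch in base R, with the numbers copied from the output above:

```r
# Criteria values copied from the VARselect output above (lags 1 to 8).
criteria <- rbind(
  AIC = c(9.536526, 9.518359, 9.517730, 9.514627,
          9.515910, 9.518960, 9.522675, 9.528970),
  HQ  = c(9.547980, 9.537448, 9.544454, 9.548987,
          9.557905, 9.568591, 9.579942, 9.593872),
  SC  = c(9.566621, 9.568517, 9.587951, 9.604911,
          9.626258, 9.649371, 9.673149, 9.699507),
  FPE = c(13856.729981, 13607.265067, 13598.708504, 13556.586100,
          13574.003525, 13615.481638, 13666.186272, 13752.515062)
)
colnames(criteria) <- 1:8

# For each criterion (row), find the lag with the smallest value.
selected <- apply(criteria, 1, which.min)
print(selected)
```

The criteria disagree here because SC penalises extra lags more heavily than AIC, so it tends to pick shorter lag lengths.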
4) I then performed Johansen's test and got the following output:
######################
# Johansen-Procedure #
######################
Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 4.868739e-02 8.650614e-03 -2.834784e-19
Values of teststatistic and critical values of test:
          test 10pct  5pct  1pct
r <= 1 |  8.51  7.52  9.24 12.97
r = 0  | 48.86 13.75 15.67 20.20
Eigenvectors, normalised to first column:
(These are the cointegration relations)
             X1.l2        X2.l2     constant
X1.l2     1.000000    1.0000000    1.0000000
X2.l2    -1.043634    0.2793248   -0.1931227
constant 35.516701 -917.4329825 -168.0889421
Weights W:
(This is the loading matrix)
           X1.l2        X2.l2      constant
X1.d -0.03565412 -0.003991993  8.008639e-18
X2.d  0.02731113 -0.015601292 -1.841754e-18
I guess this means that, with 90% confidence, we can say both series are cointegrated in levels.
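If exactly one cointegrating relation is accepted (r = 1), one common way to read this output is to take the first eigenvector column as the long-run relation, that is X1 - 1.043634 * X2 + 35.516701 = u, where u is a stationary equilibrium error. The sketch below only does that arithmetic; it assumes r = 1, and ect, long_run_x1 and x2_example are hypothetical names introduced for illustration.

```r
# Numbers copied from the first eigenvector column of the Johansen output above.
beta <- c(X1 = 1.000000, X2 = -1.043634, constant = 35.516701)

# Equilibrium error beta' (X1, X2, 1); under cointegration it should be stationary.
ect <- function(x1, x2) {
  beta[["X1"]] * x1 + beta[["X2"]] * x2 + beta[["constant"]]
}

# The same relation solved for X1, setting the equilibrium error to zero.
long_run_x1 <- function(x2) {
  -(beta[["X2"]] * x2 + beta[["constant"]]) / beta[["X1"]]
}

x2_example <- 100                      # hypothetical value of X2
x1_implied <- long_run_x1(x2_example)  # X1 level consistent with the relation
print(c(x1_implied = x1_implied, ect = ect(x1_implied, x2_example)))
```

The first column of the loading matrix (-0.0357 for X1.d and 0.0273 for X2.d) would then describe how strongly each series adjusts to a non-zero equilibrium error.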
5) Then I ran a Granger causality test to check for interdependence:
grangertest(mydata, order=4)
Granger causality test
Model 1: X2 ~ Lags(X2, 1:4) + Lags(X1, 1:4)
Model 2: X2 ~ Lags(X2, 1:4)
  Res.Df Df     F Pr(>F)
1    968
2    972 -4 1.273 0.2788
I don't know how to interpret this result.
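The printed table compares Model 1 (lags of both series) with Model 2 (own lags only) using an F-test. The null hypothesis is that the lags of X1 add nothing to the prediction of X2, so a large p-value means that null is not rejected. The sketch below rebuilds the same kind of comparison in base R on simulated data; granger_f is a hypothetical helper written for illustration, not the lmtest implementation, and x1 and x2 are stand-in series.

```r
set.seed(1)
n  <- 500
x1 <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # hypothetical driver series
x2 <- numeric(n)
for (i in 2:n) {
  # x2 depends on its own lag and on lagged x1, so x1 should "Granger-cause" x2.
  x2[i] <- 0.3 * x2[i - 1] + 0.5 * x1[i - 1] + rnorm(1)
}

# F-test of: effect ~ own lags + lags of cause   versus   effect ~ own lags.
granger_f <- function(cause, effect, order) {
  E   <- embed(effect, order + 1)                   # col 1 = current value, rest = lags 1..order
  C   <- embed(cause,  order + 1)[, -1, drop = FALSE]
  y   <- E[, 1]
  own <- E[, -1, drop = FALSE]
  restricted <- lm(y ~ own)        # own lags only
  full       <- lm(y ~ own + C)    # own lags plus lags of the other series
  anova(restricted, full)
}

res_x1_to_x2 <- granger_f(x1, x2, order = 4)
res_x2_to_x1 <- granger_f(x2, x1, order = 4)
print(res_x1_to_x2)
print(res_x2_to_x1)
```

Note that grangertest(mydata, order=4) only tests one direction (first column to second); with lmtest's formula interface the reverse direction would be something like grangertest(X1 ~ X2, order = 4, data = mydata), assuming mydata can be used as a data frame.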
6) Then I fitted a VAR model, since both series are stationary and cointegrated in levels:
> myvar <- VAR(mydata, p=3, type="const")
> summary(myvar)
VAR Estimation Results:
=========================
Endogenous variables: X1, X2
Deterministic variables: const
Sample size: 978
Log Likelihood: -7412.001
Roots of the characteristic polynomial:
0.9838 0.9403 0.3227 0.256 0.1759 0.1286
Call:
VAR(y = mydata, p = 3, type = "const")
Estimation results for equation X1:
=====================================
X1 = X1.l1 + X2.l1 + X1.l2 + X2.l2 + X1.l3 + X2.l3 + const
      Estimate Std. Error t value Pr(>|t|)
X1.l1  0.96679    0.03226  29.973  < 2e-16 ***
X2.l1  0.12106    0.01672   7.241 9.07e-13 ***
X1.l2  0.04524    0.04475   1.011   0.3123
X2.l2 -0.05777    0.02337  -2.472   0.0136 *
X1.l3 -0.04889    0.03096  -1.579   0.1147
X2.l3 -0.03031    0.01731  -1.751   0.0803 .
const  2.63598    2.73796   0.963   0.3359
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.755 on 971 degrees of freedom
Multiple R-Squared: 0.9863, Adjusted R-squared: 0.9862
F-statistic: 1.166e+04 on 6 and 971 DF, p-value: < 2.2e-16
Estimation results for equation X2:
========================================
X2 = X1.l1 + X2.l1 + X1.l2 + X2.l2 + X1.l3 + X2.l3 + const
      Estimate Std. Error t value Pr(>|t|)
X1.l1  0.02762    0.06228   0.443  0.65753
X2.l1  0.97667    0.03228  30.252  < 2e-16 ***
X1.l2  0.02181    0.08641   0.252  0.80079
X2.l2  0.04168    0.04513   0.924  0.35597
X1.l3 -0.03310    0.05979  -0.554  0.57997
X2.l3 -0.05587    0.03343  -1.671  0.09497 .
const 15.32998    5.28683   2.900  0.00382 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.98 on 971 degrees of freedom
Multiple R-Squared: 0.9564, Adjusted R-squared: 0.9561
F-statistic: 3549 on 6 and 971 DF, p-value: < 2.2e-16
Covariance matrix of residuals:
      X1     X2
X1 60.15  13.46
X2 13.46 224.26
Correlation matrix of residuals:
       X1     X2
X1 1.0000 0.1159
X2 0.1159 1.0000
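In equation form, the fitted VAR(3) is simply the two printed regressions: each of X1 and X2 equals a constant plus a weighted sum of the three most recent values of both series. The sketch below writes that out with the coefficients copied from the summary above; var_step and the lags matrix are hypothetical, introduced only to illustrate the arithmetic, and whether a VAR in levels is the appropriate model here is a separate question.

```r
# Coefficients copied from the VAR(3) summary above, in the printed column order.
A <- rbind(
  X1 = c(0.96679, 0.12106, 0.04524, -0.05777, -0.04889, -0.03031),
  X2 = c(0.02762, 0.97667, 0.02181,  0.04168, -0.03310, -0.05587)
)
colnames(A) <- c("X1.l1", "X2.l1", "X1.l2", "X2.l2", "X1.l3", "X2.l3")
const <- c(X1 = 2.63598, X2 = 15.32998)

# One-step-ahead values implied by the two fitted equations.
# `lags` is a 3 x 2 matrix: row k holds (X1, X2) observed at time t - k.
var_step <- function(lags) {
  z <- as.vector(t(lags))  # order: X1.l1, X2.l1, X1.l2, X2.l2, X1.l3, X2.l3
  drop(A %*% z) + const
}

# Hypothetical recent history in which both series equal 100 for three days.
lags <- matrix(100, nrow = 3, ncol = 2, dimnames = list(NULL, c("X1", "X2")))
next_values <- var_step(lags)
print(next_values)
```

With a model object from the vars package, the same one-step calculation is what predict(myvar, n.ahead = 1) would report as the point forecast.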
Questions:
- Is my approach correct?
- How should I interpret Granger and VAR results?
- With this approach, how can I express X1 and X2 in equation form?
- Please let me know if I misinterpreted or missed anything.
Thanks for your time and for reading.