Because the simple correlation coefficient between the lagged sample series gives a biased estimate of the population correlation coefficient $\rho_{ij} \left( t \right)$, `ccf` uses a different estimator.
If you take a look at the built-in help (`?ccf`), there is a reference to the book Venables, W. N. and Ripley, B. D. (2002): Modern Applied Statistics with S. Fourth Edition. Springer-Verlag. On page 390 you can find the estimation formula for `ccf`:
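$$c_{ij}\left( t \right) = \frac{1}{n} \sum_{s = \max \left( 1, -t \right)}^{\min\left( n - t, n \right)}{\left( X_i \left( s + t \right) - \overline{X_i} \right) \left( X_j\left( s \right) - \overline{X_j} \right)}, \qquad r_{ij}\left( t \right) = \frac{c_{ij}\left( t \right)}{\sqrt{c_{ii}\left( 0 \right) c_{jj}\left( 0 \right)}}$$
(Actually $r_{ij} \left( t \right)$ is not there, but it can easily be deduced from the `acf` function's $r_t = \frac{c_t}{c_0}$. For `ccf` the denominator cannot simply be $c_{ij} \left( 0 \right)$: unlike $c_0$, it can be negative (think about $r_{ij} \left( 0 \right) = -1$, as is the case with `a` and `b` in this question), and dividing by it would force $r_{ij} \left( 0 \right) = \pm 1$ for every pair of series. `ccf` therefore scales by the lag-0 autocovariances of the two series, which are never negative. For `a` and `b` here, $\sqrt{c_{ii} \left( 0 \right) c_{jj} \left( 0 \right)} = \left| c_{ij} \left( 0 \right) \right|$, because the two series are perfectly negatively correlated at lag 0.)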
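As a worked illustration of the formula, here is the arithmetic for lags 0 and 1. This is a sketch using the trimmed series $a = (2, 1, 2, 1, 2)$ and $b = (1, 2, 1, 2, 1)$ from the code further down, so $n = 5$, $\overline{a} = 1.6$ and $\overline{b} = 1.4$; the subscripts $a$ and $b$ stand in for $i$ and $j$.

$$\begin{aligned}
c_{ab}\left( 0 \right) &= \tfrac{1}{5} \left[ (0.4)(-0.4) + (-0.6)(0.6) + (0.4)(-0.4) + (-0.6)(0.6) + (0.4)(-0.4) \right] = -0.24 \\
c_{ab}\left( 1 \right) &= \tfrac{1}{5} \left[ (-0.6)(-0.4) + (0.4)(0.6) + (-0.6)(-0.4) + (0.4)(0.6) \right] = 0.192 \\
r_{ab}\left( 0 \right) &= \frac{-0.24}{0.24} = -1, \qquad r_{ab}\left( 1 \right) = \frac{0.192}{0.24} = 0.8
\end{aligned}$$

The scaling constant $0.24$ is the same whichever way it is computed for these two series: $c_{aa} \left( 0 \right) = c_{bb} \left( 0 \right) = \left| c_{ab} \left( 0 \right) \right| = 0.24$.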
Since

    a <- c(2, 1, 2, 1, 2, 1, 2)
    b <- c(NA, NA, 1, 2, 1, 2, 1)
    ccf(a, b, na.action=na.omit, plot=FALSE)

is equivalent to

    a <- c(2, 1, 2, 1, 2)
    b <- c(1, 2, 1, 2, 1)
    ccf(a, b, plot=FALSE)

which gives

    Autocorrelations of series ‘X’, by lag

        -3     -2     -1      0      1      2      3
     0.400 -0.567  0.800 -1.000  0.800 -0.567  0.400

you can check the calculations by applying the above formulas 'manually' with the following R code:
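
    a <- c(2, 1, 2, 1, 2)
    b <- c(1, 2, 1, 2, 1)
    n <- length(a)
    # ccf() scales by the square root of the product of the two lag-0
    # autocovariances; for these series that equals abs(c_ab(0)) = 0.24
    c_0 <- sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2)) / n
    for (t in -3:3) {
      # pair a[s + t] with b[s] over the overlapping part of the two series
      if (t <= 0) {
        c_t <- 1 / n * sum((a[1:(n + t)] - mean(a)) * (b[(1 - t):n] - mean(b)))
      } else {
        c_t <- 1 / n * sum((a[(1 + t):n] - mean(a)) * (b[1:(n - t)] - mean(b)))
      }
      r_t <- c_t / c_0
      print(r_t)
    }
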
with the results

    [1] 0.4
    [1] -0.5666667
    [1] 0.8
    [1] -1
    [1] 0.8
    [1] -0.5666667
    [1] 0.4
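
The same check can be wrapped into a small reusable function and compared against `ccf` directly. This is only a sketch: `manual_ccf` and its `lag_max` argument are hypothetical names, not part of R. It scales by the square root of the product of the two lag-0 autocovariances, which for `a` and `b` is the same number as $\left| c_{ij} \left( 0 \right) \right|$.

```r
# Hypothetical helper (not part of R): cross-correlation of x and y at lags
# -lag_max..lag_max, pairing x[s + k] with y[s] and using divisor n throughout.
manual_ccf <- function(x, y, lag_max) {
  stopifnot(length(x) == length(y), lag_max < length(x))
  n  <- length(x)
  dx <- x - mean(x)
  dy <- y - mean(y)
  # Scale by the lag-0 autocovariances of the two series.
  scale <- sqrt(mean(dx^2) * mean(dy^2))
  lags <- -lag_max:lag_max
  r <- vapply(lags, function(k) {
    if (k >= 0) {
      c_k <- sum(dx[(1 + k):n] * dy[1:(n - k)]) / n
    } else {
      c_k <- sum(dx[1:(n + k)] * dy[(1 - k):n]) / n
    }
    c_k / scale
  }, numeric(1))
  setNames(r, lags)
}

a <- c(2, 1, 2, 1, 2)
b <- c(1, 2, 1, 2, 1)
manual  <- manual_ccf(a, b, lag_max = 3)
builtin <- as.vector(ccf(a, b, lag.max = 3, plot = FALSE)$acf)
print(round(manual, 3))
print(all.equal(unname(manual), builtin))
```

If the sketch matches what `ccf` does, the last line should print `TRUE`. Trying it on a pair of series that are not mirror images of each other is a more demanding check of both the lag direction and the scaling.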