I'm trying to understand the "mechanism" behind the calculation of ACF values in a time series. As a "prove-it-to-myself exercise" [NOTE: I updated the code in this link to reflect the information in the accepted answer], I am not focusing on the elegance of the code, and I use loops on purpose. The problem is that, although the values I get are close to those generated by acf()
, they are not equal, and at some points, not all that close. Is there any conceptual misunderstanding?
As credited on the online notes linked above, the data was generated following this online example.
set.seed(0) # To reproduce results
x = seq(pi, 10 * pi, 0.1) # Creating time-series as sin function.
y = 0.1 * x + sin(x) + rnorm(x) # Time-series sinusoidal points with noise.
y = ts(y, start = 1800) # Labeling time axis.
model = lm(y ~ I(1801:2083)) # Detrending (0.1 * x)
st.y = y - predict(model) # Final de-trended ts (st.y)
ACF = 0 # Empty vector to capture the auto-correlations.
ACF[1] = cor(st.y, st.y) # The first entry in the ACF is the cor with itself.
for(i in 1:24){ # Took 24 points to parallel the output of `acf()`
lag = st.y[-c(1:i)] # Introducing lags in the stationary ts.
clipped.y = st.y[1:length(lag)] # Compensating by reducing length of ts.
ACF[i + 1] = cor(clipped.y, lag) # Storing each correlation.
}
w = acf(st.y) # Extracting values of acf without plotting.
all.equal(ACF, as.vector(w$acf)) # Checking for equality in values.
# Pretty close to manual calc: Mean relative difference: 0.03611463"
To get a tangible sense of the relative outputs, this is what the autocorrelations look like calculated manually:
1.0000000 0.3195564 0.3345448 0.2877745 0.2783382 0.2949996 ...
... -0.1262182 -0.1795683 -0.1941921 -0.1352814 -0.2153883 -0.3423663
as opposed to the R acf()
function:
1.0000000 0.3187104 0.3329545 0.2857004 0.2745302 0.2907426 ...
... -0.1144625 -0.1621018 -0.1737770 -0.1203673 -0.1924761 -0.3069342
As suggested, this is likely explained comparing to the code in the call within acf()
in the line:
acf <- .Call(C_acf, x, lag.max, type == "correlation")
How can this C_acf
function be accessed?
I am having similar issues with the PACF, which I presume are related. The code for the PACF values is here. [NOTE: In this case I suspect it is actually a rounding difference].