
I would like to get the same results from the acf() function and from cor() on a very simple time series, but I am not able to. I thought that the scaling factor should be $(n-1)/n$, but that does not seem to be the case. It looks like the proper scaling factor is $(n-l)/n$, where $l$ is the lag. Can somebody explain why that is?

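tsExample <- c(1,2,1,2)
acf(tsExample, plot = FALSE)

# Unscaled Pearson correlations of the lagged pairs (lags 0 to 3)
cor(tsExample[1:4], tsExample[1:4])
cor(tsExample[2:4], tsExample[1:3])
cor(tsExample[3:4], tsExample[1:2])
cor(tsExample[4:4], tsExample[1:1])  # NA: a single pair has no variance

# First guess: scale by (m-1)/m, with m the length of each sub-array
cor(tsExample[1:4], tsExample[1:4])*(3/4)
cor(tsExample[2:4], tsExample[1:3])*(2/3)
cor(tsExample[3:4], tsExample[1:2])*(1/2)

# Scale by (n-l)/n with n = 4: this reproduces the acf() values
cor(tsExample[1:4], tsExample[1:4])*(4/4)
cor(tsExample[2:4], tsExample[1:3])*(3/4)
cor(tsExample[3:4], tsExample[1:2])*(2/4)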
koralgooll

1 Answer


Because the autocorrelation in R is estimated using the Wikipedia definition, which, to rephrase, is $$\hat{R}_k=\frac{1}{N-k}\sum_{t=1}^{N-k}X'_tX'_{t+k}$$ where $X'_t$ is the standardized form of $X_t$. The standardization is common to all terms, i.e. it uses a single mean and standard deviation, $\mu,\sigma$, for the whole series rather than separate ones for the leading and lagging values.

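Edit: @Nick_Cox's comment is critical. Your array is a special case: it alternates perfectly, so the sub-arrays you pass to cor have matching spread, e.g. the sample std of X[1:3] and X[2:4] is the same. In general the sub-array means and stds differ from each other and from the full-series values, and then this scaling does not work.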
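
How much the agreement depends on the particular series can be checked directly. Below is a minimal sketch, not a definitive test: the helpers `scaled_cor` and `acf_at` and the series `x_other` are made up for illustration; only `tsExample` comes from the question.

```r
# Hypothetical helper (not from the question): Pearson correlation of the
# lag-l pairs, rescaled by (n - l) / n as in the question's last three lines.
scaled_cor <- function(x, l) {
  n <- length(x)
  cor(x[(l + 1):n], x[1:(n - l)]) * (n - l) / n
}

# Autocorrelation at lag l as reported by acf(); element 1 of $acf is lag 0.
acf_at <- function(x, l) acf(x, plot = FALSE)$acf[l + 1]

tsExample <- c(1, 2, 1, 2)    # series from the question
x_other <- c(1, 3, 2, 5, 4)   # made-up series without the alternating pattern

# The two numbers agree for the question's series ...
print(c(acf = acf_at(tsExample, 1), scaled = scaled_cor(tsExample, 1)))
# ... but not for the irregular one.
print(c(acf = acf_at(x_other, 1), scaled = scaled_cor(x_other, 1)))
```

For `x_other` the sub-arrays `x_other[2:5]` and `x_other[1:4]` have different means and SDs, which is the situation where a single rescaling factor cannot turn `cor` into `acf`.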

gunes
  • 49,700
  • 3
  • 39
  • 75
  • 1
    I don't think the key is that there are $N - k$ terms as that is the number available regardless. The key is whether leading and lagging values are standardised with respect to mean and SD of each set (leading, lagging) -- which being an analogue of Pearson correlation would imply -- or with respect to the same set of values. Even under weak stationarity sample means will fluctuate so that the same mean and SD are not guaranteed for leading $X_{k + 1}$ to $X_N$ and lagging $X_{1}$ to $X_{N-k}$. Often the autocorrelation recipe is chosen so that spectral density estimates are better behaved. – Nick Cox May 10 '19 at 12:27
  • 1
    Usually $N$ instead of $N-k$ is used in the denominator, otherwise the estimated autocorrelation function may not be positive semidefinite, see https://stats.stackexchange.com/a/294410/77222. – Jarle Tufto May 10 '19 at 19:15
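
The effect of the denominator can be checked on the question's series. The sketch below assumes that `acf` subtracts the full-series mean and divides the sum at every lag by $N$, which is the convention this comment describes; the helper name `acf_by_hand` is hypothetical.

```r
# Hypothetical helper: sample autocorrelation using the full-series mean
# and the divisor N at every lag (assumed convention, see the lead-in).
acf_by_hand <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)                          # deviations from the common mean
  c_k <- sum(d[1:(n - k)] * d[(k + 1):n]) / n
  c_0 <- sum(d^2) / n
  c_k / c_0
}

tsExample <- c(1, 2, 1, 2)                  # series from the question
by_hand <- sapply(0:3, function(k) acf_by_hand(tsExample, k))
from_r  <- as.vector(acf(tsExample, plot = FALSE)$acf)

print(rbind(by_hand, from_r))               # both rows: 1, -0.75, 0.5, -0.25
```

If the divisor in `c_k` is changed from `n` to `n - k`, the values for this series at lags 1 and 2 become -1 and 1, the same as the unscaled `cor` calls in the question, which is where the factor $(n-l)/n$ comes from.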