4

I just started with time series analysis and I would like to know whether there is a formula for calculating the autocorrelation function (ACF) and the partial autocorrelation function (PACF) for time series data. There are formulas for 'normal' data points, but I have not found any for time series. Maybe there is a special algorithm for doing that? I know that it is quite easy to calculate the ACF and PACF using e.g. R or Python. But how is this done? I'd appreciate every comment and will be quite thankful for your help.

PeterBe
  • 230
  • 3
  • 13

2 Answers

4

Well if you mean how to estimate the ACF and PACF, here is how it's done:

1. ACF: In practice, a simple procedure is:

  1. Estimate the sample mean: $$\bar{y} = \frac{\sum_{t=1}^{T} y_t}{T}$$
  2. Calculate the sample autocorrelation: $$\hat{\rho_j} = \frac{\sum_{t=j+1}^{T}(y_t - \bar{y})(y_{t-j} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$$
  3. Estimate the variance of $\hat{\rho_j}$. In many software packages (including R if you use the acf() function), it is approximated by the variance under white noise: $T^{-1}$. This leads to confidence intervals that are asymptotically consistent, but narrower than the actual confidence interval in many cases (leading to a larger probability of Type I error), so interpret these with caution!
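
The three steps above can be sketched in plain Python. This is a minimal illustration, not how R or any particular library implements it; the function names `sample_acf` and `white_noise_band` and the toy series are made up for the example, and the 1.96 multiplier assumes an approximate 95% normal interval.

```python
import math

def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 0..max_lag (steps 1 and 2)."""
    T = len(y)
    y_bar = sum(y) / T                      # step 1: sample mean
    dev = [v - y_bar for v in y]
    denom = sum(d * d for d in dev)         # sum over t = 1..T of (y_t - y_bar)^2
    acf = []
    for j in range(max_lag + 1):
        # sum over t = j+1..T of (y_t - y_bar)(y_{t-j} - y_bar), 0-based here
        num = sum(dev[t] * dev[t - j] for t in range(j, T))
        acf.append(num / denom)
    return acf

def white_noise_band(T, z=1.96):
    """Half-width of the band implied by Var(rho_hat_j) ~ 1/T (step 3)."""
    return z / math.sqrt(T)

# Hypothetical toy series, only to show the calls.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acf(y, 2))
print(white_noise_band(len(y)))
```

The lag-0 value is always 1, because numerator and denominator coincide.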

2. PACF: The PACF is a bit more complicated, because it tries to nullify the effects of other order correlations.

It is estimated via a set of OLS regressions, one for each order $j$: $$y_{t,j} = \phi_{j,1} y_{t-1} + \phi_{j,2} y_{t-2} + \dots + \phi_{j,j} y_{t-j} + \epsilon_t$$ The coefficient you want is $\phi_{j,j}$, estimated with the standard OLS formula $\hat{\beta} = (X'X)^{-1}X'Y$.

So, for example, if you would like the first order PACF: $$y_{t,1} = \phi_{1,1} y_{t-1} + \epsilon_t$$ and the coefficient you want is the $\hat{\phi_{1,1}}$ given by OLS: $\hat{\phi_{1,1}}=\frac{Cov(y_{t-1},y_t)}{Var(y_t)}$ (assuming weak stationarity).

The second order PACF would be the $\phi_{2,2}$ coefficient of: $$y_{t,2} = \phi_{2,1} y_{t-1} + \phi_{2,2} y_{t-2} + \epsilon_t$$

And so on.
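
The sequence of regressions above can be sketched in plain Python as follows. This is an illustrative sketch only: the names `solve_linear` and `pacf_ols` are made up, the series is demeaned first so that the regressions can omit an intercept (an assumption, since the equations above are written without one), and the small Gaussian elimination routine stands in for $(X'X)^{-1}X'Y$ and is not meant to be numerically robust.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def pacf_ols(y, max_lag):
    """Return [phi_hat_{1,1}, ..., phi_hat_{m,m}] from m separate OLS fits."""
    T = len(y)
    y_bar = sum(y) / T
    d = [v - y_bar for v in y]          # assumption: demean instead of fitting an intercept
    out = []
    for j in range(1, max_lag + 1):
        # Each row of X holds (y_{t-1}, ..., y_{t-j}); the target is y_t.
        rows = [[d[t - k] for k in range(1, j + 1)] for t in range(j, T)]
        target = [d[t] for t in range(j, T)]
        XtX = [[sum(r[a] * r[b] for r in rows) for b in range(j)] for a in range(j)]
        XtY = [sum(r[a] * yt for r, yt in zip(rows, target)) for a in range(j)]
        out.append(solve_linear(XtX, XtY)[-1])   # keep only the last coefficient
    return out

# Hypothetical simulated AR(1) series with coefficient 0.6, only to show the call.
random.seed(0)
y = [0.0]
for _ in range(499):
    y.append(0.6 * y[-1] + random.gauss(0.0, 1.0))
print(pacf_ols(y, 3))
```

For an AR(1) series like the simulated one, the first value should land near the AR coefficient and the higher-order values near zero, up to sampling noise.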

Good references on this are Enders (2004) and Hamilton (1994).

Caio C.
  • 133
  • 8
  • Thanks Caio for your answer. I upvoted it. However, I have some questions. 1) How do you calculate the variance for the ACF (so basically the 3. step of your answer for the ACF). Could you elaborate a little bit more on that? 2) What is the variable y_t,1? What is the difference to y_t, y_t-1 and y_1? So the y has two arguments. – PeterBe Oct 27 '20 at 14:13
  • Hi Peter, the variance for the ACF will depend on which process the variable follows in population terms, and this is the reason many software packages use the white noise to estimate it (because the ACF and PACF are usually used for model identification). For example, for an MA(q), the variance could be approximated by $T^{-1} (1 + 2 \sum_{s=1}^{j-1}\rho^2_{s})$ for j>q, and it would therefore increase with the order j of the AC. Because this is difficult to compute, software packages use the constant variance of the WN. For your second question, the 1 in y_t,1 (...) – Caio C. Oct 27 '20 at 14:45
  • (...) would be just an indicator of the order of the PACF you're estimating. I noticed that I made a mistake in the example for the second order PACF, it should say y_t,2. I'll correct it now. Hope this helps! – Caio C. Oct 27 '20 at 14:46
  • 1
    Also, notice that the above mentioned problem with the variance of the ACF is not a problem per se due to consistency, but it's something to bear in mind: if an autocorrelation for, say, lag k is "just out" of the confidence interval, it could be that it would be inside such interval had the variance been correctly estimated. Notice that this is more the case as k increases in order. – Caio C. Oct 27 '20 at 14:50
  • Thanks Caio for your comments and effort. I really appreciate it. While I now understand the 2) question about the y_ts, I still have problems with the variance. Can I for example just use the sample variance equation https://en.wikipedia.org/wiki/Variance#Sample_variance ? I do not want to specify a process beforehand (ARIMA etc). I just want to calculate the variance of the time series (so we have measured values) – PeterBe Oct 27 '20 at 15:05
  • Further I do not understand your last comment at all (but this is not as important as the formula for the variance) "Also, notice that the above mentioned problem with the variance of the ACF is not a problem per se due to consistency, but it's something to bear in mind: if an autocorrelation for, say, lag k is "just out" of the confidence interval, it could be that it would be inside such interval had the variance been correctly estimated. Notice that this is more the case as k increases in order." – PeterBe Oct 27 '20 at 15:06
  • Peter, the sample variance is how most software packages do it. Notice that the formula for $\hat{\rho_j}$ in my answer has this variance as its denominator ($T^{-1}$ is a common factor in the numerator and denominator, and therefore I have eliminated it). As for the second part, I have simulated an MA(2) here https://imgur.com/2bDpjQE with the WN and "correct" confidence intervals (WN in black and correct in red). Hope it helps. – Caio C. Oct 27 '20 at 15:21
  • Thanks Caio for your further answers and effort. Why do I have to simulate an MA(2)? I just want to calculate the variance. But if I understood your answer correctly, I can just use the sample variance, right? – PeterBe Oct 27 '20 at 15:44
  • Yes, that is correct. The MA(2) was just an example to illustrate my second point about the confidence intervals, but, as you said, it's not as important. Once you have identified the model, you can calculate the variance based on that. – Caio C. Oct 27 '20 at 15:46
  • Thanks for your great help Caio. I accepted your answer and I appreciate your effort. – PeterBe Oct 27 '20 at 15:50
  • Hi Caio, it's me again. I have a question about your answer. When calculating the ACF, your third step is to estimate the variance. But where is the variance used in the formula? I do not see a symbol for the variance in your equation for the ACF – PeterBe Oct 28 '20 at 16:23
  • 1
    Hi Peter, what I was saying is that most software packages will use the variance of a white noise to estimate the variance of the ACF, which is $T^{-1}$. The correct variance will depend on the type of model the data follows, and that's why they use the white noise. It's the reason you see a flat (constant-width) confidence band when estimating the ACF in R (for example), when the real variance is not constant across lags. – Caio C. Oct 28 '20 at 17:23
  • 1
    Thanks Caio for your answer. My question was targeting your formula for step 2 for calculating the ACF. I do not see a variance symbol there so how does the variance impact the values of the ACF? – PeterBe Oct 28 '20 at 17:25
  • 2
    Let me rewrite that: notice that what I have written is the same as: $\hat{\rho_j} = \frac{Cov(y_t,y_{t-j})}{Var(y_t)}$, when written in sample terms: $\hat{\rho_j} = \frac{\frac{\sum_{t=j+1}^{T}(y_t - \bar{y})(y_{t-j} - \bar{y})}{T}}{\frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T}}$, and the denominator is the variance. – Caio C. Oct 28 '20 at 17:30
  • Thanks Caio for your answer and effort. Is the formula for $\hat{\rho_j}$ the sample autocorrelation function? – PeterBe Oct 28 '20 at 17:43
  • 1
    Yes, that is correct. – Caio C. Oct 28 '20 at 18:35
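
The difference between the two confidence bands discussed in the comments above can be sketched as follows. This is only an illustration under stated assumptions: the function name `bartlett_se` and the example autocorrelations are made up, and the approximation $T^{-1} (1 + 2 \sum_{s=1}^{j-1}\rho^2_{s})$ from the comments is applied at every lag $j$ with sample autocorrelations plugged in for $\rho_s$.

```python
import math

def bartlett_se(acf, T):
    """Standard errors for lags 1..len(acf)-1.

    acf[0] must be 1 and acf[s] the (sample) autocorrelation at lag s.
    Uses Var(rho_hat_j) ~ (1 + 2 * sum_{s=1}^{j-1} rho_s^2) / T.
    """
    ses = []
    cum = 0.0                      # running sum of rho_s^2 for s < j
    for j in range(1, len(acf)):
        ses.append(math.sqrt((1.0 + 2.0 * cum) / T))
        cum += acf[j] ** 2
    return ses

# Hypothetical autocorrelations and sample size, only to show the call.
acf = [1.0, 0.5, 0.2, 0.0]
T = 100
print(bartlett_se(acf, T))         # widens with the lag
print(1.0 / math.sqrt(T))          # white-noise standard error, the same at every lag
```

At lag 1 the two standard errors coincide, since the sum is empty; from lag 2 onward the first one can only grow, which is why an autocorrelation "just out" of the white-noise band may fall inside the wider one.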
1

By definition, the ACF is $\gamma_j := E[(Y_{t}-\mu)(Y_{t-j}-\mu)]$ (for covariance) and $\rho_j := \frac{\gamma_j}{\gamma_0}$ (for correlation). For a closed formula written as a function of the parameters, you need to specify the model that you have (if you say what your model is, I can tell you how to get that formula). For example, the ACF for an MA(1) is:

[Image in the original post: ACF of an MA(1) process]
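
For reference, the textbook result can be written out as a worked equation. The parameterization is an assumption, since the notation of the original image is not available: $Y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}$, where $\varepsilon_t$ is white noise with variance $\sigma^2$. Plugging this into the definition of $\gamma_j$ above gives

$$\gamma_0 = (1+\theta^2)\sigma^2, \qquad \gamma_1 = \theta\sigma^2, \qquad \gamma_j = 0 \ \text{ for } j \geq 2,$$

and therefore

$$\rho_0 = 1, \qquad \rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{\theta}{1+\theta^2}, \qquad \rho_j = 0 \ \text{ for } j \geq 2.$$

So the ACF of an MA(1) cuts off after lag 1, which is what makes it useful for model identification.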

As for the PACF, you want the correlation after having controlled for the other lags in the model, so you need to use OLS: the PACF at lag $j$ is defined as the $\beta_j$ in $Y_t = \beta_0 + \beta_1Y_{t-1} + \dots + \beta_jY_{t-j} + u_t$.

To use the data of your time series to calculate the sample counterparts of those statistics (without having to set a model, as in what I have presented until now), you first need to assume that your series is at least weakly stationary and ergodic, which in loose terms is like saying that the series "will not change its statistical properties with time", so that the values of the series that you observe are informative about the process behind it. Note: this is a more formal point about statistics. Ideally you should try to learn what exactly those things mean and when you can and cannot make that assumption, but it's best to ask that in another post.

Then, you can get $\gamma_j$ and $\rho_j$ by the formula present in the most upvoted answer in ACF and PACF Formula. For the PACF, there is a system of equations that connects the autocorrelations to it (the Yule-Walker equations), which is solved by what is known as the Levinson recursion (also explained in that answer).
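
A minimal sketch of that recursion (often called the Durbin-Levinson recursion) in plain Python is shown below. The function name `pacf_from_acf` is made up for the example, and the input autocorrelations are assumed to be either theoretical values or sample estimates computed beforehand.

```python
def pacf_from_acf(rho, max_lag):
    """Map autocorrelations to partial autocorrelations.

    rho[0] must be 1 and rho[k] the autocorrelation at lag k.
    Returns [phi_{1,1}, ..., phi_{m,m}] for m = max_lag.
    """
    pacf = []
    prev = []                                  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            cur = [phi_kk]
        else:
            # phi_kk = (rho_k - sum_j phi_{k-1,j} rho_{k-j}) / (1 - sum_j phi_{k-1,j} rho_j)
            num = rho[k] - sum(prev[i] * rho[k - 1 - i] for i in range(k - 1))
            den = 1.0 - sum(prev[i] * rho[i + 1] for i in range(k - 1))
            phi_kk = num / den
            # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            cur = [prev[i] - phi_kk * prev[k - 2 - i] for i in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf

# Theoretical ACF of an AR(1) with coefficient 0.5: rho_k = 0.5 ** k.
rho = [0.5 ** k for k in range(4)]
print(pacf_from_acf(rho, 3))
```

For the AR(1) autocorrelations in the usage lines, the recursion gives 0.5 at lag 1 and zero at the higher lags, matching the idea that the PACF of an AR(p) cuts off after lag p.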