Can somebody explain the autocorrelation function of time series data? If I apply acf to the data, what would the application be?
-
possible duplicate of [What to read from the autocorrelation function of a time series?](http://stats.stackexchange.com/questions/18284/what-to-read-from-the-autocorrelation-function-of-a-time-series) – Andy Nov 21 '13 at 13:21
-
In the context of wide-sense stationary time series, it gives a measure of the dependence of a time series on its lagged version. – Cagdas Ozgenc Nov 21 '13 at 13:34
-
It is a measure of how much the current value is influenced by the previous values in a time series. – htrahdis Nov 21 '13 at 13:36
-
@htrahdis As in the standard regression setting, beware of conflating *correlation* with *influence* (or causation). – whuber Nov 21 '13 at 16:40
-
@Andy That thread indeed looks similar--thank you for locating it--but the accepted (and only) answer does not directly address this question: it focuses on a particular acf. As such it provides an illustration of how the acf can be interpreted, but it is unclear to me whether any of that material responds to the present request for an explanation of ACFs in general. – whuber Nov 21 '13 at 16:44
2 Answers
Unlike an ordinary (unordered) sample, time-series data are ordered. Therefore, there is extra information in your sample that you can take advantage of if there are useful temporal patterns. The autocorrelation function is one of the tools used to find such patterns. Specifically, it tells you the correlation between points separated by various time lags. As an example, here are some possible ACF values for a series with discrete time periods:
The notation is ACF(n) = correlation between points separated by n time periods, where n is the number of time periods between the points. I'll give examples for the first few values of n.
ACF(0)=1 (all data are perfectly correlated with themselves), ACF(1)=.9 (the correlation between a point and the next point is 0.9), ACF(2)=.4 (the correlation between a point and a point two time steps ahead is 0.4)...etc.
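To make the notation concrete, here is a minimal sketch of the usual sample estimator: subtract the mean, multiply each point by the point n periods later, sum, and divide by the lag-0 sum of squares. As far as I know this is the same estimator R's acf() uses by default; the function name sample_acf is just my own choice for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation ACF(n) for n = 0..max_lag.

    Every lag's sum of products is divided by the lag-0 sum of squares,
    so ACF(0) is exactly 1 (each point is perfectly correlated with itself).
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()              # deviations from the sample mean
    denom = np.dot(dev, dev)        # sum of squared deviations (lag 0)
    acf = np.empty(max_lag + 1)
    for n in range(max_lag + 1):
        # pair each point with the point n time periods later
        acf[n] = np.dot(dev[: len(x) - n], dev[n:]) / denom
    return acf

# A tiny series: ACF(0)=1, ACF(1)=0.4, ACF(2)=-0.1
print(sample_acf([1, 2, 3, 4, 5], max_lag=2))
```

For real work you would normally call a library routine instead (e.g. acf in R, or statsmodels.tsa.stattools.acf in Python), but the hand-rolled version shows there is nothing more to it than a lagged correlation.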
So, the ACF tells you how correlated points are with each other, based on how many time steps they are separated by. That is the gist of autocorrelation: it measures how correlated past data points are with future data points, for different values of the time separation. Typically, you'd expect the autocorrelation function to fall towards 0 as points become more separated (i.e. as n becomes large in the above notation), because points far apart in time usually share less information; this is also why it's generally harder to forecast further into the future from a given set of data. This is not a rule (a seasonal series, for example, can show a large autocorrelation at its seasonal lag), but it is typical.
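The "falls towards 0" pattern is easy to see on simulated data. The sketch below assumes a first-order autoregressive model, x_t = phi * x_{t-1} + e_t with phi = 0.9 (my choice for illustration, not something from the question), whose theoretical ACF is phi**n, and compares that with the sample ACF of a long simulated series. The helper names are hypothetical.

```python
import numpy as np

def simulate_ar1(phi, n_obs, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n_obs)
    x = np.empty(n_obs)
    x[0] = e[0] / np.sqrt(1 - phi**2)   # start from the stationary distribution
    for t in range(1, n_obs):
        x[t] = phi * x[t - 1] + e[t]
    return x

def sample_acf(x, max_lag):
    """Sample ACF: lagged sums of products divided by the lag-0 sum of squares."""
    dev = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(dev, dev)
    return np.array([np.dot(dev[: len(dev) - n], dev[n:]) / denom
                     for n in range(max_lag + 1)])

phi = 0.9
x = simulate_ar1(phi, n_obs=20_000)
for n, r in enumerate(sample_acf(x, max_lag=10)):
    # for this model the theoretical ACF is phi**n, which decays towards 0
    print(f"lag {n:2d}: sample {r:5.2f}   theoretical {phi**n:5.2f}")
```

The sample values should track phi**n closely and shrink steadily with the lag, which is the typical shape described above.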
Now, to the second part: why do we care? The ACF and its sister function, the partial autocorrelation function (more on this in a bit), are used in the Box-Jenkins/ARIMA modeling approach to determine how past and future data points are related in a time series. The partial autocorrelation function (PACF) can be thought of as the correlation between two points that are separated by some number of periods n, BUT with the effect of the intervening correlations removed. This matters because of the following situation: suppose that in reality each data point is directly correlated only with the NEXT data point, and no other. It will still APPEAR as if the current point is correlated with points further into the future, but only because of a "chain reaction" type effect: T1 is directly correlated with T2, which is directly correlated with T3, so it LOOKS like T1 is directly correlated with T3. The PACF removes the intervening correlation with T2 so you can better discern patterns. A nice intro to this is here.
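Here is a minimal sketch of that chain-reaction effect, assuming a simulated series in which each point depends directly only on the previous one (an AR(1) model with phi = 0.8, chosen just for illustration). The PACF at lag k is computed here as the coefficient on x_{t-k} when x_t is regressed on x_{t-1}, ..., x_{t-k}; that is one standard way to estimate it, though library routines may use other algorithms. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n_obs = 0.8, 20_000
e = rng.standard_normal(n_obs)
x = np.empty(n_obs)
x[0] = e[0]
for t in range(1, n_obs):
    x[t] = phi * x[t - 1] + e[t]   # each point depends directly only on the previous one

def lag2_corr(x):
    """Plain correlation between points two periods apart (approximately ACF(2))."""
    return np.corrcoef(x[:-2], x[2:])[0, 1]

def pacf_by_regression(x, k):
    """PACF at lag k: coefficient on x_{t-k} when x_t is regressed on
    x_{t-1}, ..., x_{t-k} plus an intercept, i.e. with intervening lags held fixed."""
    y = x[k:]
    cols = [np.ones_like(y)] + [x[k - j: len(x) - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]

print(f"ACF at lag 2:  {lag2_corr(x):.2f}  (roughly phi**2 = {phi**2:.2f})")
print(f"PACF at lag 2: {pacf_by_regression(x, 2):.2f}  (roughly 0)")
```

The lag-2 ACF comes out near 0.64 even though T1 never affects T3 directly, while the lag-2 PACF is near 0 once T2 is accounted for. That contrast (ACF tailing off, PACF cutting off after lag 1) is the kind of pattern Box-Jenkins identification looks for.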
The NIST Engineering Statistics Handbook (available online) also has a chapter on this, along with an example time series analysis using autocorrelation and partial autocorrelation. I won't reproduce it here, but work through it and you should have a much better understanding of autocorrelation.
Let me give you another perspective.
Plot the lagged values of a time series against the current values of the time series.
If the graph you see is linear, there is a linear dependence between the current values and the lagged values of the time series.
Autocorrelation values are the most obvious way to measure the strength of that linear dependence.
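To connect the lag plot with the numbers, the sketch below builds the (lagged, current) pairs that such a plot would show and computes their correlation for a few lags. The series is a made-up example, assumed for illustration only: a cycle with period 12 plus noise. Note that np.corrcoef standardizes the two segments separately, so the result is close to, but not exactly, the usual ACF estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(240)
# hypothetical series: a period-12 cycle plus noise
x = np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(t.size)

def lag_pairs(x, k):
    """Points of a lag-k scatter plot: (x_{t-k}, x_t) for every valid t."""
    return x[:-k], x[k:]

for k in (1, 3, 6, 12):
    lagged, current = lag_pairs(x, k)
    r = np.corrcoef(lagged, current)[0, 1]
    print(f"lag {k:2d}: correlation of lag-plot points = {r:+.2f}")
```

At lag 12 the scatter is a tight upward line (strong positive correlation), at lag 6 a tight downward line (strong negative correlation), and at lag 3 a shapeless cloud (correlation near 0). To draw the plots themselves, pass lagged and current to a scatter-plot function such as matplotlib's plt.scatter, or use pandas.plotting.lag_plot on a Series.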
