
The Ljung-Box test statistic $Q$ is defined as $$ Q := n(n+2) \sum_{j=1}^{\ell} \frac{\hat{r}_{j}^2}{n-j}, $$ where $n$ is the length of the series $a = (a_0, \dots, a_{n-1})$, $\ell$ is the number of lags, and $$ \hat{r}_{j} := \frac{1}{\left\|a \right\|_{2}^2} \sum_{i=j}^{n-1} a_{i}a_{i-j}. $$ (This is equation (1.1) of Ljung and Box's "On a Measure of Lack of Fit in Time Series Models"; it's freely available if you search scholar.google.com, but I can't link to it because the URL contains a bizarre unique token!)
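Taken literally, with no mean subtracted, the definition above can be sketched in R as follows. `q_no_demean` is a hypothetical helper name used only for illustration, and the sketch assumes a series without missing values and `lag < length(a)`.

```r
# Q exactly as defined above: \hat{r}_j uses the raw series, no demeaning.
# q_no_demean is a hypothetical name, not a function from any package.
q_no_demean <- function(a, lag) {
  n <- length(a)
  denom <- sum(a^2)                 # ||a||_2^2
  js <- seq_len(lag)
  r <- vapply(js, function(j) {
    # sum over i = j..n-1 (0-based) of a_i * a_{i-j}, shifted to R's 1-based indexing
    sum(a[(j + 1):n] * a[1:(n - j)]) / denom
  }, numeric(1))
  n * (n + 2) * sum(r^2 / (n - js))
}

q_no_demean(c(1, 2), lag = 1)  # 1.28, i.e. 32/25
```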

Ok, let's take $a = (1,2)$ and $\ell = 1$, so $n = 2$. Then $\hat{r}_1 = 2/5$, $\hat{r}_1^2 = 4/25$, and so $Q = 2 \cdot 4 \cdot \frac{4/25}{2-1} = 32/25$. Trivial, correct? But when I do this calculation in R:

```
> x <- c(1,2)
> b = Box.test(x, lag=1, "Ljung", fitdf=0)
> b$statistic
2
```

So R says my super basic calculation is wrong! Does R use a modified definition of the Ljung-Box statistic, or have I made a trivial mistake?

Note: Mathematica agrees with R:

```
data = {1,2};
H = AutocorrelationTest[data, 1, "HypothesisTestData"];
H["TestStatistic", "LjungBox"]
2
```
Christoph Hanck
user14717
    I could find the paper, for example, here: https://www.researchgate.net/publication/246995234_On_a_Measure_of_Lack_of_Fit_in_Time_Series_Models – Christoph Hanck Oct 11 '19 at 12:34

1 Answer


Your computation of the autocorrelation coefficient does not demean the data (subtract the sample mean), whereas R's `acf`, which `Box.test` uses internally, does:

```r
x <- c(1, 2)
b <- Box.test(x, lag = 1, type = "Ljung-Box", fitdf = 0)
b$statistic  # 2

# your \hat{r}_1, without demeaning
x[1]*x[2]/sum(x^2)  # 0.4

# R's \hat{r}_1
r1 <- acf(x, plot = FALSE)$acf[2]  # -0.5
# i.e., an estimate of the first autocovariance divided by an estimate of the variance (the divisions by n cancel out):
xbar <- mean(x) # 1.5
(x[1]-xbar)*(x[2]-xbar)/((x[1]-xbar)^2+(x[2]-xbar)^2)  # -0.5

n <- length(x)
(Q <- n*(n+2)*r1^2/(n-1))  # 2, agrees with Box.test (the 1/(n-j) weight is 1 here)
```
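Written out, the autocorrelation that `acf` computes (and that `Box.test` plugs into $Q$) is the question's $\hat{r}_j$ with every $a_i$ replaced by its deviation from the sample mean $\bar{a}$:

$$ \hat{r}_{j} = \frac{\sum_{i=j}^{n-1} (a_{i}-\bar{a})(a_{i-j}-\bar{a})}{\sum_{i=0}^{n-1} (a_{i}-\bar{a})^2}, \qquad \bar{a} = \frac{1}{n}\sum_{i=0}^{n-1} a_{i}. $$

For $a = (1,2)$ this gives $\bar{a} = 3/2$, $\hat{r}_1 = \frac{(1/2)(-1/2)}{1/4 + 1/4} = -1/2$, and $Q = 2 \cdot 4 \cdot \frac{1/4}{2-1} = 2$, matching R and Mathematica.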

See, e.g., here for further discussion of how R computes the acf.
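For more than one lag, the same computation can be sketched as a small function. This is a minimal sketch, assuming a series with no missing values and `lag < length(a)`; `ljung_box_q` is a hypothetical name, not part of base R. It demeans the series, forms $\hat{r}_1, \dots, \hat{r}_\ell$ (the $1/n$ factors cancel), and applies the $1/(n-j)$ weights, which is how `Box.test` combines the autocorrelations it gets from `acf` when `fitdf = 0`.

```r
# Ljung-Box Q with demeaning, as acf() does by default.
# ljung_box_q is a hypothetical helper for illustration only.
ljung_box_q <- function(a, lag) {
  n <- length(a)
  d <- a - mean(a)                  # subtract the sample mean
  denom <- sum(d^2)                 # n times the (biased) sample variance
  js <- seq_len(lag)
  r <- vapply(js, function(j) {
    # lag-j autocorrelation: sum of d_i * d_{i-j} over the overlapping pairs
    sum(d[(j + 1):n] * d[1:(n - j)]) / denom
  }, numeric(1))
  n * (n + 2) * sum(r^2 / (n - js))
}

ljung_box_q(c(1, 2), lag = 1)       # 2, the same value Box.test reports

# Compare against Box.test on a longer simulated series
set.seed(1)
y <- rnorm(50)
ljung_box_q(y, lag = 5)
unname(Box.test(y, lag = 5, type = "Ljung-Box")$statistic)
```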

Christoph Hanck
  • Awesome, thanks! But quick question: Shouldn't we have a different statistic for different means? The purpose is to test the residuals, but if the residuals have a large mean, doesn't that indicate a bad model fit? – user14717 Oct 11 '19 at 12:16
  • I am not sure what you mean by "a different statistic for different means", but I would say that, while you are right that nonzero means of residuals may indicate problems, in most applications this is not a problem as including a constant in the original model will ensure that residuals have mean zero by construction. – Christoph Hanck Oct 11 '19 at 12:30
  • Got it, thanks! – user14717 Oct 11 '19 at 12:32
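Both points in the comments can be checked directly in R: because the series is demeaned before the autocorrelations are computed, $Q$ is unchanged when a constant is added to it, and ordinary least-squares residuals from a model that includes an intercept average to zero. A short sketch with simulated data (the seed, sample size, and linear model are arbitrary choices for illustration):

```r
set.seed(42)
e <- rnorm(100)

# Shifting the series by a constant does not change the Ljung-Box statistic
q0   <- Box.test(e,       lag = 10, type = "Ljung-Box")$statistic
q100 <- Box.test(e + 100, lag = 10, type = "Ljung-Box")$statistic
all.equal(q0, q100)  # TRUE

# Least-squares residuals with an intercept have mean zero up to rounding error
x <- rnorm(100)
y <- 3 + 2 * x + rnorm(100)
mean(resid(lm(y ~ x)))      # essentially 0
mean(resid(lm(y ~ x - 1)))  # without an intercept: generally nonzero
```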