
I am used to seeing Ljung-Box test used quite frequently for testing autocorrelation in raw data or in model residuals. I had nearly forgotten that there is another test for autocorrelation, namely, Breusch-Godfrey test.

Question: what are the main differences and similarities of the Ljung-Box and the Breusch-Godfrey tests, and when should one be preferred over the other?

(References are welcome. Somehow I was not able to find any comparisons of the two tests although I looked in a few textbooks and searched for material online. I was able to find the descriptions of each test separately, but what I am interested in is the comparison of the two.)

Richard Hardy
  • Related: ["Breusch-Godfrey test on residuals from an MA(q) model"](https://stats.stackexchange.com/questions/500783). – Richard Hardy Jan 09 '21 at 16:05

6 Answers


There are some strong voices in the econometrics community against the validity of the Ljung-Box $Q$-statistic for testing for autocorrelation based on the residuals from an autoregressive model (i.e. with lagged dependent variables in the regressor matrix); see particularly Maddala (2001), "Introduction to Econometrics" (3rd edition), ch. 6.7 and ch. 13.5, p. 528. Maddala laments the widespread use of this test, and instead considers appropriate the "Lagrange Multiplier" test of Breusch and Godfrey.

Maddala's argument against the Ljung-Box test is the same as the one raised against another omnipresent autocorrelation test, the "Durbin-Watson" test: with lagged dependent variables in the regressor matrix, the test is biased in favor of maintaining the null hypothesis of "no autocorrelation" (the Monte-Carlo results obtained in @javlacalle's answer allude to this fact). Maddala also mentions the low power of the test; see for example Davies, N., & Newbold, P. (1979). Some power studies of a portmanteau test of time series model specification. Biometrika, 66(1), 153-155.

Hayashi (2000), ch. 2.10 "Testing for Serial Correlation", presents a unified theoretical analysis and, I believe, clarifies the matter. Hayashi starts from first principles: for the Ljung-Box $Q$-statistic to be asymptotically distributed as a chi-square, the process $\{z_t\}$ (whatever $z$ represents) whose sample autocorrelations we feed into the statistic must, under the null hypothesis of no autocorrelation, be a martingale-difference sequence, i.e. it must satisfy

$$E(z_t \mid z_{t-1}, z_{t-2},...) = 0$$

and also exhibit "own" conditional homoskedasticity

$$E(z^2_t \mid z_{t-1}, z_{t-2},...) = \sigma^2 >0$$

Under these conditions, the Ljung-Box $Q$-statistic (which is a finite-sample-corrected variant of the original Box-Pierce $Q$-statistic) has asymptotically a chi-squared distribution, and its use has asymptotic justification.
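
For concreteness, writing $\hat\rho_j$ for the $j$-th sample autocorrelation of $\{z_t\}$, $n$ for the sample size, and $h$ for the number of lags tested, the two statistics are

$$Q_{BP} = n\sum_{j=1}^{h}\hat\rho_j^2, \qquad Q_{LB} = n(n+2)\sum_{j=1}^{h}\frac{\hat\rho_j^2}{n-j},$$

and both are asymptotically $\chi^2_h$ under the null when the above conditions hold.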

Assume now that we have specified an autoregressive model (that perhaps includes also independent regressors in addition to lagged dependent variables), say

$$y_t = \mathbf x_t'\beta + \phi(L)y_t + u_t$$

where $\phi(L)$ is a polynomial in the lag operator, and we want to test for serial correlation by using the residuals of the estimation. So here $z_t \equiv \hat u_t$.

Hayashi shows that, in order for the Ljung-Box $Q$-statistic based on the sample autocorrelations of the residuals to have an asymptotic chi-square distribution under the null hypothesis of no autocorrelation, it must be the case that all regressors are "strictly exogenous" to the error term in the following sense:

$$E(\mathbf x_t\cdot u_s) = 0 ,\;\; E(y_t\cdot u_s)=0 \;\;\forall t,s$$

The "for all $t,s$" is the crucial requirement here, the one that reflects strict exogeneity. And it does not hold when lagged dependent variables exist in the regressor matrix. This is easily seen: set $s= t-1$ and then

$$E[y_t u_{t-1}] = E[(\mathbf x_t'\beta + \phi(L)y_t + u_t)u_{t-1}] =$$

$$ E[\mathbf x_t'\beta \cdot u_{t-1}]+ E[\phi(L)y_t \cdot u_{t-1}]+E[u_t \cdot u_{t-1}] \neq 0 $$

even if the $X$'s are independent of the error term, and even if the error term has no autocorrelation: the term $E[\phi(L)y_t \cdot u_{t-1}]$ is not zero, because $\phi(L)y_t$ contains $y_{t-1}$, which in turn contains $u_{t-1}$.

But this proves that the Ljung-Box $Q$ statistic is not valid in an autoregressive model, because it cannot be said to have an asymptotic chi-square distribution under the null.

Assume now that a weaker condition than strict exogeneity is satisfied, namely that

$$E(u_t \mid \mathbf x_t, \mathbf x_{t-1},...,\phi(L)y_t, u_{t-1}, u_{t-2},...) = 0$$

The strength of this condition is "in between" strict exogeneity and orthogonality. Under the null of no autocorrelation of the error term, this condition is "automatically" satisfied by an autoregressive model with respect to the lagged dependent variables (for the $X$'s it must of course be assumed separately).

Then there exists another statistic based on the residual sample autocorrelations (not the Ljung-Box one) that does have an asymptotic chi-square distribution under the null. This other statistic can be calculated, conveniently, via the "auxiliary regression" route: regress the residuals $\{\hat u_t\}$ on the full regressor matrix and on past residuals (up to the lag we have used in the specification), obtain the uncentered $R^2$ from this auxiliary regression, and multiply it by the sample size.

This statistic is used in what we call the "Breusch-Godfrey test for serial correlation".
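
As a minimal sketch of this route in R (the AR(1) setup, sample size, and lag order `p` below are my own illustrative choices, not part of the original argument), one can compute the statistic by hand and cross-check it against `lmtest::bgtest`, which should return essentially the same number:

## Breusch-Godfrey statistic via the auxiliary regression, "by hand"
library(lmtest)
set.seed(1)
y   <- as.numeric(arima.sim(200, model = list(ar = 0.5)))
fit <- lm(y[-1] ~ y[-length(y)])         # AR(1) estimated by OLS
u   <- residuals(fit)
n   <- length(u)
p   <- 2                                 # number of residual lags under test
# lagged residuals, with zeros filled in for pre-sample values
U   <- sapply(seq_len(p), function(j) c(rep(0, j), u[seq_len(n - j)]))
aux <- lm(u ~ y[-length(y)] + U)         # residuals on regressors and own lags
# OLS residuals (with an intercept) have zero mean, so the centered R^2 from
# summary() coincides with the uncentered R^2 here
LM  <- n * summary(aux)$r.squared
pchisq(LM, df = p, lower.tail = FALSE)   # p-value of the hand-made statistic
bgtest(fit, order = p)                   # packaged test, for comparison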

It appears then that, when the regressors include lagged dependent variables (and so in all autoregressive models), the Ljung-Box test should be abandoned in favor of the Breusch-Godfrey LM test: not because "it performs worse", but because it does not possess asymptotic justification. Quite an impressive result, especially judging from the ubiquitous presence and application of the former.

UPDATE: Responding to doubts raised in the comments as to whether all of the above also applies to "pure" time-series models (i.e. without "$x$"-regressors), I have posted a detailed examination for the AR(1) model in https://stats.stackexchange.com/a/205262/28746.

Alecos Papadopoulos
  • Very impressive, Alecos! Great explanation! Thank you so much! (I hope many more people will read your answer eventually and will benefit from it in their work or studies.) – Richard Hardy Apr 26 '15 at 08:11
  • +1 Very interesting. My initial guess was that in an AR model the distribution of the BG test could get distorted, but as you explained and the simulation exercise suggested, it is the LB test the one that gets more seriously affected. – javlacalle Apr 26 '15 at 10:07
  • The problem with your answer is that it's based on the assumption that we're dealing with an ARMAX-like model, i.e. with regressors $x_t$, not a pure time series such as an AR. – Aksakal Apr 02 '16 at 21:47
  • @Aksakal No. The $x$'s can be taken out of the picture -nothing changes in the above analysis. Just imagine $x=0$ everywhere in the above relations. The problem lies in the relation between $y$ and $u$, irrespective of any $x$'s being present or not. – Alecos Papadopoulos Apr 02 '16 at 23:26
  • @AlecosPapadopoulos, all the references that you gave limit their conclusions to models with regressors; they don't raise this issue for pure time series. – Aksakal Apr 03 '16 at 00:53
  • Here's the excerpt from [Hayashi](https://books.google.com/books?id=QyIW8WUIyzcC&pg=PA144&lpg=PA144&dq=Hayashi+2.10+%22Testing+For+serial+correlation%22,&source=bl&ots=SDua-xr5I1&sig=DC7ZT0iABK8v2xMTx9t6fjEdlaQ&hl=en&sa=X&ved=0ahUKEwjPjdj_ofHLAhXD7yYKHRh5D4IQ6AEINTAE#v=onepage&q=Hayashi%202.10%20%22Testing%20For%20serial%20correlation%22%2C&f=false), the same Ch.2.10 to which you refer to: "Is it all right to use $\hat\rho_j$ for testing for serial correlation? The answer is yes, but only if the regressors are strictly exogenous". Obviously, pure ARIMA will not have an issue – Aksakal Apr 03 '16 at 01:12
  • @Aksakal It appears that for some reason, when you read the word "regressor" you take it to exclude the case where the explanatory variables are lags of the dependent variable. Why? The lags of the dependent variable are also regressors. – Alecos Papadopoulos Apr 03 '16 at 01:24
  • @AlecosPapadopoulos, nope. The regressors are $x$, as is very clear from the equations you show. The lags are shown with the lag operator $L$. – Aksakal Apr 03 '16 at 01:30
  • @Aksakal The model that Hayashi discusses in this chapter postulates "regressors" in general, and they can be either other random variables, or lags of the dependent variable. In fact it explicitly develops it in such a way so as to accommodate time-series asymptotics. Look for example in p. 111, the "predetermined vs strictly exogenous regressors". Look at page 145-146 and go through all the derivations, while replacing $x$ by, say, the first lag of the dependent variable. – Alecos Papadopoulos Apr 03 '16 at 01:50
  • @Aksakal, I think this excerpt from Alecos post should make his argument clearer: $$ E[\mathbf x_t'\beta \cdot u_{t-1}]+ E[\phi(L)y_t \cdot u_{t-1}]+E[u_t \cdot u_{t-1}] \neq 0 $$ even if the $X$'s are independent of the error term, and _even if the error term has no-autocorrelation_: the term $E[\phi(L)y_t \cdot u_{t-1}]$ is not zero. [End of excerpt] So regardless of presence of $X$, the problem is in the lagged $y$s. Therefore, Ljung-Box test is inappropriate for residuals of pure AR or pure ARMA models. – Richard Hardy Apr 03 '16 at 08:14
  • @AlecosPapadopoulos, if you are confident in your argumentation (if you ask me, I am), then please consider the following. I recently posted a related answer in [this thread](http://stats.stackexchange.com/questions/6455/how-many-lags-to-use-in-the-ljung-box-test-of-a-time-series/205079?noredirect=1#comment389411_205079), and Aksakal did as well. Since these answers are conflicting, it makes sense to vote (or comment under them) to make clear which answer the community thinks is right. – Richard Hardy Apr 03 '16 at 08:33
  • @RichardHardy I see that the matter has spread to other threads. I will see if I can contribute an answer to the other thread you link to, since user Aksakal has laid some more detailed arguments there. – Alecos Papadopoulos Apr 03 '16 at 14:47
  • @Aksakal Following the suggestion by user RichardHardy, I have posted here, http://stats.stackexchange.com/a/205262/28746, a detailed examination of the AR(1) model. – Alecos Papadopoulos Apr 03 '16 at 16:27
  • @RichardHardy Following your suggestion, I have posted here, stats.stackexchange.com/a/205262/28746, a detailed examination of the AR(1) model. – Alecos Papadopoulos Apr 03 '16 at 16:29
  • @AlecosPapadopoulos, here's what Greene, [Econometrics](http://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm), 7th ed. says on p.923: "The Durbin–Watson test is not likely to be valid when there is a lagged dependent variable in the equation.13 The statistic will usually be biased toward a finding of no autocorrelation. Three alternatives have been devised. The LM and Q tests can be used whether or not the regression contains a lagged dependent variable. (In the absence of a lagged dependent variable, they are asymptotically equivalent.)" He means Box-Pierce (Ljung) by Q test. – Aksakal Apr 03 '16 at 17:00
  • @RichardHardy, I do not argue with the fact that the B-G test is *better* in some way than L-B. However, I do not think it's right to say that B-G is absolutely wrong and can't be used. This is a good thread, since it discusses the assumptions of the tests, such as strict vs. weak exogeneity. Both tests are very old and there are other tests out there which may work better for some problems, e.g. [Cumby, Huizinga (1990)](http://www.nber.org/papers/t0092). – Aksakal Apr 03 '16 at 17:22
  • @RichardHardy, typo in my previous comment. "B-G is absolutely wrong" should have been "L-B is absolutely wrong". – Aksakal Apr 03 '16 at 17:29
  • @Aksakal, while Alecos provides a clear algebraic proof and simultaneously a basis for an exact, math-based discussion, Greene does not include a detailed algebraic argument... What could be done? Could you prove Alecos wrong and Greene right, or should we invite Greene to have his say in a separate post? (Realistically, what chances do we have to engage him?) After all, this is an exact subject and we should not just have opinions about it -- we should rather reveal the hard facts. – Richard Hardy Apr 03 '16 at 19:39
  • @Aksakal, I think this excerpt from Alecos's post should make his argument clearer: $$ E[\mathbf x_t'\beta \cdot u_{t-1}]+ E[\phi(L)y_t \cdot u_{t-1}]+E[u_t \cdot u_{t-1}] \neq 0 $$ even if the $X$'s are independent of the error term, and _even if the error term has no autocorrelation_: the term $E[\phi(L)y_t \cdot u_{t-1}]$ is not zero. [End of excerpt] So regardless of the presence of $X$, the problem is in the lagged $y$s. Therefore, the Ljung-Box test is inappropriate for residuals of pure AR or pure ARMA models. – Richard Hardy Apr 03 '16 at 08:14
  • This answer makes me existentially happy to still be learning stuff. – Alexis Jan 05 '19 at 17:58
  • @Alexis And that is a comment almost too flattering to be true. Thank you. – Alecos Papadopoulos Jan 05 '19 at 18:50
  • I have two related questions, perhaps you could take a look? ["Breusch-Godfrey test on residuals from an MA(q) model"](https://stats.stackexchange.com/questions/500783) and ["Is Ljung-Box test applicable on residuals from MA(q) models?"](https://stats.stackexchange.com/questions/500812) – Richard Hardy Dec 14 '20 at 12:35
  • @AlecosPapadopoulos, I have put up a bounty for a question on the practical implementation of the BG test, ["Breusch-Godfrey test on residuals from an MA(q) model"](https://stats.stackexchange.com/questions/500783). I presume you would be capable of answering it. I would gladly award the bounty for a decent answer. – Richard Hardy Jan 09 '21 at 16:06

Conjecture

I don't know of any study comparing these tests. I had the suspicion that the Ljung-Box test is more appropriate in the context of time series models such as ARIMA, where the explanatory variables are lags of the dependent variable. The Breusch-Godfrey test could be more appropriate for a general regression model where the classical assumptions are met (in particular, exogenous regressors).

My conjecture was that the distribution of the Breusch-Godfrey test (which relies on the residuals from a regression fitted by ordinary least squares) may be affected by the fact that the explanatory variables are not exogenous.

I did a small simulation exercise to check this and the results suggest the opposite: the Breusch-Godfrey test performs better than the Ljung-Box test when testing for autocorrelation in the residuals of an autoregressive model. Details and R code to reproduce or modify the exercise are given below.


Small simulation exercise

A typical application of the Ljung-Box test is to test for serial correlation in the residuals from a fitted ARIMA model. Here, I generate data from an AR(3) model and fit an AR(3) model.

The residuals satisfy the null hypothesis of no autocorrelation; therefore, we would expect uniformly distributed p-values, with the null hypothesis rejected in a percentage of cases close to the chosen significance level, e.g. 5%.

Ljung-Box test:

## Ljung-Box test
n <- 200 # number of observations
niter <- 5000 # number of iterations
LB.pvals <- matrix(nrow=niter, ncol=4)
set.seed(123)
for (i in seq_len(niter))
{
  # Generate data from an AR(3) model and store the residuals
  x <- arima.sim(n, model=list(ar=c(0.6, -0.5, 0.4)))
  resid <- residuals(arima(x, order=c(3,0,0)))
  # Store p-value of the Ljung-Box for different lag orders
  LB.pvals[i,1] <- Box.test(resid, lag=1, type="Ljung-Box")$p.value
  LB.pvals[i,2] <- Box.test(resid, lag=2, type="Ljung-Box")$p.value
  LB.pvals[i,3] <- Box.test(resid, lag=3, type="Ljung-Box")$p.value
  LB.pvals[i,4] <- Box.test(resid, lag=4, type="Ljung-Box", fitdf=3)$p.value
}
sum(LB.pvals[,1] < 0.05)/niter
# [1] 0
sum(LB.pvals[,2] < 0.05)/niter
# [1] 0
sum(LB.pvals[,3] < 0.05)/niter
# [1] 0
sum(LB.pvals[,4] < 0.05)/niter
# [1] 0.0644
par(mfrow=c(2,2))
hist(LB.pvals[,1]); hist(LB.pvals[,2]); hist(LB.pvals[,3]); hist(LB.pvals[,4])

Ljung-Box test p-values

The results show that the null hypothesis is rejected only in very rare cases: for a 5% level, the rate of rejections is much lower than 5%, and the distribution of the p-values shows a bias towards non-rejection of the null.

Edit: In principle, `fitdf=3` should be set in all cases, to account for the degrees of freedom that are lost after fitting the AR(3) model to get the residuals. However, for lags of order lower than 4, this leads to zero or negative degrees of freedom, rendering the test inapplicable. According to the documentation (`?stats::Box.test`): "These tests are sometimes applied to the residuals from an ARMA(p, q) fit, in which case the references suggest a better approximation to the null-hypothesis distribution is obtained by setting `fitdf = p+q`, provided of course that `lag > fitdf`."
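
For concreteness, here is a small illustrative check (reusing the `resid` series from the last iteration of the loop above; the lag choices are mine): the approximate null distribution is chi-squared with `lag - fitdf` degrees of freedom, so the test is only meaningful when `lag > fitdf`.

Box.test(resid, lag = 5, fitdf = 3, type = "Ljung-Box")  # df = 5 - 3 = 2: usable
Box.test(resid, lag = 3, fitdf = 3, type = "Ljung-Box")  # df = 0: not meaningful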

Breusch-Godfrey test:

## Breusch-Godfrey test
require("lmtest")
n <- 200 # number of observations
niter <- 5000 # number of iterations
BG.pvals <- matrix(nrow=niter, ncol=4)
set.seed(123)
for (i in seq_len(niter))
{
  # Generate data from an AR(3) model and store the residuals
  x <- arima.sim(n, model=list(ar=c(0.6, -0.5, 0.4)))
  # create explanatory variables, lags of the dependent variable
  Mlags <- cbind(
    filter(x, c(0,1), method= "conv", sides=1),
    filter(x, c(0,0,1), method= "conv", sides=1),
    filter(x, c(0,0,0,1), method= "conv", sides=1))
  colnames(Mlags) <- paste("lag", seq_len(ncol(Mlags)))
  # store p-value of the Breusch-Godfrey test
  BG.pvals[i,1] <- bgtest(x ~ 1+Mlags, order=1, type="F", fill=NA)$p.value
  BG.pvals[i,2] <- bgtest(x ~ 1+Mlags, order=2, type="F", fill=NA)$p.value
  BG.pvals[i,3] <- bgtest(x ~ 1+Mlags, order=3, type="F", fill=NA)$p.value
  BG.pvals[i,4] <- bgtest(x ~ 1+Mlags, order=4, type="F", fill=NA)$p.value
}
sum(BG.pvals[,1] < 0.05)/niter
# [1] 0.0476
sum(BG.pvals[,2] < 0.05)/niter
# [1] 0.0438
sum(BG.pvals[,3] < 0.05)/niter
# [1] 0.047
sum(BG.pvals[,4] < 0.05)/niter
# [1] 0.0468
par(mfrow=c(2,2))
hist(BG.pvals[,1]); hist(BG.pvals[,2]); hist(BG.pvals[,3]); hist(BG.pvals[,4])

Breusch-Godfrey test p-values

The results for the Breusch-Godfrey test look more sensible. The p-values are uniformly distributed and rejection rates are closer to the significance level (as expected under the null hypothesis).

javlacalle
  • Great job (as always)! What about `LB.pvals[i,j]` for $j \in \{1,2,3\}$: does Ljung-Box testing make sense for $j \leqslant 3$ given that an AR(3) model with 3 coefficients was fit (`fitdf=3`)? If it doesn't, then the poor results of the Ljung-Box test for $j \in \{1,2,3\}$ are not surprising. – Richard Hardy Apr 24 '15 at 10:43
  • Also, regarding what you say in the first paragraph: could you perhaps expand on that a little bit? I perceive the statements there as quite important, but the details are lacking. I may be asking for too much -- to "digest" things for me -- but if it would not be too difficult for you, I would appreciate that. – Richard Hardy Apr 24 '15 at 10:49
  • @RichardHardy I edited the answer with some comments about `fitdf`. – javlacalle Apr 24 '15 at 16:49
  • @RichardHardy As regards your second comment, my initial guess was that, since the computation of the LB test statistic does not require fitting a model, this test could be less sensitive than the BG test when some of the requirements of the classical regression model are not met, in particular when the explanatory variables are not independent of the error term. I just thought that, in the case of an AR model, the regression of the BG test could be estimated poorly. According to these results, that's not the case. – javlacalle Apr 24 '15 at 16:50
  • Maybe the LB test does not make sense for lags lower than the order of the AR model. I think the usage of this test is common practice in this context, and apparently not much attention is paid to the degrees of freedom that are lost. `stats::tsdiag.Arima` sets `fitdf=0` for any lag order. – javlacalle Apr 24 '15 at 16:52
  • For lag orders lower than $p$, setting `fitdf=0` is a necessity, otherwise we cannot use the test. For higher lag orders the test performs better. In the example, the test for lag=5 performs as expected (uniform distribution of p-values), however, for lag=4 results are not compelling even after accounting for the 3 degrees that are lost. – javlacalle Apr 24 '15 at 16:52
  • Usually, we are interested in testing for autocorrelation of low order in the residuals from an ARMA model. If the results of this small exercise were shown to be general for any ARMA model, the conclusion would be that the Ljung-Box test should not be used to this end. I have never found this recommendation, on the contrary I think that the Ljung-Box test is a common practice in the context of the example that I gave. The results shown in this answer call for caution. – javlacalle Apr 24 '15 at 16:53
  • The problem with the Ljung-Box test arises for `lag <= fitdf`. – Richard Hardy Apr 24 '15 at 18:50
  • My gut feeling is that this problem has to do with the following: a sum of $n$ linearly independent $\chi^2 (1)$ random variables is distributed as $\chi^2 (n)$. A sum of $n$ linearly dependent $\chi^2 (1)$ random variables with $k$ linear restrictions is distributed as $\chi^2 (n-k)$. When $k \geqslant n$ this is ill-defined. I suspect something like this happens when the Ljung-Box test is used on model residuals from an AR($k$) model. – Richard Hardy Apr 24 '15 at 18:51
  • The residuals are not independent but linearly restricted; first, they sum to zero; second, their autocorrelations are zero for the first $k$ lags. What I just wrote may not be exactly true, but the idea is there. Also, I have been aware that the Ljung-Box test should not be applied for `lag <= fitdf`. – Richard Hardy Apr 24 '15 at 18:55
  • In short, when you say *for lags of order lower than 4, this will lead to negative or zero degrees of freedom, rendering the test inapplicable*, I think you should draw a different conclusion: do not use the test for those lags. If you proceed by setting `fitdf=0` in place of `fitdf=3`, you might be cheating yourself. – Richard Hardy Apr 24 '15 at 19:00
  • Frankly, I didn't know that the LB test shouldn't be used with `lag < fitdf`. It's striking nonetheless how badly the distribution is distorted in that case. Apparently, the references cited in the documentation of `tsdiag.Arima` do not expect these deleterious consequences. The documentation states: _[...] the references suggest a better approximation to the null-hypothesis distribution is obtained by setting `fitdf = p+q`, provided of course that `lag > fitdf`_. – javlacalle Apr 24 '15 at 19:31
  • Yet, according to the results in my answer, the LB test shouldn't be used for `lag <= fitdf+1`, not just for `lag <= fitdf`. Of course, this is a small exercise and further insight would be necessary. – javlacalle Apr 24 '15 at 19:31
  • Thanks, I really appreciate your input! Yes, a stronger warning against using `lag <= fitdf` seems warranted. – Richard Hardy Apr 24 '15 at 19:47
  • Thanks to you too! I learnt interesting things in this discussion. A comparison of the LB and BG tests in a general setting is still open. – javlacalle Apr 24 '15 at 20:41

Greene (Econometric Analysis, 7th Edition, p. 963, section 20.7.2):

"The essential difference between the Godfrey-Breusch [GB] and the Box-Pierce [BP] tests is the use of partial correlations (controlling for $X$ and the other variables) in the former and simple correlations in the latter. Under the null hypothesis, there is no autocorrelation in $e_t$, and no correlation between $x_t$ and $e_s$ in any event, so the two tests are asymptotically equivalent. On the other hand, because it does not condition on $x_t$, the [BP] test is less powerful than the [GB] test when the null hypothesis is false, as intuition might suggest."

(I know that the question asks about Ljung-Box and the above refers to Box-Pierce, but the former is a simple refinement of the latter and hence any comparison between GB and BP would also apply to a comparison between GB and LB.)

As other answers have already explained in more rigorous fashion, Greene also suggests that there is nothing to gain (other than, perhaps, some computational efficiency) from using Ljung-Box instead of Godfrey-Breusch, but potentially much to lose (the validity of the test).
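
To see the equivalence claim in action, here is a minimal sketch under strictly exogenous regressors (the data-generating setup and sample size are my own illustrative choices): under the null, both p-values should be non-significant in most draws and the two statistics close to each other.

## Both tests on OLS residuals when the regressor is strictly exogenous
library(lmtest)
set.seed(42)
x   <- rnorm(500)                        # exogenous regressor
y   <- 1 + 2 * x + rnorm(500)            # errors without autocorrelation
fit <- lm(y ~ x)
bgtest(fit, order = 3)                   # BG: partial correlations (LM form)
Box.test(residuals(fit), lag = 3, type = "Ljung-Box")  # LB: simple correlations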

Candamir

It seems that the Box-Pierce and Ljung-Box tests are mainly univariate tests, whereas the Breusch-Godfrey test rests on additional assumptions when testing whether linear structure is left behind in the residuals of a time series regression (an MA or AR process).

Here is a link to a discussion:

http://www.stata.com/meeting/new-orleans13/abstracts/materials/nola13-baum.pdf

Analyst

The main difference between the tests is the following:

  • The Breusch-Godfrey test is, as a Lagrange Multiplier test, derived from the (correctly specified) likelihood function (and thus from first principles).

  • The Ljung-Box test is based on second moments of the residuals of a stationary process (and is thus of a comparatively more ad hoc nature).

As a Lagrange Multiplier test, the Breusch-Godfrey test is asymptotically equivalent to the uniformly most powerful test. Be that as it may, it is only asymptotically most powerful against the alternative hypothesis of omitted regressors (irrespective of whether they are lagged variables or not). The strong point of the Ljung-Box test may be its power against a wide range of alternative hypotheses.
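
For reference, the general Lagrange Multiplier form behind this claim (standard notation, mine): with $\tilde\theta$ the estimate under the restricted (null) model, score vector $S(\cdot)$ and information matrix $\mathcal I(\cdot)$,

$$LM = S(\tilde\theta)'\,\mathcal I(\tilde\theta)^{-1}\,S(\tilde\theta) \overset{d}{\longrightarrow} \chi^2_r,$$

where $r$ is the number of restrictions tested; the $n R^2$ statistic from the Breusch-Godfrey auxiliary regression is a convenient way of computing this quantity.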

bmbb

Looking further into Hayashi (2000), pp. 146-147:

...when the regressors are not strictly exogenous we need to modify the Q statistic to restore its asymptotic distribution

Basically, we only have to assume that the errors do not depend on the lagged regressors and that they are conditionally homoskedastic.

Modifying the code of @javlacalle by (1) including `fitdf=3` and (2) adding some more lags, as seems reasonable in practice, gives the following.

Ljung-Box test:

## Ljung-Box test
n <- 200 # number of observations
niter <- 5000 # number of iterations
LB.pvals <- matrix(nrow=niter, ncol=4)
set.seed(123)
for (i in seq_len(niter))
{
  # Generate data from an AR(3) model and store the residuals
  x <- arima.sim(n, model=list(ar=c(0.6, -0.5, 0.4)))
  resid <- residuals(arima(x, order=c(3,0,0)))
  # Store p-value of the Ljung-Box for different lag orders
  LB.pvals[i,1] <- Box.test(resid, lag=10, fitdf=3, type="Ljung-Box")$p.value
  LB.pvals[i,2] <- Box.test(resid, lag=11, fitdf=3, type="Ljung-Box")$p.value
  LB.pvals[i,3] <- Box.test(resid, lag=12, fitdf=3, type="Ljung-Box")$p.value
  LB.pvals[i,4] <- Box.test(resid, lag=13, fitdf=3, type="Ljung-Box")$p.value
}
sum(LB.pvals[,1] < 0.05)/niter
sum(LB.pvals[,2] < 0.05)/niter
sum(LB.pvals[,3] < 0.05)/niter
sum(LB.pvals[,4] < 0.05)/niter
par(mfrow=c(2,2))
hist(LB.pvals[,1]); hist(LB.pvals[,2]); hist(LB.pvals[,3]); hist(LB.pvals[,4])

Ljung-Box test p-values

To me, it looks identical to the Breusch-Godfrey test simulation. In that case, and considering Hayashi's proof later in the book, it seems that the Ljung-Box test is valid in the presence of lagged dependent variables after all. Am I doing something wrong here?