Simulation shows very similar efficiency for these estimators, even when $a=0.99$.
> a<-0.99
> b<-10
> r<-replicate(1000,{
+ x<-numeric(1000+2000)
+ x[1]<-rnorm(1)
+ for(i in 2:3000) x[i]<-x[i-1]*a+b+rnorm(1)
+ x<-x[-(1:1000)]
+ m<-lm(x[-1]~x[-2000])
+ c(coef(m)[1]/(1-coef(m)[2]),mean(x))
+ })
> apply(r,1,mean)
(Intercept)
1000.091 1000.068
> apply(r,1,sd)
(Intercept)
2.289759 2.209923
Setting $\sigma^2$ to be small doesn't help, either: with $\sigma^2=0.01$
> apply(r,1,mean)
(Intercept)
999.9974 999.9953
> apply(r,1,sd)
(Intercept)
0.2182076 0.2109398
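(For the record, the only change needed in the code above is to the innovation draws; something like this, remembering that rnorm is parametrised by the standard deviation rather than the variance:

x[1] <- rnorm(1, sd=0.1)
for(i in 2:3000) x[i] <- x[i-1]*a + b + rnorm(1, sd=0.1)   # sigma^2 = 0.01
)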
The other thing that shows up in simulation is that the correlation between the two estimators increases with the length of the simulation. It is 0.972 for the simulation above, with 2000 points, but 0.998 for 20000 points.
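If you want to check this at home, the correlation is just taken across the two rows of the replicate output; assuming the matrix r from the simulation above:

cor(r[1,], r[2,])   # regression-based estimator (row 1) vs sample mean (row 2)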
So, it looks possible that the estimators are the same (for infinite time).
Now consider the OLS regression. Define $z_d =x_{d-1}-\bar x_{[1:(n-1)]}$ and $y_d=x_d-\bar x_{[2:n]}$, i.e., center the two variables in the regression about their respective sample means. Write $\hat\beta$ and $\hat\alpha$ for the fitted intercept and slope in this new regression.
We have $\hat\beta=0$ and $\hat\alpha=\hat a$, just as a fact about OLS. So the estimator $\hat\beta/(1-\hat\alpha)$ for the mean of $y$ is identical to the estimator $\bar y$ as both are identically zero.
Reparametrising back to the original scale, we shift the predictor back by $\bar x_{[1:(n-1)]}$ and the response back by $\bar x_{[2:n]}$, so the intercept becomes
$$\hat b= \hat\beta+\bar x_{[2:n]} -\hat \alpha\times\bar x_{[1:(n-1)]}$$
For large $n$, the two means are approximately the same, so
$$\hat b= (1-\hat a)\times \bar x_n +O_p(n^{-1})$$
and the estimated mean is
$$\frac{\hat b}{1-\hat a}=\bar x_n +O_p(n^{-1})$$
The two estimators are asymptotically equivalent (to first order) for large $n$.
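This is also easy to check numerically; a minimal sketch, assuming x is a single simulated series of length n = 2000 from the code above:

n <- 2000
m <- lm(x[-1] ~ x[-n])
alphahat <- coef(m)[2]   # estimate of a
bhat <- coef(m)[1]       # estimate of b
# exact OLS identity: intercept = response mean - slope*predictor mean
bhat - (mean(x[-1]) - alphahat*mean(x[-n]))
# the two estimators of the process mean differ by much less than
# their sampling standard deviation (about 2.3 in the first simulation)
bhat/(1 - alphahat) - mean(x)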
I will just conclude by noting this is the sort of result that is quite hard to prove if you can't imagine that it might be true, and simulation is a good way to come to imagine that it might be true.
UPDATE
In the special case where the errors are Normal, we can also look at maximum likelihood estimation. The standard results for independent data don't imply efficiency, but we can expect at least pretty good efficiency.
Consider the model $X\sim N(\mu, \Xi)$, where $\mu$ is the mean we're interested in and $\Xi$ is the AR-1 covariance matrix implied by the generating equation. The deviance is
$$d= -2\ell(\mu,\Xi)= \log |\Xi| +(x-\mu)^T\Xi^{-1}(x-\mu)$$
Differentiating with respect to $\mu$ and setting the derivative to zero gives
$$0 = -1^T\Xi^{-1}(x-\mu) - (x-\mu)^T\Xi^{-1}1$$
Write $\xi^{ij}$ for the $(i,j)$ element of $\Xi^{-1}$, and we have
$$ \hat\mu = \frac{\sum_{i,j}\xi^{ij}x_i}{\sum_{i,j}\xi^{ij}}$$
Now, except for edge effects, $\xi^{ij}$ depends only on $|i-j|$. In fact, $\Xi^{-1}$ is tridiagonal, so $\xi^{ij}$ is non-zero only when $|i-j|\leq 1$. So (again, up to edge effects)
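This is easy to see numerically; a small sketch, taking the stationary AR-1 covariance to be $\xi_{ij}=\sigma^2 a^{|i-j|}/(1-a^2)$ (my reading of the covariance implied by the generating equation):

a <- 0.99; sigma2 <- 1; n <- 6
Xi <- sigma2 * a^abs(outer(1:n, 1:n, "-")) / (1 - a^2)   # AR-1 covariance matrix
round(solve(Xi), 3)   # tridiagonal, apart from rounding error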
$$\sum_{i,j=1}^n\xi^{ij}x_i\approx\sum_{k=1}^n \sum_{l=-1}^1\xi^{k,k+l}x_k\approx \sum_{k=1}^n \left(\sum_{l=-1}^1\xi^{k,k+l}\right)x_k$$
Now, $\left(\sum_{l=-1}^1\xi^{k,k+l}\right)$ is constant in $k$ (except for edge effects). Call it $A$.
$$\hat\mu\approx\frac{A\sum_i x_i}{An}=\bar x$$
So the MLE is also asymptotically equivalent to the sample average, to first order.
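And the same check for the weights, assuming x is one simulated series of length 2000 and the same AR-1 covariance as above:

a <- 0.99; sigma2 <- 1; n <- 2000
Xi <- sigma2 * a^abs(outer(1:n, 1:n, "-")) / (1 - a^2)
w <- rowSums(solve(Xi))   # w[k] = sum_l xi^{k,k+l}
range(w[-c(1,n)])         # constant, (1-a)^2/sigma^2, away from the ends
w[c(1,n)]                 # the edge effect: the two end weights are (1-a)/sigma^2
sum(w*x)/sum(w)           # the ML/GLS estimate of mu (treating a and sigma^2 as known)
mean(x)                   # the sample average, for comparison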