Prove that OLS residuals have 0 expectation even when the linear model is false

Question

For the sake of this question, I will consider a linear model as a statistical model such that $\mathbb{E}[y|\mathbf{x}]=\mathbf{x}\boldsymbol{\beta}$. I know that usually some more assumptions go into the definition of the linear model. However, the other properties are mostly needed for OLS estimation of the linear model, and all of them can be relaxed (see the excellent post by Glen_b here).

In my answer to the question

Estimating error - residuals vs. fitted values plot

I stated that, even when $\mathbb{E}[y|\mathbf{x}]\neq\mathbf{x}\boldsymbol{\beta}$, the OLS residuals have expectation 0 because of the Law of Large Numbers. However, this might be wrong: if we have a random sample $S=\{(y_i,\mathbf{x}_i)\}_{i=1}^n$ and the model includes an intercept, then we know that $$\sum_{i=1}^ne_i=\sum_{i=1}^n(y_i-\mathbf{x}_i\hat{\boldsymbol{\beta}})=0 \tag{1}\label{1}$$

whatever the sample size (in other words, the residuals are not independent). Now, let $e$ denote the random variable whose realizations are the $e_i$. The Law of Large Numbers would tell you that

$$\frac{\sum_{i=1}^ne_i}{n}=0 \ \ \forall n\rightarrow\mathbb{E}[e]=0$$

if the residuals were independent, but they aren't. However, property $\ref{1}$ holds by the very definition of the OLS estimator, so the sample mean of the residuals is always zero, even when the linear model is "false", i.e., when $\mathbb{E}[y|\mathbf{x}]\neq\mathbf{x}\boldsymbol{\beta}$. My question is: is $\mathbb{E}[e]=0$ true even when $\mathbb{E}[y|\mathbf{x}]\neq\mathbf{x}\boldsymbol{\beta}$? Simulations seem to indicate that it is.
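A minimal sketch of the kind of simulation mentioned above, under settings that are assumptions and not taken from the post: scalar $x\sim N(0,1)$ drawn afresh in each replication, a true conditional mean $\mathbb{E}[y|x]=x^2$ (so the linear model is false), Gaussian noise, $n=10$, and a simple regression with intercept. It checks property (1) in every replication and estimates $\mathbb{E}[e_1]$ by averaging the first residual across replications. The function name `simulate_residuals` and its defaults are hypothetical.

```python
import numpy as np

def simulate_residuals(n=10, reps=20000, noise_sd=0.5, seed=0):
    """Fit y = b0 + b1*x by OLS to data whose true mean is x**2 (hypothetical setup).

    Returns a (reps, n) array of OLS residuals, one row per replication.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))                     # fresh random design each replication
    y = x**2 + noise_sd * rng.standard_normal((reps, n))   # E[y|x] = x^2, not linear in x

    # Closed-form simple-regression OLS (with intercept), vectorised over replications
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    b1 = (xc * yc).sum(axis=1) / (xc**2).sum(axis=1)
    b0 = y.mean(axis=1) - b1 * x.mean(axis=1)
    return y - (b0[:, None] + b1[:, None] * x)

e = simulate_residuals()
# Property (1): the residuals sum to zero in every replication, up to rounding
print(np.abs(e.sum(axis=1)).max())
# Monte Carlo estimate of E[e_1], averaging over both x and y
print(e[:, 0].mean())
```

In this setup the first printed value should be at floating-point noise level and the estimate of $\mathbb{E}[e_1]$ should come out close to zero; a different data-generating process or a fixed design may behave differently.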

I tried to prove it rigorously using linear algebra and the rules of expectation, but I got stuck pretty soon:

$$e=y-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^Ty\Rightarrow \mathbb{E}[e|X]=\mathbb{E}[y|X]-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbb{E}[y|X]$$

If I knew that $\mathbb{E}[y|\mathbf{x}]=\mathbf{x}\boldsymbol{\beta}$, then from the above I would immediately get $\mathbb{E}[e|X]=0$, which implies $\mathbb{E}[e]=0$ by the law of iterated expectations. But I can't assume that. Suggestions?
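A sketch of the same algebra written with the hat matrix $\mathbf{H}=\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ (a symbol introduced here, not used elsewhere in the post), assuming $\mathbf{X}^T\mathbf{X}$ is invertible and, for the last line, that the column of ones $\mathbf{1}$ is one of the columns of $\mathbf{X}$:

$$\begin{aligned}
\mathbb{E}[e|X] &= (\mathbf{I}-\mathbf{H})\,\mathbb{E}[y|X], \qquad \mathbf{H}\mathbf{X}=\mathbf{X},\\
\mathbb{E}[y|X]=\mathbf{X}\boldsymbol{\beta} \;&\Rightarrow\; \mathbb{E}[e|X]=\mathbf{X}\boldsymbol{\beta}-\mathbf{H}\mathbf{X}\boldsymbol{\beta}=\mathbf{0},\\
\mathbf{H}\mathbf{1}=\mathbf{1},\ \mathbf{H}=\mathbf{H}^T \;&\Rightarrow\; \sum_{i=1}^n\mathbb{E}[e_i|X]=\mathbf{1}^T(\mathbf{I}-\mathbf{H})\,\mathbb{E}[y|X]=0 \quad\text{for any } \mathbb{E}[y|X].
\end{aligned}$$

So, with an intercept, the conditional expectations of the residuals always average to zero, but this alone says nothing about each $\mathbb{E}[e_i|X]$ individually.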

PS yes, I know that using lower-case for both random variables and their realizations is horrible, but if I use upper-case for the RV $\mathbf{X}$, then I don't know what symbol to use for the design matrix.

You define the $e_i$ as the residuals with respect to the OLS solution. Obviously their sum is always zero: that's a mathematical property of the solution. But if the model is not what you suppose, then isn't it equally obvious that the expectation of any individual $e_i$ could be any number whatsoever? For instance, let $x_1$ be the mean of the regressors of all the other data points. Suppose the model is correct for all the other $x_i,$ $i\ne 1$, but $E(y_1)=x_1\beta+A$ for some constant $A\ne 0$. You can easily compute that this makes $E(e_1)=(n-1)A/n\ne 0$. What, then, are you trying to ask? — whuber, May 30 '17 at 13:54
@whuber not sure what $x_1$ is. Is it the mean of the regressors vector, i.e., $x_1=\mathbb{E}[\mathbf{x}]=(\mu_1,\dots,\mu_p)$, where $p$ is the number of regressors? — DeltaIV, May 30 '17 at 15:05
No, it's the mean of all the other regressors: they aren't (or don't have to be) considered as stochastic. Why not do a calculation with a very simple example? Try the model $(0,\beta_0+\epsilon_1),(-1,\beta_0-\beta_1+\epsilon_2),(1,\beta_0+\beta_1+\epsilon_3)$ fit *via* OLS, for instance. Compare two situations, both with independent errors $\epsilon_i$ and homogeneous variances. In the usual situation they all have zero means. In the "wrong model" alternative, suppose $\epsilon_1$ has a mean of $A$ but you are still computing the OLS estimator. It's easy to figure out $\mathbb{E}(e_1)$. — whuber, May 30 '17 at 16:04
@whuber ok, I think I understood your setting. I will try the exercise and let you know. Thanks! — DeltaIV, May 30 '17 at 18:00
I fixed a typographical error in my previous comment, in order to make all three of the $\epsilon_i$ distinct. It's intended to be the usual OLS setting with three data points, of which the first has its x-value equal to the mean of the other two x-values. By making that common mean zero, the matrix $X^\prime X=\pmatrix{3&0\\0&2}$ is easily inverted, making all your calculations simple. — whuber, May 30 '17 at 18:08
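A minimal numerical sketch of the "wrong model" calculation suggested in the comments, assuming an intercept in the design and that only observation 1 has its mean shifted by $A$. Then $\mathbb{E}[y|X]=\mathbf{X}\boldsymbol{\beta}+A\mathbf{u}_1$, with $\mathbf{u}_1$ the first standard basis vector, and because the residual-maker matrix $\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ annihilates $\mathbf{X}\boldsymbol{\beta}$, the expected residual vector is $A$ times its first column. The helper `expected_residuals` and the default $\boldsymbol{\beta}$ values are hypothetical; the result does not depend on $\boldsymbol{\beta}$.

```python
import numpy as np

def expected_residuals(x, A=1.0, beta=(2.0, -0.5)):
    """Exact E[e | X] when E[y_1] = b0 + b1*x_1 + A and all other points follow the linear model.

    x: 1-D float array of regressor values; an intercept column is added (hypothetical helper).
    """
    X = np.column_stack([np.ones_like(x), x])
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X (X'X)^{-1} X'
    M = np.eye(len(x)) - H                  # residual-maker: e = M y
    mean_y = X @ np.asarray(beta)           # linear part of E[y | X]
    mean_y[0] += A                          # only observation 1 is shifted by A
    return M @ mean_y                       # E[e | X] = M E[y | X]

# Three-point design from the comments: x_1 = 0 is the mean of the other two x-values
print(expected_residuals(np.array([0.0, -1.0, 1.0]), A=1.0))
```

For this design the exact answer is $(2A/3,\,-A/3,\,-A/3)$, so the call above should print approximately $(0.667, -0.333, -0.333)$. More generally, when $x_1$ equals the mean of the other $x$-values, the hat-matrix entries in the first column are all $1/n$, giving $\mathbb{E}(e_1)=(n-1)A/n$ and $\mathbb{E}(e_i)=-A/n$ for $i\ne1$: nonzero individually, yet summing to zero.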
