For the sake of this question, I will consider a linear model as a statistical model such that $\mathbb{E}[y|\mathbf{x}]=\mathbf{x}\boldsymbol{\beta}$. I know that usually some more assumptions go in the definition of the linear model. However, the other properties are mostly needed for OLS estimation of the linear model, and all of them can be relaxed (see the excellent post by Glen_b here).
In my answer to the question
Estimating error - residuals vs. fitted values plot
I stated that, even when $\mathbb{E}[y|\mathbf{x}]\neq\mathbf{x}\boldsymbol{\beta}$, the OLS residuals have expectation 0 because of the Law of Large Numbers. However, this might be wrong: if we have a random sample $S=\{(y_i,\mathbf{x}_i)\}_{i=1}^n$ and the model includes an intercept, then we know that $$\sum_{i=1}^ne_i=\sum_{i=1}^n(y_i-\mathbf{x}_i\hat{\boldsymbol{\beta}})=0 \tag{1}\label{1}$$
whatever the sample size (in other words, the residuals are not independent). Now, let $e$ denote the random variable whose realizations are the $e_i$. The Law of Large Numbers would tell you that
$$\frac{\sum_{i=1}^ne_i}{n}=0 \ \ \forall n\rightarrow\mathbb{E}[e]=0$$
if the residuals were independent, but they aren't. However, property $\ref{1}$ is true because of the definition of the OLS estimator, thus the sample mean of the residuals is always zero, even when the linear model is "false", i.e., when $\mathbb{E}[y|\mathbf{x}]\neq\mathbf{x}\boldsymbol{\beta}$. My question is: is $\mathbb{E}[e]=0$ true, even when $\mathbb{E}[y|\mathbf{x}]\neq\mathbf{x}\boldsymbol{\beta}$? Simulations seem to indicate that this is indeed the case.
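A minimal sketch of the kind of simulation I mean. The quadratic data-generating process, the sample size, the number of replications, and the helper name `ols_residuals` are all illustrative assumptions, not anything canonical. Note that it only checks the sample mean of the residuals in each replication, so it illustrates property (1) rather than proving anything about $\mathbb{E}[e]$:

```python
import numpy as np

rng = np.random.default_rng(0)


def ols_residuals(x, y):
    """Fit y ~ 1 + x by least squares and return the residual vector."""
    # Design matrix with an explicit intercept column.
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta_hat


n, n_rep = 200, 2000
residual_means = np.empty(n_rep)
for r in range(n_rep):
    x = rng.uniform(-1.0, 2.0, size=n)
    # Misspecified on purpose: E[y|x] = x^2 is not linear in x.
    y = x**2 + rng.normal(scale=0.5, size=n)
    residual_means[r] = ols_residuals(x, y).mean()

# Largest absolute sample mean of the residuals over all replications.
max_abs_mean = np.abs(residual_means).max()
print(max_abs_mean)
```

In every replication the sample mean of the residuals is zero up to floating-point error, even though the fitted model is wrong.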
I tried to prove it rigorously by linear algebra and rules of expectation, but I got stuck pretty soon:
$$e=y-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^Ty\Rightarrow \mathbb{E}[e|\mathbf{X}]=\mathbb{E}[y|\mathbf{X}]-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbb{E}[y|\mathbf{X}]$$
If I knew that $\mathbb{E}[y|\mathbf{x}]=\mathbf{x}\boldsymbol{\beta}$, then from the above I would immediately get $\mathbb{E}[e|\mathbf{X}]=0$, which implies $\mathbb{E}[e]=0$ by the law of iterated expectations. But I can't assume that. Suggestions?
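To spell out the "immediately" step, here is a short sketch using the hat matrix $\mathbf{H}$ and the identity matrix $\mathbf{I}$ (symbols introduced only for this illustration), assuming $\mathbf{X}^T\mathbf{X}$ is invertible and writing $e$ and $y$ for the full residual and response vectors:

$$\mathbf{H}=\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T,\qquad \mathbb{E}[e|\mathbf{X}]=(\mathbf{I}-\mathbf{H})\,\mathbb{E}[y|\mathbf{X}]$$

$$\mathbb{E}[y|\mathbf{X}]=\mathbf{X}\boldsymbol{\beta}\ \Rightarrow\ \mathbb{E}[e|\mathbf{X}]=\mathbf{X}\boldsymbol{\beta}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}=\mathbf{0}$$

Without the linearity assumption, $(\mathbf{I}-\mathbf{H})\,\mathbb{E}[y|\mathbf{X}]$ does not simplify this way, which is exactly where I get stuck.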
PS yes, I know that using lower-case for both random variables and their realizations is horrible, but if I use upper-case for the RV $\mathbf{X}$, then I don't know what symbol to use for the design matrix.