
Assume a non-linear regression model \begin{align} \mathbb E[y \lvert x] &= m(x,\theta) \\ y &= m(x,\theta) + \varepsilon, \end{align} with $\varepsilon := y - m(x,\theta)$.

I heard someone say that

OLS always consistently estimates the partial derivatives of the non-linear conditional expectation function, with the partial derivatives evaluated at the expected values of the regressors.

Can someone please demonstrate this property?
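To make the claim concrete (my own illustration, not part of the quote): for a scalar quadratic CEF $m(x,\theta)=\theta_1+\theta_2 x+\theta_3 x^2$, the statement would mean that the OLS slope coefficient consistently estimates

$$\frac{\partial m}{\partial x}\bigg|_{x=E(x)} = \theta_2 + 2\theta_3\, E(x).$$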

Alecos Papadopoulos
Jesper for President

2 Answers


WARNING: The results claimed in this post are of contested validity (by the writer himself; when the fog clears, I will report back).

Ok. It's a bit long to include the whole proof here, so I will just sketch it:

Apply a first-order Taylor expansion around some initially arbitrary point $x_0$,

$$y = m(x_0,\theta) + [x-x_0]'\nabla m(x_0,\theta) + R_1 + \epsilon,$$

where $R_1$ is the Taylor remainder. Set $$b_0 = m(x_0,\theta),\; b = \nabla m(x_0,\theta),\;\beta = (b_0, b)' $$

$$\tilde x = x-x_0,\; u = R_1 + \epsilon$$ and revert to matrix notation

$$\mathbf y = \tilde X \beta + \mathbf u.$$
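As a side note on the remainder (a standard fact I am adding here, assuming $m(\cdot,\theta)$ is twice continuously differentiable in $x$), $R_1$ can be written in Lagrange form as

$$R_1 = \tfrac12\,[x-x_0]'\,\nabla^2 m(\bar x,\theta)\,[x-x_0]$$

for some $\bar x$ on the segment between $x$ and $x_0$, so $R_1$ is governed by the curvature of the CEF.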

So the OLS slope coefficients will attempt to estimate the gradient of the conditional expectation function evaluated at $x_0$, and the constant term will attempt to estimate the CEF itself evaluated at that same point $x_0$.

The OLS estimator will be

$$\hat \beta = \beta + (\tilde X'\tilde X)^{-1}\tilde X'\mathbf u \implies \hat \beta - \beta = (\tilde X'\tilde X)^{-1}\tilde X'(\epsilon + R_1)$$

Since $\epsilon$ is by construction the conditional expectation function error, at the limit we will be left with

$$\text{plim}(\hat \beta - \beta) =\left[E(\tilde x\tilde x')\right]^{-1} E(\tilde x\cdot R_1)$$
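Spelling out the limit step (my own elaboration, using the answer's notational shorthand and assuming the usual law-of-large-numbers conditions for $\tilde x$, $\epsilon$ and $R_1$):

$$\hat \beta - \beta = \left(\tfrac{1}{n}\tilde X'\tilde X\right)^{-1}\tfrac{1}{n}\tilde X'(\epsilon + R_1) \;\xrightarrow{p}\; \left[E(\tilde x\tilde x')\right]^{-1}\Big(E(\tilde x\,\epsilon) + E(\tilde x\, R_1)\Big),$$

where $E(\tilde x\,\epsilon) = E\big(\tilde x\, E(\epsilon \mid x)\big) = 0$ by the law of iterated expectations, since $\epsilon$ is the CEF error. Only the remainder term survives.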

Now, $R_1$ depends on the choice of $x_0$. Since $R_1$ represents the inaccuracy of the linear approximation, a natural question is: which center of expansion minimizes the expected squared Taylor remainder $E(R_1^2)$, so that the linear approximation is "best" under a criterion that mimics mean squared error, a well-known and widely used optimality criterion for deviations in general?

If one follows this path, one finds that setting $x_0 = E(x)$ minimizes $E(R_1^2)$ when the gradient of the CEF is estimated by OLS. Moreover, in that case $E(\tilde x\cdot R_1) = 0$. QED

Implementing this in practice means centering the regressors on their sample mean, while leaving the dependent variable uncentered.
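For readers who want to probe this numerically, here is a minimal simulation sketch (my own addition, not part of the answer): it centers the regressor on its sample mean, runs OLS, and compares the fitted slope and intercept with the gradient and the level of the CEF at $E(x)$, for a simple quadratic CEF and a symmetric regressor. Whether and when these quantities agree is exactly the point flagged as contested in the warning above, so treat it as a check of the claim, not a proof.

```
# Minimal numerical probe of the claim (illustration only, not from the original answer).
# CEF: m(x, theta) = t0 + t1*x + t2*x**2, so its derivative at E(x) = mu is t1 + 2*t2*mu.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
t0, t1, t2 = 1.0, 2.0, -0.5
mu = 3.0

x = rng.normal(mu, 1.0, size=n)                  # regressor, symmetric around its mean
eps = rng.normal(0.0, 1.0, size=n)               # CEF error, E[eps | x] = 0
y = t0 + t1 * x + t2 * x**2 + eps                # y = m(x, theta) + eps

X = np.column_stack([np.ones(n), x - x.mean()])  # intercept + mean-centered regressor
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS slope on centered x:", slope)
print("gradient of CEF at E(x):", t1 + 2 * t2 * mu)
print("OLS intercept          :", intercept)
print("CEF evaluated at E(x)  :", t0 + t1 * mu + t2 * mu**2)
```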

Alecos Papadopoulos
  • Cool, thanks a lot. If you happen to stumble over some interesting references, you are welcome to add them when you have the time. Thanks again :) – Jesper for President Feb 06 '20 at 21:22
  • @StopClosingQuestionsFast If you mean references on the above result, I haven't found it anywhere; this is why I have submitted it as a paper for peer review and publication, as a nice little interesting result. – Alecos Papadopoulos Feb 06 '20 at 21:25
  • Ahh ok, well let me know when a public working paper is available :) good luck. I wonder whether something like it could not be found in Li and Racine's book on non-parametric estimation. – Jesper for President Feb 06 '20 at 21:26

@Alecos Papadopoulos

The very last statement of your sketch of proof seems a little too optimistic. I really do not understand how you can prove that $E[\tilde{x}R_1]=0$.

A majorization of $E[\tilde{x}R_1]$ can be found if, for instance, some upper bound on the modulus of the second derivative of $E[Y|X=x]$ is known (see for instance Bera 1984, criticizing a well-known 1980 paper by White in the International Economic Review), but I do not understand how a general result about asymptotic consistency of the OLS estimator towards the derivative of the conditional expectation at some point can be established.
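To illustrate the kind of bound this gives (a sketch in the scalar case, assuming a uniformly bounded second derivative $|m''(x,\theta)|\le M$): the Lagrange form of the remainder yields $|R_1| \le \tfrac{M}{2}(x-x_0)^2$, hence

$$\big|E[\tilde{x}R_1]\big| \;\le\; \frac{M}{2}\,E\big[\,|x-x_0|^3\,\big],$$

which bounds the term but does not make it vanish in general.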

doc fjs