
Assume a non-linear regression model \begin{align} \mathbb E[y \lvert x] &= m(x,\theta) \\ y &= m(x,\theta) + \varepsilon, \end{align} with $\varepsilon := y - m(x,\theta)$.

I heard someone say that

OLS always consistently estimates the partial derivatives of the non-linear conditional expectation function, with the partial derivatives evaluated at the expected values of the regressors.

Can someone please demonstrate this property?
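To make the claim concrete (my own illustration, not part of the quote): for a scalar quadratic CEF $m(x,\theta)=\theta_1+\theta_2 x+\theta_3 x^2$, the statement would mean that the OLS slope coefficient consistently estimates

$$\frac{\partial m}{\partial x}\bigg|_{x=E(x)} = \theta_2 + 2\theta_3\, E(x).$$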

Alecos Papadopoulos
Jesper for President

2 Answers


WARNING: The results claimed in this post are of contested validity (by the writer himself; when the fog clears, I will report back).

Ok. It's a bit long to include the whole proof here, so I will just sketch it:

Apply a first-order Taylor expansion around some initially arbitrary point $x_0$,

$$y = m(x_0,\theta) + [x-x_0]'\nabla m(x_0,\theta) + R_1 + \epsilon,$$

where $R_1$ is the Taylor remainder. Set $$b_0 = m(x_0,\theta),\; b = \nabla m(x_0,\theta),\;\beta = (b_0, b)' $$

$$\tilde x = x-x_0,\; u = R_1 + \epsilon$$ and revert to matrix notation

$$\mathbf y = \tilde X \beta + \mathbf u.$$
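As a side note on the remainder (a standard fact I am adding here, assuming $m(\cdot,\theta)$ is twice continuously differentiable in $x$), $R_1$ can be written in Lagrange form as

$$R_1 = \tfrac12\,[x-x_0]'\,\nabla^2 m(\bar x,\theta)\,[x-x_0]$$

for some $\bar x$ on the segment between $x$ and $x_0$, so $R_1$ is governed by the curvature of the CEF.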

So the OLS slope coefficients will attempt to estimate the gradient of the conditional expectation function evaluated at $x_0$, and the constant term will attempt to estimate the CEF itself evaluated at that same point $x_0$.

The OLS estimator will be

$$\hat \beta = \beta + (\tilde X'\tilde X)^{-1}\tilde X'\mathbf u \implies \hat \beta - \beta = (\tilde X'\tilde X)^{-1}\tilde X'(\epsilon + R_1)$$

Since $\epsilon$ is by construction the conditional expectation function error, at the limit we will be left with

$$\text{plim}(\hat \beta - \beta) =\left[E(\tilde x\tilde x')\right]^{-1} E(\tilde x\cdot R_1)$$
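Spelling out the limit step (my own elaboration, using the answer's notational shorthand and assuming the usual law-of-large-numbers conditions for $\tilde x$, $\epsilon$ and $R_1$):

$$\hat \beta - \beta = \left(\tfrac{1}{n}\tilde X'\tilde X\right)^{-1}\tfrac{1}{n}\tilde X'(\epsilon + R_1) \;\xrightarrow{p}\; \left[E(\tilde x\tilde x')\right]^{-1}\Big(E(\tilde x\,\epsilon) + E(\tilde x\, R_1)\Big),$$

where $E(\tilde x\,\epsilon) = E\big(\tilde x\, E(\epsilon \mid x)\big) = 0$ by the law of iterated expectations, since $\epsilon$ is the CEF error. Only the remainder term survives.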

Now, $R_1$ depends on the choice of $x_0$. Since $R_1$ represents the inaccuracy of the linear approximation, a natural question is: which center of expansion minimizes the expected squared Taylor remainder $E(R_1^2)$, so that the linear approximation is "best" under a criterion that mimics mean squared error, a well-known and widely used optimality criterion for deviations in general?

If one follows this path, one finds that setting $x_0 = E(x)$ minimizes $E(R_1^2)$ when the gradient of the CEF is estimated by OLS. Moreover, in that case $E(\tilde x\cdot R_1) = 0$. QED

Implementing this in practice means centering the regressors on their sample mean, while leaving the dependent variable uncentered.
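For readers who want to probe this numerically, here is a minimal simulation sketch (my own addition, not part of the answer): it centers the regressor on its sample mean, runs OLS, and compares the fitted slope and intercept with the gradient and the level of the CEF at $E(x)$, for a simple quadratic CEF and a symmetric regressor. Whether and when these quantities agree is exactly the point flagged as contested in the warning above, so treat it as a check of the claim, not a proof.

```
# Minimal numerical probe of the claim (illustration only, not from the original answer).
# CEF: m(x, theta) = t0 + t1*x + t2*x**2, so its derivative at E(x) = mu is t1 + 2*t2*mu.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
t0, t1, t2 = 1.0, 2.0, -0.5
mu = 3.0

x = rng.normal(mu, 1.0, size=n)                  # regressor, symmetric around its mean
eps = rng.normal(0.0, 1.0, size=n)               # CEF error, E[eps | x] = 0
y = t0 + t1 * x + t2 * x**2 + eps                # y = m(x, theta) + eps

X = np.column_stack([np.ones(n), x - x.mean()])  # intercept + mean-centered regressor
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS slope on centered x:", slope)
print("gradient of CEF at E(x):", t1 + 2 * t2 * mu)
print("OLS intercept          :", intercept)
print("CEF evaluated at E(x)  :", t0 + t1 * mu + t2 * mu**2)
```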

Alecos Papadopoulos
  • Cool, thanks a lot. If you happen to stumble over some interesting references, you are welcome to add them when you have the time. Thanks again :) – Jesper for President Feb 06 '20 at 21:22
  • @StopClosingQuestionsFast If you mean references on the above result, I haven't found it anywhere; this is why I have submitted it as a paper for peer review and publication, as a nice little interesting result. – Alecos Papadopoulos Feb 06 '20 at 21:25
  • Ahh ok, well let me know when a public working paper is available :) good luck. I wonder whether something like it could not be found in Li and Racine's book on non-parametric estimation. – Jesper for President Feb 06 '20 at 21:26

@Alecos Papadopoulos

The very last statement of your sketch of proof seems a little too optimistic. I really do not understand how you can prove that $E[\tilde{x}R_1]=0$.

A majorization of $E[\tilde{x}R_1]$ can be found if, for instance, some upper bound on the modulus of the second derivative of $E[Y|X=x]$ is known (see for instance Bera 1984, criticizing a well-known 1980 paper by White in the International Economic Review), but I do not understand how a general result about asymptotic consistency of the OLS estimator towards the derivative of the conditional expectation at some point can be established.
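To illustrate the kind of bound this gives (a sketch in the scalar case, assuming a uniformly bounded second derivative $|m''(x,\theta)|\le M$): the Lagrange form of the remainder yields $|R_1| \le \tfrac{M}{2}(x-x_0)^2$, hence

$$\big|E[\tilde{x}R_1]\big| \;\le\; \frac{M}{2}\,E\big[\,|x-x_0|^3\,\big],$$

which bounds the term but does not make it vanish in general.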

doc fjs