WARNING: The results claimed in this post are of contested validity (by the writer himself; when the fog clears, I will report back).
OK, it's a bit long to include the whole proof here, so I will just sketch it:
Apply a first-order Taylor expansion of the conditional expectation function $m(\cdot)$ around some initially arbitrary point $x_0$:
$$y = m(x_0) + [x-x_0]'\nabla m(x_0) + R_1 + \epsilon,$$
where $R_1$ is the Taylor remainder. Set
$$b_0 = m(x_0),\; b = \nabla m(x_0),\;\beta = (b_0, b')' $$
$$\tilde x = x-x_0,\; u = R_1 + \epsilon$$
and revert to matrix notation (where $\tilde X$ stacks the rows $(1, \tilde x')$, i.e. it includes a column of ones for the constant term):
$$\mathbf y = \tilde X \beta + \mathbf u.$$
So the OLS slope coefficients will attempt to estimate the gradient of the conditional expectation function (CEF) evaluated at the point $x_0$, and the constant term will attempt to estimate the CEF itself evaluated at $x_0$.
The OLS estimator satisfies
$$\hat \beta = \beta + (\tilde X'\tilde X)^{-1}\tilde X'\mathbf u \implies \hat \beta - \beta = (\tilde X'\tilde X)^{-1}\tilde X'(\epsilon + R_1).$$
Since $\epsilon$ is by construction the conditional expectation function error (so $E(\tilde x\,\epsilon) = 0$), in the limit we will be left with
$$\text{plim}(\hat \beta - \beta) = \left[E(\tilde x\tilde x')\right]^{-1} E(\tilde x\cdot R_1).$$
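As a sanity check, here is a minimal Monte Carlo sketch (a hypothetical example of my own, not from the post) verifying that $\hat\beta - \beta$ matches $[E(\tilde x\tilde x')]^{-1}E(\tilde x\cdot R_1)$ in a large sample. With a quadratic CEF $m(x)=x^2$ the first-order remainder is exactly $R_1 = (x-x_0)^2$, so both quantities can be computed from the same simulated data:

```python
# Monte Carlo check of plim(beta_hat - beta) = [E(x x')]^{-1} E(x * R1).
# Hypothetical design: m(x) = x^2, x ~ Exp(1), expansion point x0 = E(x) = 1,
# so R1 = (x - x0)^2 exactly (a quadratic has no higher-order Taylor terms).
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
x = rng.exponential(scale=1.0, size=n)
eps = rng.normal(size=n)                      # CEF error, independent of x
y = x**2 + eps

x0 = 1.0                                      # population mean of Exp(1)
Xt = np.column_stack([np.ones(n), x - x0])    # constant + centered regressor
beta_hat = np.linalg.lstsq(Xt, y, rcond=None)[0]

beta = np.array([1.0, 2.0])                   # (m(x0), m'(x0)) = (1, 2)
R1 = (x - x0)**2

lhs = beta_hat - beta                         # finite-sample estimation error
# Sample moments standing in for the expectations in the plim formula:
rhs = np.linalg.solve(Xt.T @ Xt / n, Xt.T @ R1 / n)
print(lhs)
print(rhs)
```

For this skewed design the two vectors agree up to sampling noise, and both are far from zero: centering at $E(x)$ alone does not kill the remainder term here, which is exactly why the role of $E(\tilde x\cdot R_1)$ matters in what follows.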
Now, $R_1$ depends on the choice of $x_0$. Since $R_1$ represents the inaccuracy of the linear approximation, a natural question is: which center of expansion minimizes the expected squared Taylor remainder, $E(R_1^2)$? Under that criterion the linear approximation is deemed "best" in a sense that mimics mean squared error, a well-known and widely used optimality criterion for deviations in general.
If one follows this path, one finds that setting $x_0 = E(x)$ minimizes $E(R_1^2)$ when the gradient of the CEF is estimated by OLS. Moreover, in that case $E(\tilde x\cdot R_1) = 0$, so the asymptotic bias above vanishes. QED
Implementing this in practice means centering the regressors at their sample mean while leaving the dependent variable uncentered.
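The recipe can be sketched numerically. This is a hypothetical example of my own (not from the post), assuming a symmetric regressor and a quadratic CEF so that the gradient at the mean is known in closed form:

```python
# Sketch of the recipe: center the regressor at its sample mean, leave y
# uncentered, and read the CEF gradient off the slope coefficient.
# Hypothetical design: x ~ N(mu, 1) (symmetric), m(x) = x^2, so the
# gradient at the mean is m'(mu) = 2*mu.
import numpy as np

rng = np.random.default_rng(1)
n, mu = 200_000, 1.5
x = rng.normal(loc=mu, scale=1.0, size=n)
y = x**2 + rng.normal(size=n)              # dependent variable stays uncentered

xc = x - x.mean()                          # regressor centered at its sample mean
X = np.column_stack([np.ones(n), xc])
b0_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(b_hat)    # slope: approaches m'(mu) = 2*mu = 3.0 in this symmetric design
# Note: since xc sums to zero, the intercept equals the sample mean of y, so
# here it tends to E[y] = mu^2 + 1 rather than m(mu) = mu^2; cf. the warning
# at the top of the post.
print(b0_hat)
```

The symmetry of $x$ is doing real work in this example: it makes $E(\tilde x\cdot R_1) = E[(x-\mu)^3] = 0$ for the quadratic remainder, so the slope is consistent for the gradient at the mean.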