
Suppose $E(Y\mid X=x)=m(x)$ is the regression function, assumed twice differentiable, and $f(x)$ is the density of $X$, also twice differentiable. Suppose $Y_i=m(X_i)+e_i$, so that $E(e_i\mid X_i)=0$.

$m'(x)$ denotes the derivative of the regression function.

We have i.i.d. data $\{Y_i,X_i\}_{i=1}^{N}$, and the Nadaraya-Watson kernel estimator of $m(x)$ is $\widehat{m}(x)=\frac{\frac{1}{N}\sum_{i=1}^{N}Y_iK_{h}(X_i-x)}{\frac{1}{N}\sum_{i=1}^{N}K_{h}(X_i-x)}$, where $K_h(u)=\frac{1}{h}k\left(\frac{u}{h}\right)$ and $k(\cdot)$ is a standard second-order kernel. Let $\widehat{m}'(x)$ be its derivative with respect to $x$, which is the estimator of $m'(x)$. What is the uniform (in probability) rate of convergence of $\widehat{m}'(x)$ to $m'(x)$ on a compact subset of the support? Formally, let $A$ be a compact subset of the support of $X$, and suppose $\sup_{x\in A}|\widehat{m}'(x)-m'(x)|=O_p(a_N)$; what is $a_N$?

This looks like a standard result, but I can hardly find a paper on it.

I did find some related results. Define $\widehat{g}(x)=\frac{1}{N}\sum_{i=1}^{N}Y_iK_{h}(X_i-x)$ and $\widehat{f}(x)=\frac{1}{N}\sum_{i=1}^{N}K_{h}(X_i-x)$, so that $\widehat{m}(x)=\frac{\widehat{g}(x)}{\widehat{f}(x)}$ and $\widehat{m}'(x)=\frac{\widehat{g}'(x)\widehat{f}(x)-\widehat{g}(x)\widehat{f}'(x)}{\widehat{f}^2(x)}$, and write $g(x)=m(x)f(x)$ for the population counterpart of $\widehat{g}(x)$. Then I know the following uniform rates:

$\sup_{x\in A}|\widehat{g}(x)-g(x)|=O_p\left(\sqrt{\frac{\log N}{Nh}}+h^2\right)$ and $\sup_{x\in A}|\widehat{f}(x)-f(x)|=O_p\left(\sqrt{\frac{\log N}{Nh}}+h^2\right)$,

$\sup_{x\in A}|\widehat{g}'(x)-g'(x)|=O_p\left(\sqrt{\frac{\log N}{Nh^{3}}}+h^2\right)$ and $\sup_{x\in A}|\widehat{f}'(x)-f'(x)|=O_p\left(\sqrt{\frac{\log N}{Nh^{3}}}+h^2\right)$.

My question is, from these two equations, can I conclude that

$\sup_{x\in A}|\widehat{m}'(x)-m'(x)|=O_p\left(\sqrt{\frac{\log N}{Nh^{3}}}+h^2\right)$?

Why or why not? Any comments, suggestions, or related references are welcome! Thanks.

T34driver
1 Answer


...can I conclude that $\sup_{x\in A}|\widehat{m}'(x)-m'(x)|=O_p\left(\sqrt{\frac{\log N}{Nh^{3}}}+h^2\right)$?

Yes, you can, provided the denominator of $\widehat{m}'(x)$ does not blow up, e.g., if you assume $\hat{f}^2$ is bounded away from zero almost surely. (A sufficient primitive condition for this would be that the design density $f$ is bounded away from zero. More generally, on any region of the support where $f > \delta > 0$ for some $\delta > 0$, you're fine.)
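For concreteness, here is a minimal simulation sketch of the plug-in estimator built from $\widehat{g}$, $\widehat{f}$, $\widehat{g}'$, $\widehat{f}'$, evaluated only on an interior set where the design density is bounded away from zero. The Gaussian kernel, the choices $m(x)=\sin x$ and a standard normal design, and the bandwidth rule are all illustrative assumptions, not anything implied by the question.

```python
import numpy as np

def k(u):        # standard Gaussian kernel (a second-order kernel)
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def k_prime(u):  # derivative of the kernel
    return -u * k(u)

def nw_deriv(grid, X, Y, h):
    """Plug-in estimate of m'(x) = (g'f - g f')/f^2 on a grid of x values."""
    u = (X[None, :] - grid[:, None]) / h                   # shape (len(grid), N)
    N = len(X)
    f_hat = k(u).sum(axis=1) / (N * h)                     # \hat f(x)
    g_hat = (Y * k(u)).sum(axis=1) / (N * h)               # \hat g(x)
    # d/dx K_h(X_i - x) = -k'((X_i - x)/h) / h^2
    f_hat_d = -k_prime(u).sum(axis=1) / (N * h**2)         # \hat f'(x)
    g_hat_d = -(Y * k_prime(u)).sum(axis=1) / (N * h**2)   # \hat g'(x)
    return (g_hat_d * f_hat - g_hat * f_hat_d) / f_hat**2

rng = np.random.default_rng(0)
N = 5000
X = rng.standard_normal(N)
Y = np.sin(X) + 0.3 * rng.standard_normal(N)   # m(x) = sin(x), so m'(x) = cos(x)
grid = np.linspace(-1.0, 1.0, 201)             # compact set A where f is bounded away from 0
h = (np.log(N) / N) ** (1 / 7)                 # bandwidth of the order discussed below
print(np.max(np.abs(nw_deriv(grid, X, Y, h) - np.cos(grid))))  # sup-norm error on A
```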

Comment

It's probably easier to fix a rate for the bandwidth. For example, one can take the bandwidth that is optimal for the derivatives $f'$ and $g'$, which (from minimizing the expression you provided) is $$ h = O\!\left(\left(\frac{\log N}{N}\right)^{1/7}\right). $$
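For reference, the balancing arithmetic behind that choice, a quick sketch using the uniform rate for $\widehat{f}'$ and $\widehat{g}'$ stated in the question, is
$$\sqrt{\frac{\log N}{Nh^{3}}}\asymp h^{2}\;\Longleftrightarrow\;h^{7}\asymp\frac{\log N}{N}\;\Longleftrightarrow\;h\asymp\left(\frac{\log N}{N}\right)^{1/7},$$
which, under the denominator condition above, gives $\sup_{x\in A}|\widehat{m}'(x)-m'(x)|=O_p\!\left(\left(\frac{\log N}{N}\right)^{2/7}\right)$ at that bandwidth.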

(The bandwidth that is optimal for the level $m$ itself would be $h = O\!\left(\left(\frac{\log N}{N}\right)^{1/5}\right)$. There may be a trade-off between the rates for $m$ and $m'$.)

Michael
  • Thanks, Michael! This is very helpful. Intuitively, is it because $\widehat{m}'(x)$ is a smooth function of $\widehat{g}(x),\widehat{f}(x),\widehat{g}'(x),\widehat{f}'(x)$ when $\widehat{f}^2$ is bounded away from zero almost surely? And more generally, is it true that whenever we have some $\hat{F}=G(\hat{p}(x))$ with $G(\cdot)$ smooth, $\hat{F}$ preserves the uniform rate of $\hat{p}(x)$, as in the Delta method (where the root-$n$ rate of the sample mean is preserved under smooth transformations)? – T34driver Oct 04 '20 at 23:57
  • @T34driver Maybe I should rephrase. Since $\hat{f}$ is a uniformly consistent estimate of $f$, on any region of the support of $f$ where $f > \delta > 0$ for some $\delta > 0$, you're fine. Dividing by an $f^2(x)$ that can be arbitrarily close to zero is a problem, even if we substitute the true density $f$ for $\hat{f}$. To make the connection with the Delta method: away from zero, the function $\frac{1}{x^2}$ is Lipschitz, but it blows up, along with its derivative, as $x$ approaches zero. – Michael Oct 05 '20 at 01:28
  • Thanks! Indeed, as in the Delta method, smoothness and not blowing up are both important for preserving the uniform rate. – T34driver Oct 05 '20 at 01:41