Suppose $m(x)=E(Y\mid X=x)$ is the regression function and $f(x)$ is the density of $X$, both twice differentiable. Suppose $Y_i=m(X_i)+e_i$, and let $m'(x)$ denote the derivative of the regression function.
We have i.i.d. data $\{Y_i,X_i\}_{i=1}^{N}$, and the Nadaraya-Watson kernel estimate of $m(x)$ is $\widehat{m}(x)=\frac{\frac{1}{N}\sum_{i=1}^{N}Y_iK_{h}(X_i-x)}{\frac{1}{N}\sum_{i=1}^{N}K_{h}(X_i-x)}$, where $K_h(u)=\frac{1}{h}k\left(\frac{u}{h}\right)$ and $k(\cdot)$ is a standard second-order kernel. Let $\widehat{m}'(x)$ be its derivative with respect to $x$, which serves as the estimator of $m'(x)$. What is the uniform (in probability) rate of convergence of $\widehat{m}'(x)$ to $m'(x)$ on a compact subset of the support? Formally, let $A$ be a compact subset of the support of $X$, and suppose $\sup_{x\in A}|\widehat{m}'(x)-m'(x)|=O_p(a_N)$; what is $a_N$?
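For concreteness, here is a minimal Python sketch of the estimator (my own illustration, not from any reference; the Gaussian kernel is just one standard second-order choice for $k$):

```python
import numpy as np

def k(u):
    """Gaussian kernel, one standard second-order choice for k."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def m_hat(x, X, Y, h):
    """Nadaraya-Watson estimate of m(x) at a single point x."""
    Kh = k((X - x) / h) / h          # K_h(X_i - x)
    return np.mean(Y * Kh) / np.mean(Kh)
```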
This looks like some standard results, but I can hardly find a paper on it.
I did find some related results: if we define $\widehat{g}(x)=\frac{1}{N}\sum_{i=1}^{N}Y_iK_{h}(X_i-x)$ and $\widehat{f}(x)=\frac{1}{N}\sum_{i=1}^{N}K_{h}(X_i-x)$, so that $\widehat{m}(x)=\frac{\widehat{g}(x)}{\widehat{f}(x)}$ and, by the quotient rule, $\widehat{m}'(x)=\frac{\widehat{g}'(x)\widehat{f}(x)-\widehat{g}(x)\widehat{f}'(x)}{\widehat{f}^2(x)}$, then I know the following uniform rates (with $g(x)=m(x)f(x)$ the population counterpart of $\widehat{g}$):
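Continuing the Python sketch above, the quotient-rule estimator of $m'(x)$ looks like this (using that $\frac{d}{dx}K_h(X_i-x)=-\frac{1}{h^2}k'\left(\frac{X_i-x}{h}\right)$, and $k'(u)=-u\,k(u)$ for the Gaussian kernel):

```python
def m_hat_deriv(x, X, Y, h):
    """Estimate m'(x) via the quotient rule on g_hat and f_hat."""
    u = (X - x) / h
    Kh = k(u) / h                    # K_h(X_i - x)
    dKh = u * k(u) / h**2            # d/dx K_h(X_i - x), Gaussian kernel
    g, f = np.mean(Y * Kh), np.mean(Kh)
    dg, df = np.mean(Y * dKh), np.mean(dKh)   # g_hat'(x), f_hat'(x)
    return (dg * f - g * df) / f**2
```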
$\sup_{x\in A}|\widehat{g}(x)-g(x)|=\sup_{x\in A}|\widehat{f}(x)-f(x)|=O_p\left(\sqrt{\frac{\log N}{Nh}}+h^2\right)$
$\sup_{x\in A}|\widehat{g}'(x)-g'(x)|=\sup_{x\in A}|\widehat{f}'(x)-f'(x)|=O_p\left(\sqrt{\frac{\log N}{Nh^{3}}}+h^2\right)$
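My rough attempt at combining these (a sketch, not a proof, assuming $\inf_{x\in A}f(x)>0$ so that $\widehat{f}$ is uniformly bounded away from zero with probability approaching one) is to expand the numerator of the difference:

$$\widehat{g}'\widehat{f}-\widehat{g}\widehat{f}'-\left(g'f-gf'\right)=(\widehat{g}'-g')\widehat{f}+g'(\widehat{f}-f)-(\widehat{g}-g)\widehat{f}'-g(\widehat{f}'-f'),$$

where each term on the right is uniformly $O_p\left(\sqrt{\frac{\log N}{Nh^{3}}}+h^2\right)$ on $A$, the derivative terms being the dominant ones.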
My question is: given these two uniform rates, can I conclude that
$\sup_{x\in A}|\widehat{m}'(x)-m'(x)|=O_p\left(\sqrt{\frac{\log N}{Nh^{3}}}+h^2\right)$?
Why or why not? Any comments, suggestions, or related references are welcome! Thanks.
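P.S. In case it helps, here is a quick Monte Carlo sketch (continuing the Python snippets above) that one could run to sanity-check the conjectured rate numerically. The design choices here are purely illustrative: $X\sim U(0,1)$, $m(x)=\sin(2\pi x)$, $A=[0.25,0.75]$, and the bandwidth $h=(\log N/N)^{1/7}$, which balances $\sqrt{\log N/(Nh^3)}$ against $h^2$.

```python
rng = np.random.default_rng(0)
grid = np.linspace(0.25, 0.75, 101)   # compact A strictly inside the support

for N in [500, 2000, 8000, 32000]:
    h = (np.log(N) / N) ** (1 / 7)    # balances sqrt(log N/(N h^3)) and h^2
    X = rng.uniform(0.0, 1.0, N)
    Y = np.sin(2 * np.pi * X) + 0.2 * rng.standard_normal(N)
    # sup over the grid of |m_hat'(x) - m'(x)|, with m'(x) = 2*pi*cos(2*pi*x)
    sup_err = max(abs(m_hat_deriv(x, X, Y, h) - 2 * np.pi * np.cos(2 * np.pi * x))
                  for x in grid)
    rate = np.sqrt(np.log(N) / (N * h**3)) + h**2
    print(f"N={N:6d}  sup error = {sup_err:.3f}  conjectured rate = {rate:.3f}")
```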