Suppose $Y_i=g(X_i)+e_i$ with $E(e_i|X_i)=0$, $g(\cdot)$ being an unknown function and $X_i\in S=\{1,2,3,4\}$ with equal probability of taking each value. We want to estimate $g(x)$ using data $\{Y_i,X_i\}_{i=1}^{n}$. Can we estimate $g(x)$ with the Nadaraya–Watson estimator
$\widehat{g}(x)=\frac{\sum_{i=1}^{n}Y_iK_{h}(X_i-x)}{\sum_{i=1}^{n}K_{h}(X_i-x)}$,
where $x\in S$, $K_{h}(u)=\frac{1}{h}k\left(\frac{u}{h}\right)$, and $k(\cdot)$ is some standard second-order kernel?
More specifically, is $\widehat{g}(x)$ still consistent for $g(x)$? Thanks!
My guess is that it is still consistent: as the bandwidth $h$ shrinks to zero, essentially all of the weight ends up on the observations with $X_i=x$, much as in the case where $X_i$ is continuously distributed.
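To check this intuition numerically, here is a minimal simulation sketch. Everything specific in it is my own assumption and not part of the setup above: the regression function $g(x)=x^2$, standard normal errors, a Gaussian kernel, and the bandwidth rule $h=n^{-1/5}$. It only illustrates the guess under those choices and proves nothing.

```python
import numpy as np

def nw_estimate(x, X, Y, h):
    """Nadaraya-Watson estimate at x with a Gaussian kernel and bandwidth h."""
    u = (X - x) / h
    # The 1/h and 1/sqrt(2*pi) factors cancel between numerator and denominator.
    w = np.exp(-0.5 * u**2)
    return np.sum(w * Y) / np.sum(w)

def g(x):
    # Hypothetical regression function, chosen only for illustration.
    return x**2

rng = np.random.default_rng(0)
support = np.array([1, 2, 3, 4])
errors = {}

for n in (100, 10_000, 1_000_000):
    X = rng.choice(support, size=n)      # uniform on {1, 2, 3, 4}
    Y = g(X) + rng.normal(size=n)        # assumed N(0, 1) errors
    h = n ** (-1 / 5)                    # assumed bandwidth rule, h -> 0
    est = np.array([nw_estimate(x, X, Y, h) for x in support])
    errors[n] = np.max(np.abs(est - g(support)))
    print(f"n={n:>9}  h={h:.3f}  max abs error={errors[n]:.4f}")

# For small h the estimator is numerically the sample mean of Y over X_i == x.
cell_means = np.array([Y[X == x].mean() for x in support])
```

Once $h$ is well below 1, the kernel weight on observations with $X_i\neq x$ is negligible, so the estimate at each $x\in S$ should be numerically indistinguishable from the sample mean of the $Y_i$ with $X_i=x$ (`cell_means` above, computed for the last sample size).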