Is there a rule of thumb or optimization technique for choosing the number of grid points in kernel regression?
I am running Nadaraya-Watson regression on 10 years of swap rate data (about 2500 daily observations). When I cross-validate to select the optimal bandwidth, I find that fewer grid points lead to a higher optimal bandwidth, e.g. 50 points => h = 10, 1000 points => h = 2. I was checking grid sizes from list(range(50, 1001, 100)), where 50 is the starting point, 1001 is the (exclusive) end point and 100 is the step. A value of 50 means that the grid consists of 50 points.
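To make the setup concrete, here is a minimal sketch of the kind of experiment described above. Everything in it is an assumption for illustration rather than the actual code behind the numbers quoted: the data are a synthetic random walk standing in for the swap rate, the kernel is Gaussian, the estimate is computed on an evenly spaced grid and linearly interpolated back to the held-out observations, and the cross-validation is a plain 5-fold split. The function names (nw_on_grid, cv_score, best_bandwidth) and the candidate bandwidths are hypothetical.

```python
import numpy as np


def nw_on_grid(x_train, y_train, grid, h):
    """Nadaraya-Watson estimate (Gaussian kernel, assumed) at each grid point."""
    u = (grid[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    # Guard against a zero denominator when h is tiny relative to the spacing.
    denom = np.maximum(w.sum(axis=1), np.finfo(float).tiny)
    return (w @ y_train) / denom


def cv_score(x, y, grid, h, n_folds=5, seed=0):
    """Mean squared K-fold error when the curve is known only on `grid`."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % n_folds  # balanced random fold labels
    errors = []
    for k in range(n_folds):
        test = folds == k
        fit = nw_on_grid(x[~test], y[~test], grid, h)
        # Assumption: held-out points are predicted by linear interpolation
        # between grid values, so a coarse grid adds interpolation error.
        pred = np.interp(x[test], grid, fit)
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


def best_bandwidth(x, y, n_grid, bandwidths):
    """Bandwidth with the lowest CV score for a grid of n_grid points."""
    grid = np.linspace(x.min(), x.max(), n_grid)
    scores = [cv_score(x, y, grid, h) for h in bandwidths]
    return bandwidths[int(np.argmin(scores))]


if __name__ == "__main__":
    # Synthetic stand-in for 2500 daily swap rate observations (not real data).
    rng = np.random.default_rng(42)
    x = np.arange(2500.0)
    y = 2.0 + np.cumsum(rng.normal(scale=0.02, size=x.size))

    bandwidths = [1.0, 2.0, 5.0, 10.0, 20.0]  # hypothetical candidates, in days
    for n_grid in range(50, 1001, 100):  # grid sizes 50, 150, ..., 950
        print(n_grid, best_bandwidth(x, y, n_grid, bandwidths))
```

In this sketch the grid enters the cross-validation score only through the interpolation step, so any dependence of the selected h on the number of grid points comes from how the grid is used for prediction, not from the Nadaraya-Watson estimator itself. If the original code uses the grid differently (for example by binning the data), the behaviour may differ.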