If you are doing kernel regression or Gaussian process regression, there are alternatives that are better than grid search because they allow direct optimization of the bandwidth. I suggest trying two approaches:
Maximum likelihood estimate. If you have a sample $D = (X, \mathbf{y}) = \{(x_i, y_i)\}_{i = 1}^n$ with inputs $x_i \in \mathbb{R}^d$ and outputs $y_i \in \mathbb{R}$, you can compute the kernel matrix $K = \{k_{ij}\}_{i, j = 1}^n$ with $k_{ij} = k_{\sigma}(x_i, x_j)$. Then the log likelihood (up to an additive constant) is:
$$
L(D, \sigma) = -\frac{1}{2} \left[\log \det(K) + \mathbf{y}^T K^{-1} \mathbf{y} \right].
$$
If you maximize $L(D, \sigma)$ w.r.t. $\sigma$ you get the maximum likelihood estimate of $\sigma$ for Gaussian process regression. The likelihood is differentiable, and the optimization problem is well behaved (though not convex); see the sketch below.
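Here is a minimal sketch of that step, assuming an RBF kernel $k_\sigma(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$ with a small jitter term; the toy data and the helper names (`rbf_kernel`, `neg_log_likelihood`) are illustrative assumptions, not part of the original answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rbf_kernel(X, sigma):
    # k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)) from pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

def neg_log_likelihood(log_sigma, X, y, noise=1e-3):
    sigma = np.exp(log_sigma)                      # optimize in log-space to keep sigma > 0
    K = rbf_kernel(X, sigma) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)                      # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log det(K) from the Cholesky factor
    return 0.5 * (log_det + y @ alpha)             # negative of L(D, sigma), up to a constant

# Toy data, only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

res = minimize_scalar(neg_log_likelihood, bounds=(-5, 5), method="bounded", args=(X, y))
print("ML estimate of sigma:", np.exp(res.x))
```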
LOO cross-validation. For kernel methods the leave-one-out cross-validation error can be computed in closed form, without refitting the model $n$ times. See the question "Gaussian process regression: leave-one-out prediction" and the sketch below.
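A minimal sketch of that closed-form LOO error, again assuming an RBF kernel and an assumed jitter term `noise` (the helper mirrors the one in the previous snippet); the residual formula $y_i - \mu_{-i} = [K^{-1}\mathbf{y}]_i / [K^{-1}]_{ii}$ is the standard GP leave-one-out identity:

```python
import numpy as np

def rbf_kernel(X, sigma):
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

def loo_mean_squared_error(sigma, X, y, noise=1e-3):
    K = rbf_kernel(X, sigma) + noise * np.eye(len(y))
    K_inv = np.linalg.inv(K)             # one O(n^3) inversion, no per-fold refitting
    alpha = K_inv @ y
    residuals = alpha / np.diag(K_inv)   # closed-form LOO residual for each point
    return np.mean(residuals**2)
```

You can then minimize this one-dimensional objective over $\sigma$ exactly as in the likelihood sketch.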
Note that both of these approaches require inverting $K$, so the computational complexity of each optimization step is $O(n^3)$.
However, there are faster approximations; see e.g. Random Fourier Features, an easy-to-implement approach that reduces this computational cost. In that case the complexity scales as $O(m^2 n)$, where $m$ is a parameter you can set significantly smaller than $n$.
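A minimal sketch of Random Fourier Features for an RBF kernel with bandwidth $\sigma$; the number of features `m` and the ridge penalty `lam` are illustrative assumptions. Fitting costs $O(n m^2 + m^3)$ instead of $O(n^3)$:

```python
import numpy as np

def rff_ridge(X, y, m=200, sigma=1.0, lam=1e-3, seed=0):
    """Ridge regression on m random Fourier features approximating the RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, m)) / sigma        # frequencies ~ N(0, sigma^{-2} I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)      # random phases
    Z = np.sqrt(2.0 / m) * np.cos(X @ W + b)       # features with z(x)^T z(x') ~ k_sigma(x, x')
    # Solve the m x m ridge system (Z^T Z + lam I) beta = Z^T y: O(n m^2 + m^3).
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    predict = lambda X_new: np.sqrt(2.0 / m) * np.cos(X_new @ W + b) @ beta
    return predict
```

Bandwidth selection (e.g. by cross-validation) then only requires refitting this cheap approximate model for each candidate $\sigma$.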