Your question is closely related to inducing point approximations. Below is a brief recap of standard GP regression, followed by the most popular approximation techniques.
Basic GP
You are given a training set $\mathcal{D} = \{(\mathbf{x}_i,y_i)\}_{1\leq i\leq n}$ with inputs $\mathbf{x}_i$ being (possibly high-dimensional) vectors and noisy scalar outputs $y_i$. The task is to predict new function values $\mathbf{f}_\star$ (or noisy outputs $\mathbf{y}_\star$) at new inputs $\mathbf{X}_\star = [\mathbf{x}_{1\star}, \dots, \mathbf{x}_{n\star}]$. The model is
$$ \forall i\,,\quad y_i = f(\mathbf{x}_i) + \varepsilon_i\;,\quad \varepsilon_i\sim\mathcal{N}(0,\sigma_\mathrm{n}^2)$$
A GP prior over a function $f$ implies that given a sequence of inputs $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n]$ and the corresponding function values $\mathbf{f} = [\mathrm{f}_1, \dots, \mathrm{f}_n] = [f(\mathbf{x}_1),\dots,\, f(\mathbf{x}_n)]$ we have
$$\mathbb{P}(\mathbf{f}\mid \mathbf{X}) = \mathcal{N}(\mathbf{0},\mathbf{K})$$
with a kernel matrix $\mathbf{K}$ which characterizes the covariance as $\mathbf{K}_{ij} = k(\mathbf{x}_i,\mathbf{x}_j)$. Skipping the usual derivations in the GP inference process, we usually obtain that the predictive distribution is
$$\mathbb{P}(\mathbf{f}_\star\mid \mathbf{X}_\star, \mathbf{X},\mathbf{y}) = \mathcal{N}(\pmb{\mu}_\star, \pmb{\Sigma}_\star)$$
with explicit form
$$\begin{cases}
\pmb{\mu}_\star =& \mathbf{K}_{\star,\mathbf{f}}\left(\mathbf{K}_{\mathbf{f},\mathbf{f}}+\sigma_\mathrm{n}^2\mathbf{I}\right)^{-1}\mathbf{y}\\
\pmb{\Sigma}_\star =&\mathbf{K}_{\star,\star} - \mathbf{K}_{\star,\mathbf{f}}\left(\mathbf{K}_{\mathbf{f},\mathbf{f}}+\sigma_\mathrm{n}^2\mathbf{I}\right)^{-1}\mathbf{K}_{\mathbf{f},\star}
\end{cases}$$
But, as you know, inverting the $n\times n$ matrix $\mathbf{K}_{\mathbf{f},\mathbf{f}}+\sigma_\mathrm{n}^2\mathbf{I}$ has $O(n^3)$ complexity, which becomes prohibitive for $n \gtrsim 10^3$.
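To make the formulas concrete, here is a minimal NumPy sketch of the exact predictive equations above (the squared-exponential kernel, lengthscale, and noise level are illustrative choices on my part, not something fixed by the problem):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, z) evaluated between the rows of A and B."""
    sq_dist = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_predict(X, y, X_star, noise_var=0.1):
    """Exact GP predictive mean and covariance; O(n^3) because of the Cholesky of K + sigma_n^2 I."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    K_star = rbf_kernel(X_star, X)                       # K_{*,f}
    L = np.linalg.cholesky(K)                            # more stable than a direct inverse
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + sigma_n^2 I)^{-1} y
    mu = K_star @ alpha
    V = np.linalg.solve(L, K_star.T)
    cov = rbf_kernel(X_star, X_star) - V.T @ V           # K_{*,*} - K_{*,f} (K + sigma_n^2 I)^{-1} K_{f,*}
    return mu, cov
```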
Inducing points
Denote by $\mathbf{u} = [\mathrm{u}_1,\dots,\mathrm{u}_m]$ a set of latent variables called inducing variables. They are the function values at a set of $m$ input locations $\mathbf{X}_\mathbf{u}$, often called "pseudo-inputs" or the "active set". The idea is to decompose the joint GP prior as
$$\mathbb{P}(\mathbf{f},\mathbf{f}_\star) = \int\mathbb{P}(\mathbf{f}_\star, \mathbf{f}, \mathbf{u})\,\mathrm{d}\mathbf{u} = \int\mathbb{P}(\mathbf{f}_\star, \mathbf{f}\mid \mathbf{u})\,\mathbb{P}(\mathbf{u})\mathrm{d}\mathbf{u}$$
with $\mathbb{P}(\mathbf{u}) = \mathcal{N}(\mathbf{0},\mathbf{K}_{\mathbf{u},\mathbf{u}})$ and impose the simplification of conditional independence $\mathbf{f}_\star\perp\mathbf{f}\mid \mathbf{u}$. This simplification means that
$$ \mathbb{P}(\mathbf{f}_\star, \mathbf{f}, \mathbf{u}) \approx \mathbb{Q}(\mathbf{f}_\star \!\mid\mathbf{u})\,\mathbb{Q}(\mathbf{f} \!\mid\mathbf{u})\,\mathbb{P}(\mathbf{u})$$
The essence of this trick is that, instead of inverting an $n\times n$ matrix, you only invert the matrix on the inducing points, $\mathbf{K}_{\mathbf{u},\mathbf{u}}$, which is much smaller ($m\times m$ with $m \ll n$).
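To see where the saving comes from, the resulting low-rank approximation $\mathbf{Q}_{\mathbf{f},\mathbf{f}} = \mathbf{K}_{\mathbf{f},\mathbf{u}}\mathbf{K}_{\mathbf{u},\mathbf{u}}^{-1}\mathbf{K}_{\mathbf{u},\mathbf{f}}$ can be inverted with the matrix inversion lemma:
$$\left(\mathbf{Q}_{\mathbf{f},\mathbf{f}}+\sigma_\mathrm{n}^2\mathbf{I}\right)^{-1} = \sigma_\mathrm{n}^{-2}\mathbf{I} - \sigma_\mathrm{n}^{-2}\mathbf{K}_{\mathbf{f},\mathbf{u}}\left(\sigma_\mathrm{n}^2\mathbf{K}_{\mathbf{u},\mathbf{u}} + \mathbf{K}_{\mathbf{u},\mathbf{f}}\mathbf{K}_{\mathbf{f},\mathbf{u}}\right)^{-1}\mathbf{K}_{\mathbf{u},\mathbf{f}}$$
so the only linear system that has to be solved is $m\times m$, and the overall cost drops from $O(n^3)$ to $O(nm^2)$.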
There are two important questions to answer in order to construct a good approximation using only $m$ points:
- How do we pick the inducing points $\mathbf{u}$?
- How do we pick the approximating distribution $\mathbb{Q}$?
Another question, perhaps equivalent to the second one, is:
- How do we approximate the covariance function $k$?
In the Candela & Rasmussen paper there's an explanation for SoR and FITC.
Subset of Regressors
The SoR method can be viewed as the following kernel approximation:
$$ {k}(\mathbf{x}, \mathbf{z}) \approx \tilde{k}(\mathbf{x}, \mathbf{z}) = k(\mathbf{x},\mathbf{u})\mathbf{K}_{\mathbf{u},\mathbf{u}}^{-1}k(\mathbf{u},\mathbf{z})$$
To get started, you can place the inducing points on a uniform grid (or use a random subset of the training inputs). Do spend time experimenting with each method to decide which one works best in your situation.
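As an illustration, here is a minimal NumPy sketch of SoR prediction under this kernel approximation. The RBF kernel, hyperparameter values, and helper names (`rbf_kernel`, `sor_predict`) are my own choices; the matrix inversion lemma is used so that only an $m\times m$ matrix is inverted:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def sor_predict(X, y, X_star, X_u, noise_var=0.1, jitter=1e-6):
    """SoR prediction: the exact GP formulas with k replaced by k~(x,z) = k(x,u) K_uu^{-1} k(u,z).

    Only an m x m matrix is inverted, so the cost is O(n m^2) instead of O(n^3).
    """
    m = len(X_u)
    K_uu = rbf_kernel(X_u, X_u) + jitter * np.eye(m)  # jitter for numerical stability
    K_uf = rbf_kernel(X_u, X)                         # m x n
    K_us = rbf_kernel(X_u, X_star)                    # m x n_star
    # Sigma = (sigma_n^2 K_uu + K_uf K_fu)^{-1}: the only inversion, and it is m x m
    Sigma = np.linalg.inv(noise_var * K_uu + K_uf @ K_uf.T)
    mu = K_us.T @ Sigma @ (K_uf @ y)
    cov = noise_var * K_us.T @ Sigma @ K_us
    return mu, cov
```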
Implementation
In Python you can use GPyTorch, which also supports GP inference on the GPU. I am not yet aware of a similar package for R.
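As a rough illustration of what an inducing-point (SGPR-style) model looks like in GPyTorch, here is a sketch built around its `InducingPointKernel`; treat the exact class names and arguments as assumptions based on my reading of the documentation and double-check them against the version you install:

```python
import torch
import gpytorch

class SparseGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, inducing_points):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        base_kernel = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # Wraps the base kernel with the low-rank inducing-point approximation
        self.covar_module = gpytorch.kernels.InducingPointKernel(
            base_kernel, inducing_points=inducing_points, likelihood=likelihood
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# Usage sketch: pick, e.g., m = 50 inducing points from the training inputs,
# then optimize the marginal likelihood as in any GPyTorch regression example.
# likelihood = gpytorch.likelihoods.GaussianLikelihood()
# model = SparseGPModel(train_x, train_y, likelihood, train_x[:50].clone())
# model, likelihood = model.cuda(), likelihood.cuda()  # optional: move to GPU
```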