
I would like to approximate a function $f:\mathbb{R} \to \mathbb{R}_+$ based on a set of samples. I obtain these samples online (i.e. sequentially in time). That is, at time $t$ I receive $(x_t, f(x_t))$ and I would like to update my approximation of $f$.

  • Can you please point me to papers / resources describing Gaussian Processes for estimating a function online?
  • In addition, can you point me to resources to do this online?

1 Answer


This is pretty straightforward to do with Bayesian learning since it corresponds to sequentially updating the posterior over $f$ as more and more data comes in. Bayesian optimization uses this a lot so that's one application to look at for this kind of thing.

$\newcommand{\f}{\mathbf f}$$\newcommand{\one}{\mathbf 1}$Let's say we start with $(x_1, f(x_1)), \dots, (x_n, f(x_n))$. Our prior is $f \sim \mathcal{GP}(m, k)$, and for simplicity I'll assume $m(x) = \mu$ for all $x$, i.e. the mean function is constant. We then have $$ \left(\begin{array}{c} f_1 \\ \vdots \\ f_n\end{array}\right) := \f_n \sim \mathcal N(\mu\one_n, K_n + \alpha I) $$ where I'm adding $\alpha I$ with $\alpha > 0$ small to $K_n$ to condition it better.

For a new point $x_*$ with value $f_* = f(x_*)$ we'll have $$ {\f_n \choose f_*} \sim \mathcal N\left(\mu\one_{n+1}, \left[\begin{array}{c|c}K_n + \alpha I & k_* \\ \hline k_*^T & k_{**}\end{array}\right] \right) $$ (I'm using the common notation $k_* = (k(x_1, x_*), \dots, k(x_n, x_*))^T$ and $k_{**} = k(x_*, x_*)$), so the conditional mean of this new point given the $n$ we've already observed is $\text E[f_* \mid \f_n] = \mu + k_*^T(K_n + \alpha I_n)^{-1}(\f_n - \mu\one_n)$. This conditional mean will be our approximation to $f$ after our first $n$ samples, so $$ \hat f_n(x) = \mu + k_*(x)^T(K_n + \alpha I_n)^{-1}(\f_n - \mu\one_n). $$
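For concreteness, here's a minimal numpy sketch of $\hat f_n$. The squared-exponential kernel, the length scale, the value of $\alpha$, and the toy data are all illustrative choices on my part, not part of the setup above.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 * length_scale^2)),
    # used here purely as an example covariance function.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, f_train, x_query, mu=0.0, alpha=1e-6):
    # hat f_n(x) = mu + k_*(x)^T (K_n + alpha I)^{-1} (f_n - mu 1_n)
    K = rbf_kernel(x_train, x_train) + alpha * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_query)          # n x m matrix of k(x_i, x_query_j)
    weights = np.linalg.solve(K, f_train - mu)     # (K_n + alpha I)^{-1} (f_n - mu 1_n)
    return mu + k_star.T @ weights

# toy data standing in for the first n samples (x_t, f(x_t))
x_n = np.array([0.1, 0.4, 0.9])
f_n = np.sin(x_n) + 1.0
x_grid = np.linspace(0.0, 1.0, 5)
print(gp_posterior_mean(x_n, f_n, x_grid, mu=1.0))  # hat f_n on a grid
```

In practice you'd usually work with a Cholesky factorization or `solve` rather than an explicit inverse, but the explicit inverse is the quantity the recursive update at the end of this answer manipulates.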

Now when we get a new observation $(x_{n+1}, f(x_{n+1}))$ we will update our understanding of $f_*$ by adding this to $\f_n$ to get $\f_{n+1}$ and augmenting $K_n$ into $K_{n+1}$ so now for a new point $(x_*, f_*)$ we have $$ \hat f_{n+1}(x) = \mu + k_*(x)^T(K_{n+1} + \alpha I_{n+1})^{-1}(\f_{n+1}- \mu\one_{n+1}) $$ and the process continues. We've moved from conditioning on $n$ things to conditioning on $n+1$ things.
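Continuing the sketch above, the naive version of this update just appends the new sample and redoes the solve against the augmented $K_{n+1}$; `x_new` and `f_new` are illustrative values. (The recursive inverse below avoids repeating the full cubic-cost work.)

```python
# Naive online update: append (x_{n+1}, f(x_{n+1})) and recompute hat f_{n+1}
# with the augmented K_{n+1}. Reuses gp_posterior_mean from the sketch above.
x_new = 0.6
f_new = np.sin(x_new) + 1.0
x_np1 = np.append(x_n, x_new)
f_np1 = np.append(f_n, f_new)
print(gp_posterior_mean(x_np1, f_np1, x_grid, mu=1.0))  # hat f_{n+1} on the same grid
```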


We can also save some time with these inverses by using $K_n^{-1}$ in the computation of $K_{n+1}^{-1}$, so we can compute them recursively. I'll drop the $\alpha I$ from this part for cleaner notation, but it can be added back in with no issue. We can treat $K_{n+1}$ as a $2 \times 2$ block matrix and use the formula for such a matrix's inverse. Precomputing $v_n = K_n^{-1}k_*$ and the inverse of the relevant ($1 \times 1$) Schur complement $S_n := (k_{**} - k_*^Tv_n)^{-1}$, we have $$ K_{n+1}^{-1} = \begin{bmatrix} K_n & k_* \\ k_*^T & k_{**}\end{bmatrix}^{-1} = \begin{bmatrix}K_n^{-1} + S_n v_nv_n^T & -S_n v_n \\ -S_nv_n^T & S_n\end{bmatrix} . $$ Given $K_n^{-1}$ and $k_*$, computing $v_n$ is $\mathcal O(n^2)$, and once we have that, computing $S_n$ is $\mathcal O(n)$. Forming $K_n^{-1} + S_nv_nv_n^T$ is an additional $\mathcal O(n^2)$, so overall getting $K_{n+1}^{-1}$ from $K_n^{-1}$ is $\mathcal O(n^2)$. This is better than the best known times for explicitly inverting $K_{n+1}$ from scratch.
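Here's a sketch of that block-inverse update in the same numpy setup as the earlier snippets. I fold $\alpha I$ into the kernel matrices rather than dropping it, and the `allclose` comparison against a from-scratch inverse is just a sanity check.

```python
def augment_inverse(K_inv, k_star, k_star_star):
    # Given K_n^{-1} plus the new point's cross-covariances k_* and k_**,
    # return K_{n+1}^{-1} via the 2x2 block-inverse formula in O(n^2).
    v = K_inv @ k_star                        # v_n = K_n^{-1} k_*              (O(n^2))
    s = 1.0 / (k_star_star - k_star @ v)      # S_n = (k_** - k_*^T v_n)^{-1}, a scalar
    n = K_inv.shape[0]
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = K_inv + s * np.outer(v, v)  # K_n^{-1} + S_n v_n v_n^T        (O(n^2))
    out[:n, n] = -s * v                       # -S_n v_n
    out[n, :n] = -s * v                       # -S_n v_n^T
    out[n, n] = s                             # S_n
    return out

# Sanity check: the recursive update matches inverting K_{n+1} from scratch.
alpha = 1e-6
K_n = rbf_kernel(x_n, x_n) + alpha * np.eye(len(x_n))
k_s = rbf_kernel(x_n, np.array([x_new]))[:, 0]
k_ss = 1.0 + alpha                            # k(x_new, x_new) = 1 for this RBF kernel, plus alpha
K_np1_inv = augment_inverse(np.linalg.inv(K_n), k_s, k_ss)
K_np1 = rbf_kernel(x_np1, x_np1) + alpha * np.eye(len(x_np1))
print(np.allclose(K_np1_inv, np.linalg.inv(K_np1)))  # True
```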
