An answer to this CrossValidated question states that "splines are just a special case of Gaussian Process regression". I would like to demonstrate this equivalence in practice, but have not managed to. Can we construct an example with actual data showing that a GP trend and a spline trend yield the same result?
I tried this in Python, using scipy to generate the splines and scikit-learn for a GP with a radial basis function (RBF) kernel, also known as the squared-exponential kernel. Supposedly, splines are a special case of GPs with this kernel.
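For reference, scikit-learn's RBF kernel is k(x, x') = exp(-(x - x')^2 / (2 l^2)) with length scale l. The sketch below, which uses a small hypothetical grid of five points rather than the data from the question, checks that writing this formula out by hand gives the same Gram matrix as the `RBF` object:

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

length = 0.1
x = np.linspace(0, 1, 5).reshape(-1, 1)  # hypothetical small grid, shape (5, 1)

# Squared-exponential kernel written out by hand: exp(-(x - x')^2 / (2 * length^2))
sq_dist = (x - x.T) ** 2
K_manual = np.exp(-sq_dist / (2 * length**2))

# The same matrix from scikit-learn's RBF kernel object (calling it on X returns K(X, X))
K_sklearn = RBF(length_scale=length)(x)

print(np.allclose(K_manual, K_sklearn))  # True
```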
First, I generate synthetic test data: a sine wave as an example of simple variation, plus some Gaussian noise:
```python
import numpy as np

np.random.seed(1)
points = 200
x = np.linspace(0, 1, points)
y = np.sin(x * 20)                        # underlying signal
noise = np.random.normal(0, 0.1, points)  # Gaussian noise, standard deviation 0.1
y += noise
```
The GP trend:
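```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

length = 0.1
# Keep the length scale fixed at 0.1 (no hyperparameter optimization during fit)
kernel = RBF(length_scale=length, length_scale_bounds="fixed")
# alpha (default 1e-10) is the value added to the diagonal of the kernel matrix,
# i.e. the assumed noise variance
GP = GaussianProcessRegressor(kernel).fit(x.reshape(-1, 1), y)
trend_GP = GP.predict(x.reshape(-1, 1))
```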
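What `GP.predict` returns here is the GP posterior mean, m(x*) = K(x*, X) [K(X, X) + alpha I]^{-1} y, where `alpha` is the value added to the kernel diagonal (scikit-learn's default is 1e-10, so the code above treats the data as almost noise-free). The sketch below reproduces the prediction with plain NumPy. As an assumption not made in the question, it sets `alpha` to the known noise variance 0.1**2 = 0.01; the name `noise_var` is illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Same synthetic data as in the question
np.random.seed(1)
points = 200
x = np.linspace(0, 1, points)
y = np.sin(x * 20) + np.random.normal(0, 0.1, points)
X = x.reshape(-1, 1)

length = 0.1
noise_var = 0.1**2  # assumption: alpha set to the known noise variance, not the default 1e-10

kernel = RBF(length_scale=length, length_scale_bounds="fixed")
gp = GaussianProcessRegressor(kernel, alpha=noise_var).fit(X, y)
trend_sklearn = gp.predict(X)

# Posterior mean by hand: m(x*) = K(x*, X) [K(X, X) + alpha * I]^{-1} y
K = kernel(X)
weights = np.linalg.solve(K + noise_var * np.eye(points), y)
trend_manual = kernel(X) @ weights  # here the test points x* are the training points

print(np.max(np.abs(trend_manual - trend_sklearn)))
```

Because the kernel hyperparameters are fixed, `fit` only stores the data and solves this linear system, so the two predictions should agree up to floating-point error.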
For the spline, the smoothing parameter has to be set so that both curves match. By minimizing the sum of squared differences between the two curves, I found that s=1.43 gives the closest match in this example.
```python
from scipy.interpolate import UnivariateSpline

# Cubic (k=3) smoothing spline; s is the smoothing factor
spline = UnivariateSpline(x, y, s=1.43, k=3)
trend_spline = spline(x)
```
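The search for s described above might look like the following grid search. The grid (0.5 to 3.0 in steps of 0.01) and the helper name `mismatch` are assumptions for illustration, not the original procedure. The last lines check SciPy's documented smoothing condition: knots are added until `sum((w[i] * (y[i] - spl(x[i])))**2) <= s`, and `get_residual()` reports that weighted residual sum for the fitted spline.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Rebuild the data and the GP trend as in the question
np.random.seed(1)
points = 200
x = np.linspace(0, 1, points)
y = np.sin(x * 20) + np.random.normal(0, 0.1, points)
X = x.reshape(-1, 1)
kernel = RBF(length_scale=0.1, length_scale_bounds="fixed")
trend_GP = GaussianProcessRegressor(kernel).fit(X, y).predict(X)

def mismatch(s):
    """Hypothetical helper: squared distance between the GP trend and a spline with smoothing factor s."""
    spline = UnivariateSpline(x, y, s=s, k=3)
    return np.sum((trend_GP - spline(x)) ** 2)

# Grid search over s (grid range and step are assumptions)
candidates = np.linspace(0.5, 3.0, 251)
losses = np.array([mismatch(s) for s in candidates])
s_best = candidates[np.argmin(losses)]

# The residual sum of squares of the chosen spline, computed directly and via get_residual()
spline = UnivariateSpline(x, y, s=s_best, k=3)
rss = np.sum((y - spline(x)) ** 2)
print(s_best, rss, spline.get_residual())
```

A grid search is used instead of a continuous optimizer because the mismatch jumps whenever FITPACK changes the number of knots, so the objective is not smooth in s.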
Now we can plot the scatter data and both curves.
```python
import matplotlib.pyplot as plt

plt.scatter(x, y, s=3, color='black')   # noisy data
plt.plot(x, trend_GP, color='blue')     # GP trend
plt.plot(x, trend_spline, color='red')  # spline trend
plt.show()
```
The curves are similar, but not identical:
```python
plt.plot(x, trend_GP - trend_spline, color='black')  # GP trend minus spline trend
plt.show()
```
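To put a number on "similar, but not identical", a short sketch that rebuilds the data and both trends as above can report the largest and the root-mean-square difference between the two curves. These can be compared with the noise standard deviation of 0.1 used to generate the data:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Same data and trends as in the question
np.random.seed(1)
points = 200
x = np.linspace(0, 1, points)
y = np.sin(x * 20) + np.random.normal(0, 0.1, points)
X = x.reshape(-1, 1)

kernel = RBF(length_scale=0.1, length_scale_bounds="fixed")
trend_GP = GaussianProcessRegressor(kernel).fit(X, y).predict(X)
trend_spline = UnivariateSpline(x, y, s=1.43, k=3)(x)

# Summary statistics of the pointwise difference between the two trends
diff = trend_GP - trend_spline
max_abs_diff = np.max(np.abs(diff))
rms_diff = np.sqrt(np.mean(diff**2))
print(f"max |GP - spline| = {max_abs_diff:.4f}, RMS = {rms_diff:.4f}")
```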