
I am trying to find a procedure to fit data monotonically in Python.

The data won't necessarily be monotonic. I would just like to achieve a monotonic fit because of theoretical assumptions.

I imagine that a way of doing that would be to run an isotonic regression and then interpolate using a cubic spline. Are there easier alternatives?
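For concreteness, this is roughly what I have in mind (a sketch only, assuming scikit-learn and SciPy are available; I use PchipInterpolator rather than a plain cubic spline here because PCHIP preserves the monotonicity of the isotonic step):

import numpy as np
from sklearn.isotonic import IsotonicRegression
from scipy.interpolate import PchipInterpolator

# Illustrative noisy data around an increasing trend
x = np.linspace(0.1, 10, 50)
y = np.sqrt(x) + np.random.randn(50) * 0.1

# Step 1: isotonic regression gives a step-wise, non-decreasing fit
y_iso = IsotonicRegression(increasing=True).fit_transform(x, y)

# Step 2: interpolate the isotonic fit with a monotonicity-preserving cubic (PCHIP)
f = PchipInterpolator(x, y_iso)
x_fine = np.linspace(0.1, 10, 500)
y_fit = f(x_fine)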

In R, for example, I would use the cobs package for constrained splines. Does anything similar exist in Python?

Other ways of achieving the same result would also be fine if effective (e.g. fitting curves on monotonic transformations of the data that would maintain the overall shape of the relationship).

Thank you!

Eaglez

2 Answers


Hi, I do not know the specifics of your problem, but you might find the following reference really interesting: Eilers, 2006 (especially paragraph 3). The idea presented there is rather simple to implement (there should also be some MATLAB code in the appendix). In any case, below you will find my own attempt :-)

A bit of context

The paper uses a smoothing technique known as the P-spline. P-splines were introduced by Eilers and Marx, 1991 and combine B-splines (defined on equally spaced knots) with finite-difference regularization of the spline coefficients (the second reference also contains some code you can use to get accustomed to the methodology if you want).

In my answer I will use a special case of P-splines, the Whittaker graduation method (see e.g. Eilers, 2003, which also contains some computer code in the appendix).

Force the smoother to be monotonic: asymmetric penalties

To achieve monotonicity, we want the first differences of the estimated Whittaker smoother to all have the same sign (all non-negative or all non-positive). Suppose we want a monotonically increasing fit. Following Eilers, 2006 we can write our problem as $$ S = \|y - z\|^{2} + \lambda \| \Delta^{(3)} z\|^{2} + \kappa \sum_{i} v_{i} (\Delta^{(1)} z_{i})^{2} $$ where $z$ is the vector of unknowns, $\Delta^{(3)}$ is a third-order difference operator, $\Delta^{(1)}$ is the first-order difference operator between adjacent smoothed values, $v_{i}$ is a weighting factor with value 1 if $\Delta^{(1)} z_{i} < 0$ (i.e. where the increasing constraint is violated, matching the update in the code below) and 0 otherwise, $\lambda$ is a smoothing parameter and $\kappa$ is a large constant (say $10^{6}$ or larger; the code below uses $10^{7}$).

Below you will find a small example comparing the monotone and non-monotone fits.

import numpy
import matplotlib.pyplot as plt

# Simulate data
N   = 100
xlo = 0.001
xhi = 2*numpy.pi
x   = numpy.arange(xlo, xhi, step = (xhi-xlo)/N)
y0  = numpy.sin(x) + numpy.log(x) 
y   = y0 + numpy.random.randn(N) * 0.5

# Identity matrix, difference operators and penalty parameters
dd = 3
E  = numpy.eye(N)
D3 = numpy.diff(E, n = dd, axis=0)
D1 = numpy.diff(E, n = 1, axis=0)
la = 100
kp = 10000000

# Monotone smoothing
ws = numpy.zeros(N - 1)

# Iteratively re-estimate the asymmetric weights: penalise only the first
# differences that are still negative (violating the increasing constraint),
# refit, and stop once the set of violations no longer changes
for it in range(30):
    Ws      = numpy.diag(ws * kp)
    mon_cof = numpy.linalg.solve(E + la * D3.T @ D3 + D1.T @ Ws @ D1, y)
    ws_new  = (D1 @ mon_cof < 0.0) * 1
    dw      = numpy.sum(ws != ws_new)
    ws      = ws_new
    if(dw == 0): break  
    print(dw)

# Monotonic and non monotonic fits
z  = mon_cof
z2 = numpy.linalg.solve(E + la * D3.T @ D3, y)

# Plots
plt.scatter(x, y, color = 'gray', s = 0.5, label = 'raw data')
plt.plot(x, z, color = 'red', label = 'monotonic smooth')
plt.plot(x, z2, color = 'blue', linestyle = '--', label = 'unconstrained smooth')
plt.legend(loc="lower right")
plt.show()

(Figure: the simulated data with the monotonic smooth in red and the unconstrained smooth as a dashed blue line.)

I hope this helps a bit (or at least you find it interesting).

Gi_F.

I don't know of a Python package that explicitly fits constrained splines, but you should be able to achieve your goal with gradient boosting in the most recent version of scikit-learn (https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_0_23_0.html).

Specifically, you can fit a generalized additive model using HistGradientBoostingRegressor and setting max_depth=1, which ensures that there will be no interactions between features (if that's what you want). You can then use the monotonic_cst parameter to specify the monotonicity constraint for each feature.
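A minimal sketch of that setup (assuming scikit-learn >= 1.0; on 0.23 and 0.24 you also need the experimental enable_hist_gradient_boosting import; the data and variable names here are only illustrative):

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, size=(200, 1)), axis=0)
y = np.log1p(X[:, 0]) + rng.normal(scale=0.2, size=200)

# max_depth=1 -> stumps only, so no interactions between features;
# monotonic_cst=[1] -> predictions must be non-decreasing in feature 0
model = HistGradientBoostingRegressor(max_depth=1, monotonic_cst=[1])
model.fit(X, y)
y_mono = model.predict(X)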

This option also exists in XGBoost.
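For completeness, a rough XGBoost equivalent (assuming the xgboost package is installed; the relevant parameter there is monotone_constraints, with one entry per feature, 1 for increasing and -1 for decreasing):

from xgboost import XGBRegressor

# X, y as in the scikit-learn sketch above
xgb = XGBRegressor(max_depth=1, monotone_constraints="(1)")
xgb.fit(X, y)
y_mono_xgb = xgb.predict(X)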

GregS