7

I'm looking for a function to fit sigmoid-like curves to experimental data points.

The model (the function) doesn't matter; it doesn't have to be physically relevant. I just want to be able to calculate y from any x, and I don't want to extrapolate between two points.

Here is an example:

[figure: example plot of the data]

And here is the corresponding raw data:

| X             | Y              |
|---------------|----------------|
| 0             | 0              |
| 1,6366666667  | -12,2012787905 |
| 3,2733333333  | -13,7833876716 |
| 4,91          | -10,5943208589 |
| 6,5466666667  | -1,3584575518  |
| 8,1833333333  | 8,1590423167   |
| 9,82          | 13,8827937482  |
| 10,4746666667 | 18,4965880076  |
| 11,4566666667 | 42,1205206106  |
| 11,784        | 45,0528073182  |
| 12,4386666667 | 76,8150755186  |
| 13,0933333333 | 80,0883540997  |
| 14,73         | 89,7784173678  |
| 16,3666666667 | 98,8113459392  |
| 19,64         | 104,104366506  |
| 22,9133333333 | 105,9929585305 |
| 26,1866666667 | 94,0070414695  |

Do you have any ideas? My problem is that the data goes below 0 for some points.

EDIT:

Some of you are bothered by the last point. To clarify: at the end of the curve, there should be a plateau. The last point is just a bit buggy; I will probably remove it from the data when I start fitting.

Rififi
  • If you don't care about the function, why do you think you need it? Serious question. – Nick Cox Apr 07 '16 at 15:21
  • Is your fit supposed to go down initially (as the data seem to suggest)? – Glen_b Nov 30 '17 at 02:44
  • If you want the fitted curve to be parametric and a 4 parameter logistic curve, take a look at the drc package: https://cran.r-project.org/web/packages/drc/index.html – W7GVR Nov 05 '19 at 21:00

7 Answers

6

I think smoothing splines with a small number of degrees of freedom would do the trick. Here's an example in R:

[figure: smoothing spline fits with df = 3 and df = 8]

The R code:

txt <- "| 0             | 0              |
| 1.6366666667  | -12.2012787905 |
| 3.2733333333  | -13.7833876716 |
| 4.91          | -10.5943208589 |
| 6.5466666667  | -1.3584575518  |
| 8.1833333333  | 8.1590423167   |
| 9.82          | 13.8827937482  |
| 10.4746666667 | 18.4965880076  |
| 11.4566666667 | 42.1205206106  |
| 11.784        | 45.0528073182  |
| 12.4386666667 | 76.8150755186  |
| 13.0933333333 | 80.0883540997  |
| 14.73         | 89.7784173678  |
| 16.3666666667 | 98.8113459392  |
| 19.64         | 104.104366506  |
| 22.9133333333 | 105.9929585305 |
| 26.1866666667 | 94.0070414695  |"

dat <- read.table(text=txt, sep="|")[,2:3]   # keep only the x and y columns
names(dat) <- c("x", "y")
plot(dat$y ~ dat$x, pch = 19, xlab = "x", ylab = "y", main = "Smoothing Splines with Varying df")

spl3 <- smooth.spline(x = dat$x, y = dat$y, df = 3)   # smoother fit, 3 effective df
lines(spl3, col = 2)

spl8 <- smooth.spline(x = dat$x, y = dat$y, df = 8)   # wigglier fit, 8 effective df
lines(spl8, col = 4)

legend("topleft", c("df = 3", "df = 8"), col = c(2, 4), bty = "n", lty = 1)
jld
6

To fit a sigmoid-like function in a nonparametric way, we could use a monotone spline. This is implemented in the R package splines2 (all R packages referenced here are on CRAN). I will borrow some R code from the answer by @Chaconne and modify it for my needs.

splines2 offers the function mSpline, implementing M-splines, an everywhere nonnegative spline basis (on the interval where it is defined), and iSpline, the integral of the M-spline basis. The I-spline basis functions are therefore monotone increasing, so we can fit an increasing function by using them as a regression spline basis and fitting a linear model with the coefficients restricted to be non-negative. That constrained fit is implemented in a user-friendly way by the R package colf ("constrained optimization on linear functions"). The fits look like:

[figure: monotone spline fits with df = 4, 6, and 8]

The R code used:

library(splines2) # includes monotone splines,  M-splines,  I-splines.
library(colf) # constrained optimization on linear functions

 txt <- "| 0             | 0              |
    | 1.6366666667  | -12.2012787905 |
    | 3.2733333333  | -13.7833876716 |
    | 4.91          | -10.5943208589 |
    | 6.5466666667  | -1.3584575518  |
    | 8.1833333333  | 8.1590423167   |
    | 9.82          | 13.8827937482  |
    | 10.4746666667 | 18.4965880076  |
    | 11.4566666667 | 42.1205206106  |
    | 11.784        | 45.0528073182  |
    | 12.4386666667 | 76.8150755186  |
    | 13.0933333333 | 80.0883540997  |
    | 14.73         | 89.7784173678  |
    | 16.3666666667 | 98.8113459392  |
    | 19.64         | 104.104366506  |
    | 22.9133333333 | 105.9929585305 |
    | 26.1866666667 | 94.0070414695  |"

    dat <- read.table(text=txt, sep="|")[,2:3]
names(dat) <- c("x", "y")
plot(dat$y ~ dat$x, pch = 19, xlab = "x", ylab = "y", main = "Monotone Splines with Varying df")

Imod_df_4 <- colf_nls(y ~ 1 + iSpline(x, df=4), data=dat, lower=c(-Inf, rep(0, 4)),
                      control=nls.control(maxiter=1000, tol=1e-09, minFactor=1/2048))
lines(dat$x, fitted(Imod_df_4), col="blue")

Imod_df_6 <- colf_nls(y ~ 1 + iSpline(x, df=6), data=dat, lower=c(-Inf, rep(0, 6)),
                      control=nls.control(maxiter=1000, tol=1e-09, minFactor=1/2048))
lines(dat$x, fitted(Imod_df_6), col="orange")

Imod_df_8 <- colf_nls(y ~ 1 + iSpline(x, df=8), data=dat, lower=c(-Inf, rep(0, 8)),
                      control=nls.control(maxiter=1000, tol=1e-09, minFactor=1/2048))
lines(dat$x, fitted(Imod_df_8), col="red")

EDIT

Monotone restrictions on a spline are a special case of shape-restricted splines, and there are now several R packages implementing these, which simplifies their use. I will redo the example above with one of those packages, cgam. The R code is below, using the data as read in above:

library(cgam)

mod_cgam0 <- cgam(y ~ 1+s.incr(x), data=dat, family=gaussian)
summary(mod_cgam0)
Call:
cgam(formula = y ~ 1 + s.incr(x), family = gaussian, data = dat)

Coefficients:
            Estimate  StdErr t.value   p.value    
(Intercept)  43.4925  2.7748  15.674 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 102.2557)

Null deviance:  33749.25  on 16  degrees of freedom 
Residual deviance:  1636.091  on 12.5  observed degrees of freedom 

Approximate significance of smooth terms: 
          edf mixture.of.Beta   p.value    
s.incr(x)   3          0.9515 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
CIC:  7.6873 

This way the knots (and the degrees of freedom) have been selected automatically. To fix the number of knots yourself, use:

mod_cgam1 <- cgam(y ~ 1+s.incr(x, numknots=5), data=dat, family=gaussian)

A paper presenting cgam is here (arxiv).
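
Another CRAN package for shape-restricted smooths is scam; a minimal sketch (not used in the fits above, and assuming the same dat as before), where bs = "mpi" requests a monotone increasing P-spline basis:

library(scam)   # shape constrained additive models

# Monotone increasing smooth of y on x
mod_scam <- scam(y ~ s(x, bs = "mpi"), data = dat)
summary(mod_scam)
lines(dat$x, fitted(mod_scam), col = "darkgreen")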

kjetil b halvorsen
5

The curve you show looks more like a cubic function, $ax^3+bx^2+cx+d$, as the ends turn up and down rather than flattening out horizontally. Or something like this, made with a polynomial trend line in Excel:

[figure: cubic polynomial trend line fitted in Excel]

But otherwise, if you want the ends to flatten out horizontally, there are many sigmoidal CDF probability distributions to choose from. The questions you need to ask yourself when choosing the most appropriate distribution are:

  1. What is the underlying mechanism/rationale for a sigmoidal-shaped curve?
  2. How flexible in shape does it need to be? How many degrees of freedom? This will depend on how many data points you have, as you want to avoid overfitting. But also, which features vary and which stay constant? The mean? Variance (spread)? Skewness (lop-sidedness)? Kurtosis (tails)?
  3. Then you can search for the right shape in this list on Wikipedia (https://en.wikipedia.org/wiki/List_of_probability_distributions), or refine your question with more details to get the best answer.
  4. There are also 4- and 5-parameter distributions based on the logit function with much more flexibility in shape, but again, you should avoid them unless you have a lot of data points (see the sketch at the end of this answer).

And PS. You should never selectively add or remove data points for fitting - BAD BOY!
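
As an illustration of point 4, a four-parameter logistic can be fitted with base R's nls() and the self-starting model SSfpl(). This is only a sketch: it assumes the dat data frame read in as in the other answers, and the self-starter may fail to converge on these data without tweaks.

# Four-parameter logistic: A + (B - A) / (1 + exp((xmid - x) / scal))
# A = left asymptote, B = right asymptote, xmid = inflection point, scal = scale
fit_fpl <- nls(y ~ SSfpl(x, A, B, xmid, scal), data = dat)
summary(fit_fpl)

plot(y ~ x, data = dat, pch = 19)
xgrid <- seq(min(dat$x), max(dat$x), length.out = 200)
lines(xgrid, predict(fit_fpl, newdata = data.frame(x = xgrid)), col = "red")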

Kelvin
3

You can use the sigmoid() function from the {pracma} package in R. The function will fit a sigmoidal curve to a numeric vector. If you don't care what function fits the data, I would recommend the gam() function from the {mgcv} package in R. It fits a smoothing function to the data using spline regression (the default is a thin-plate spline basis, but you can check the documentation for other types). With gam(), as with any non-parametric model fit, you won't be able to predict y from x with any reliability outside the range of x in your data set, since the predictions will simply follow the direction of the "last slope" of the curve, but from your question it sounds like you are not concerned with that. Hope this helps!
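
A minimal sketch of the mgcv route (assuming the dat data frame with columns x and y, as read in the other answers):

library(mgcv)

# Thin-plate regression spline smooth of y on x (the default basis for s())
fit_gam <- gam(y ~ s(x), data = dat)

# Predict only within the observed range of x
xgrid <- seq(min(dat$x), max(dat$x), length.out = 200)
yhat  <- predict(fit_gam, newdata = data.frame(x = xgrid))

plot(y ~ x, data = dat, pch = 19)
lines(xgrid, yhat, col = "blue")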

A. Vezey
  • Exactly, I'm only interested in x in the range of the experiment. Any chance you would know an equivalent of gam() in Python? – Rififi Apr 07 '16 at 16:13
  • Unfortunately, I do not. We use R exclusively where I work. Sorry, mate! – A. Vezey Apr 07 '16 at 16:22
  • It would be interesting to see if mgcv could be used with monotone splines, as in my answer. Some code is here: https://gist.github.com/jnpaulson/c47f9bd3246f1121ad3a911e5f707355 (I did not test it, yet) – kjetil b halvorsen Nov 30 '17 at 10:50
1

Along with the other suggestions, a Gompertz growth curve would also fit this data. Here's what Wikipedia has to say about it:

A Gompertz curve or Gompertz function, named after Benjamin Gompertz, is a sigmoid function. It is a type of mathematical model for a time series, where growth is slowest at the start and end of a time period. (https://en.wikipedia.org/wiki/Gompertz_function)

The key is the sigmoid function. Here's the formula for $y$ as a function of $t$:

$$y(t)=a \exp[-b\exp(-ct)],$$

where $a$ is an asymptote, since $\lim_{t \to \infty} a \exp[-b \exp(-ct)] = a \exp(0) = a$;

$b > 0$ sets the displacement along the $x$-axis (it translates the graph to the left or right); and

$c > 0$ sets the growth rate ($y$ scaling).

Here $\exp(1) = e$ is Euler's number, $2.71828\ldots$
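
A minimal sketch of what this curve looks like over the data (the parameter values below are rough eyeball guesses, not fitted estimates, and the basic Gompertz form cannot reproduce the negative dip at the left of the data):

# Gompertz curve: y(t) = a * exp(-b * exp(-c * t))
gompertz <- function(t, a, b, c) a * exp(-b * exp(-c * t))

plot(y ~ x, data = dat, pch = 19)                 # dat as read in the other answers
curve(gompertz(x, a = 105, b = 285, c = 0.5),     # illustrative guesses for a, b, c
      from = min(dat$x), to = max(dat$x), add = TRUE, col = "purple")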

Mike Hunter
  • Can f(x) be < 0? – Rififi Apr 07 '16 at 16:07
  • Yes! See the Wiki page for confirmation... – Mike Hunter Apr 07 '16 at 16:14
  • Correction: can y go below 0 and then go back to positive values, as shown in the picture in the question? From the wiki, it seems there is a plateau on the left side of the curve and one on the right side. – Rififi Apr 07 '16 at 16:17
  • @DJohnson I've hacked at the mathematical typesetting. Do roll back the edit if you don't like it. – Nick Cox Apr 07 '16 at 17:31
  • This curve is monotonic and has no turning points. Nor will it ever cross zero. $a$ gives the sign to $y$: $a$ negative implies that $y$ is always negative and declines to a negative asymptote. $a$ positive implies the opposite: $y$ is always positive and increases to a positive asymptote. Your choice, but it doesn't look flexible enough for your needs. – Nick Cox Apr 07 '16 at 17:35
  • @nickcox It is a tautological function...and thanks for cleaning up the post... – Mike Hunter Apr 07 '16 at 17:41
  • Tautological: what does that mean here? Do you mean autocatalytic? – Nick Cox Apr 07 '16 at 17:45
  • @NickCox I've always used *tautological* to mean *deterministic* and/or "lock-step" results but, prompted by your query, I see that it doesn't necessarily mean that, hence the confusion. I definitely do not intend it to mean *needless repetition*. – Mike Hunter Apr 07 '16 at 17:50
  • I agree! I thought "misled" was the past participle of "misle" for some while in my youth, which was complete nonsense, so these little verbal mistakes can stick. – Nick Cox Apr 07 '16 at 17:55
  • @NickCox How is that nonsense? *Misled* **is** the past participle of mislead. Based on the Urban Dictionary's definition of *misle*, they would all be cognates (see http://www.urbandictionary.com/define.php?term=misle). Of course, the UD is alone in this regard as all the other online dictionaries define *misle* as a fine mist or rain. Chalk it up to the vicissitudes of yout, I guess... – Mike Hunter Apr 07 '16 at 19:55
  • Quite, but I (and somehow my brother too) conjured up a non-existent *misle* (as stated above). – Nick Cox Apr 07 '16 at 19:59
0

But if you don't want to extrapolate (I think interpolate would be a better word) between the points, then no parametric function will give you anything better than what you already have, that is, a straight line between your points. If you have a parametric model with as many parameters as you have observations, it will simply replicate what you already have.
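
In R, that straight line between observed points is exactly what approxfun() gives you (a sketch, again assuming the dat data frame used in the other answers):

# Piecewise-linear interpolator through the observed points
f <- approxfun(dat$x, dat$y)
f(c(5, 10, 15))   # y values at arbitrary x within the observed range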

0

If you're trying to obtain a CDF-like function (non-zero), you could use a weighted Weibull curve of the form

$$y = A\left(1 - e^{-(x/\alpha)^\beta}\right).$$

When I do this, I obtain roughly $A = 100$, $\alpha = 12.3$, and $\beta = 9.0$

(the resulting $\beta$ is much higher than I've typically run across for lifetime distributions).

[figure: Weibull curve fitted to the data]
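
A sketch of how such a fit could be done with nls() in R, using the values above as starting points (again assuming the dat data frame from the other answers; convergence is not guaranteed):

# Scaled Weibull-CDF-shaped curve: y = A * (1 - exp(-(x / alpha)^beta))
fit_weib <- nls(y ~ A * (1 - exp(-(x / alpha)^beta)),
                data = dat,
                start = list(A = 100, alpha = 12.3, beta = 9))
coef(fit_weib)

plot(y ~ x, data = dat, pch = 19)
xgrid <- seq(min(dat$x), max(dat$x), length.out = 200)
lines(xgrid, predict(fit_weib, newdata = data.frame(x = xgrid)), col = "red")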