
I have a number of curves whose values lie between 0 and 1. The curves should be monotonically increasing, but due to random noise there may be places where a curve decreases.

Is there any smoothing method that is guaranteed to produce a monotonically increasing curve? If there is a relevant Python package, that would be helpful.

Two more points about the data that may be useful:

  1. Certain data points have weights, so if there is a way of incorporating those weights into the smoothing, that would be helpful.
  2. We can be confident that the end points of the curve are accurate.
user35734
    One option: https://en.wikipedia.org/wiki/Monotone_cubic_interpolation – Sycorax Sep 22 '19 at 22:00
  • @Sycorax Doesn't this assume the data is already monotonic? – user35734 Sep 22 '19 at 22:38
  • I suppose that specific wikipedia article does. In general, there are alternatives, such as imposing shape restrictions on cubic splines. https://projecteuclid.org/euclid.aoas/1223908050 – Sycorax Sep 22 '19 at 22:42
  • There are certainly monotonic cubic spline fits -- and R packages that can fit them. I don't know about Python. – Glen_b Sep 23 '19 at 05:33
  • Could you clarify the distinction you appear to make between "curves" and "data points" and explain what it means for a curve to "contain numbers"? What do you really mean by "curve"? What do these weights actually measure? – whuber Sep 24 '19 at 12:58
  • @whuber I have a list of data points, with x and y coordinates for each data point. When I use the term curve, I'm referring to this list of data points. A curve is monotonically decreasing if for all points in the curve, if x_i > x_j, then y_i < y_j. – user35734 Sep 25 '19 at 14:45
  • Thank you: that's clear. But please explain the meaning of the weights and tell us whether the "random noise" affects the $x_i,$ the $y_i,$ or both, because the proper smoothing method depends on those details. – whuber Sep 25 '19 at 15:51
  • @whuber The x_i are defined manually, so there is no error there. The y_i contain random noise. To give more background - we have two columns a and b. Column a corresponds to the coordinates x_i, and column b consists of binary 0 or 1 variables. To construct the coordinates (x_i, y_i), we group our dataframe on column a, with aggregation function mean on column b. So the averaged values are the y_i, while the grouped column gives the x_i. There is a true distribution of averages corresponding to the x_i, and the corresponding curve should be increasing. – user35734 Sep 25 '19 at 16:19
  • But due to lack of data, the empirical distribution is sometimes jagged and not monotonic. When I refer to weights, I mean the count of the number of rows that went into the grouping for a particular x_i; in essence, the higher the count, the more confident we should be that the empirical average is close to the true average. – user35734 Sep 25 '19 at 16:21
  • These details profoundly influence the statistical nature of your question and suggest solutions that nobody would propose based on what you have posted so far (such as logistic regression with a constrained slope): could you please edit your post to include them? – whuber Sep 25 '19 at 17:44
  • For an example, see: https://stats.stackexchange.com/questions/206073/looking-for-function-to-fit-sigmoid-like-curve/316446#316446 – kjetil b halvorsen Jan 20 '21 at 14:41
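
For concreteness, here is a minimal sketch (using a toy pandas DataFrame as a stand-in for the data described in the comments above) of how the $(x_i, y_i)$ pairs and the count weights might be constructed:

```python
import pandas as pd

# Toy stand-in for the data frame described in the comments:
# column 'a' holds the grouping values (the x_i), column 'b' is binary (0/1).
df = pd.DataFrame({"a": [0, 0, 0, 1, 1, 2, 2, 2, 2],
                   "b": [0, 1, 0, 1, 1, 1, 0, 1, 1]})

means = df.groupby("a")["b"].mean()     # empirical average of b per value of a
counts = df.groupby("a")["b"].count()   # number of rows per group

x = means.index.to_numpy()   # the x_i
y = means.to_numpy()         # the y_i (may be non-monotone due to noise)
w = counts.to_numpy()        # the weights (row counts per x_i)
```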
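
Regarding the monotone cubic interpolation suggestion above: scipy's PchipInterpolator provides a shape-preserving piecewise-cubic interpolant in Python. As the follow-up comment notes, it assumes the points it is given are already monotone; it does not by itself remove non-monotonicity from noisy data. A minimal sketch:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Example (x, y) points that are already monotone; PCHIP preserves the
# monotonicity of the data but does not repair non-monotone input.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.30, 0.55, 0.80, 1.00])

interp = PchipInterpolator(x, y)
x_fine = np.linspace(x[0], x[-1], 101)
y_fine = interp(x_fine)   # a smooth, monotone curve through the given points
```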

0 Answers