So I can use an exponentially weighted moving average ("EWMA") (1) to update an estimate of the mean as each new measurement is received.
If I know the smoothing parameter ($\eta$), which sets the effective window, the previous estimate ($\bar{x}_t$), and the new measurement ($x_{t+1}$), then I can compute the updated mean using only that one new measurement.
$\bar{x}_{t+1} = \eta \cdot \bar{x}_{t} + (1-\eta) \cdot x_{t+1}$
I have to store the running mean and know my smoothing parameter, but it is fairly simple.
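For concreteness, here is that update as a small Python sketch (the function name, the $\eta$ value, and the toy stream are my own placeholders):

```python
def ewma_update(prev_mean, new_value, eta=0.998):
    """One EWMA step: eta near 1 gives a long memory.

    Roughly, the effective window is ~ 1 / (1 - eta), so eta = 0.998
    corresponds to something like a 500-element window.
    """
    return eta * prev_mean + (1.0 - eta) * new_value

stream = [1.2, 0.9, 1.5, 1.1]      # stand-in for the measurement stream
running_mean = stream[0]           # seed with the first value to avoid start-up bias
for x in stream[1:]:
    running_mean = ewma_update(running_mean, x)
```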
I want a running median with a similarly small parameter and memory footprint.
It doesn't have to be exact. (Yes, please do not mock me for such a statement - I know it has the potential to open up all sorts of crazy, but it doesn't have to.)
I have ~10k rows. My typical window size for an informative mean is about 500 elements. I don't want to sort a 500-element window 10,000 times. My extended use case has hundreds of millions of rows, so that kind of approach is unlikely to scale. Is there any way to wrap up the tail behavior in a statistic and keep only a subset of the values?
If I had an ECDF, then I could keep only the unique values and their counts. On average, each new measurement would only touch part of the stored structure.
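Something like the following rough sketch of that idea (the class and method names are mine, and `list.insert` is O(n), so this is for illustration rather than for the hundreds-of-millions case):

```python
import bisect

class UniqueValueECDF:
    """Keep sorted unique values plus counts; read the median off the cumulative counts."""

    def __init__(self):
        self.values = []   # sorted unique values
        self.counts = []   # count per unique value
        self.n = 0

    def add(self, x):
        i = bisect.bisect_left(self.values, x)
        if i < len(self.values) and self.values[i] == x:
            self.counts[i] += 1          # existing value: just bump its count
        else:
            self.values.insert(i, x)     # new value: splice it into the sorted list
            self.counts.insert(i, 1)
        self.n += 1

    def median(self):
        half = self.n / 2.0
        cum = 0
        for v, c in zip(self.values, self.counts):
            cum += c
            if cum >= half:
                return v                 # lower median of all values seen so far
```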
If I "disimproved" the granularity of the measurements then I could move toward fixed bin locations, and operate primarily in bin memberships.
If I did piecewise-linear interpolation on the CDF, then I could update those interpolation parameters over time, and that could reduce the overhead, right? I would have to make sure that I had enough segments and that things stayed numerically well behaved.
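One way to make the piecewise-linear-CDF idea concrete, assuming it is acceptable to track a fixed set of quantile knots, is a stochastic-approximation update per knot (this is my construction rather than a named textbook algorithm; it is in the spirit of Robbins-Monro quantile tracking):

```python
import numpy as np

class QuantileKnots:
    """Track a few quantile knots; linear interpolation through them
    gives a cheap piecewise-linear running CDF / quantile function."""

    def __init__(self, levels=(0.1, 0.25, 0.5, 0.75, 0.9), step=0.01):
        self.levels = np.asarray(levels, dtype=float)
        self.step = step        # smaller step ~ longer memory, much like eta for the EWMA
        self.q = None           # current knot positions

    def update(self, x):
        if self.q is None:
            self.q = np.full(len(self.levels), float(x))  # seed all knots at the first value
            return
        # Nudge each knot: up if x lies above it, down if below, with asymmetric
        # step sizes so knot i settles near the levels[i] quantile.
        self.q += self.step * (self.levels - (x < self.q))
        self.q.sort()           # keep the knots monotone

    def running_median(self):
        return None if self.q is None else self.q[len(self.levels) // 2]
```

Whether this counts as "well behaved enough" depends on the step size and on how stationary the stream is, which is part of what I am asking about below.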
Questions:
- Are there "textbook" approaches for windowed moving medians? What are they?
- Is there a variation on the Theil-Sen estimator that could be useful for this?
Please provide thoughts or references.