0

So I can use "EWMA" (1) to update an estimate of the mean as each new measurement is received.

If I know the window size of the smooth($\eta$), the previous estimate($ \bar{x}_t$), and the new measurement ($x_{t+1} $) then I can compute the updated mean with only one new measurement.

$ \bar {x}_{t+1} = \eta \cdot \bar{x}_{t}+(1-\eta)\cdot x_{t+1}$

I have to store the running mean, and know what my window parameter is, but it is fairly simple.

I want a reduced parameter running median.

It doesn't have to be exact. (Yes, please do not mock me for such a statement - I know it has the potential to open up all sorts of crazy, but it doesn't have to.)

I have ~10k rows. My typical window size for an informative mean is about 500 elements. I don't want to do a sort of a window for 10,000 times. My extended use case has hundreds of millions of rows - so that kind of an approach might not scale well. Is there any way that I can wrap up tail behavior in a statistic, and keep only a subset of the values?

If I had an ecdf, then I could keep unique values. On average i would only have to update part of the stored values.

If I "disimproved" the granularity of the measurements then I could move toward fixed bin locations, and operate primarily in bin memberships.

If I did piecewise linear interpolation on the CDF then I could update those parameters over time, and that could reduce the overhead, right? I would have to make sure that I had enough segments, and that things stayed well behaved, numerically speaking.

Questions:

  • Are there "textbook" approaches for windowed moving medians? What are they?
  • Is there a variation on the Theil-Sen estimator that could be useful for this?

Please provide thoughts or references.

EngrStudent
  • 8,232
  • 2
  • 29
  • 82
  • 1
    The first question is answered at http://stats.stackexchange.com/questions/7959. It's unclear what the second question has to do with windowed medians, because the Theil-Sen estimator is used for *slopes* and you haven't indicated you are in a regression setting. – whuber Sep 29 '14 at 15:12
  • I'm trying to parametrize the moving median. TS might be a way to do that. – EngrStudent Sep 29 '14 at 16:05
  • 1
    I'm not grasping how it could help, although maybe I'm overlooking something obvious. What makes me shudder is that because the T-S estimator looks at all pairs of data (in order to find a median *slope*, not a median value) that kind of approach is going to be hideously more inefficient than even a brute force moving window median. – whuber Sep 29 '14 at 17:01

0 Answers0