
So here's the situation: There are n balls. Each ball has been rated on a scale of 1-N by an arbitrary number of people.

Every day (or every week, or longer), some/most/all of the previous voters and some new voters are able to vote on the balls again. The thing is, I don't want the previous votes to disappear, because the votes that were valid yesterday are still valid today. The re-votes/new votes will just be weighted "more".

There can be more OR fewer people who vote during the new voting session.

What is a good method of data decay for this? I have thought of taking the square root of "yesterday"'s votes every day/week, but that doesn't seem like it would work well: if I have lots of voters in one day, the square root of the sum of the votes doesn't decay that "much", and if I have few voters, the square root of the sum of the votes can decay even less, so the new, initial votes will skew the result set. I could be wrong; I'm not sure.

Michael R. Chernick
Apothem
  • Perhaps some sort of weighted moving average? – Peter Flom Aug 18 '12 at 17:17
  • Well, from Wikipedia: "A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles." That seems to be what I want, yes. – Apothem Aug 18 '12 at 17:58
  • But that doesn't answer the question, because what you need to specify is how many terms to include in the moving average. – Michael R. Chernick Aug 18 '12 at 18:16
  • I'm not sure what you mean by terms. Do you mean how many balls will be taken into consideration for the decay? If so, all N balls would be taken into consideration. – Apothem Aug 18 '12 at 18:50

1 Answer


The EWMA (Exponentially Weighted Moving Average) is an alternative. Newer observations receive higher weights than older ones, depending on the decay factor $\lambda \in [0,1]$:

$ \bar{X}_t = (1-\lambda) X_{t-1} + \lambda \bar{X}_{t-1}$

To start the recursion, a starting value $X_0$ is required; depending on the case, this can be a mean over an "initializing" period. From this formula it is not obvious why the weights are exponential, but this can be shown by repeatedly replacing $\bar{X}_{t-1}$ (and so on) with the very same formula until $\bar{X}_{t}$ is expressed as a function of $\{X_0, X_1, \ldots, X_{t-1}\}$.
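As a sketch of that substitution, writing $\bar{X}_0$ for the starting value of the recursion (notation assumed here to keep it distinct from the first observation), the recursion unrolls as:

$$
\begin{aligned}
\bar{X}_t &= (1-\lambda) X_{t-1} + \lambda \bar{X}_{t-1} \\
          &= (1-\lambda) X_{t-1} + \lambda (1-\lambda) X_{t-2} + \lambda^2 \bar{X}_{t-2} \\
          &= (1-\lambda) \sum_{k=0}^{t-1} \lambda^k X_{t-1-k} + \lambda^t \bar{X}_0
\end{aligned}
$$

So the observation that is $k$ steps older than the most recent one gets weight $(1-\lambda)\lambda^k$, which shrinks geometrically (exponentially) in $k$ whenever $\lambda < 1$.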

You will find applications of the EWMA in, for example, volatility estimation of financial returns.
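Applied to the voting question, the recursion above could be sketched roughly as follows. This is only an illustration under some assumptions: each voting session is summarised by the mean rating a ball received in that session, the starting value is taken to be the first session's mean, and the names `ewma_update`, `ewma_series` and `session_means` as well as the data are made up for the example.

```python
def ewma_update(prev_avg, prev_obs, lam):
    """One step of the recursion: (1 - lam) * prev_obs + lam * prev_avg."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * prev_obs + lam * prev_avg


def ewma_series(observations, lam, start):
    """Return [start, Xbar_1, ..., Xbar_T] for observations [X_0, ..., X_{T-1}]."""
    averages = [start]
    for x in observations:
        averages.append(ewma_update(averages[-1], x, lam))
    return averages


# Hypothetical per-session mean ratings for a single ball (assumed data).
session_means = [4.0, 2.0, 3.0]

# Assumption: initialise with the first session's mean rating.
scores = ewma_series(session_means, lam=0.5, start=session_means[0])
print(scores)
```

A smaller `lam` makes the score follow the latest session more closely, while a `lam` near 1 keeps more of the history. Because each session enters as a mean here, the number of voters in a session does not change its weight; that is a design choice of this sketch, not something the formula requires.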

JDav