
The aim is to estimate the error on a stochastic event rate dynamically, i.e., while the measurement is running.

This question points in the same direction, but I am interested in a theoretical extension.

I read out the event counter once per second; every black $1$ is a counted event (new events over time, see the plot below).

During the measurement I estimate the event rate, so as more statistics are accumulated, the mean event rate (red) should asymptotically become more accurate.

[Figure dyn_mean1: per-second event counts (black) and running mean event rate (red)]

As one can see, the mean value oscillates around the true value of 0.5,

[Figure dyn_mean2: the same running mean over a roughly ten times longer measurement]

even after an order of magnitude more events have been collected.
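For reference, a minimal simulation of this setup (a sketch only, assuming independent per-second Bernoulli readings with true rate 0.5; the real data may of course deviate from this) reproduces the slowly decaying oscillation of the running mean:

```r
# Simulate the measurement: one reading per second, 1 if an event was
# counted and 0 otherwise, with an assumed true rate of 0.5.
set.seed(1)
n <- 1e5
x <- rbinom(n, size = 1, prob = 0.5)

# Running estimate of the event rate after each reading
running_mean <- cumsum(x) / seq_len(n)

# The estimate keeps wandering around 0.5, ever more slowly
plot(running_mean, type = "l", log = "x",
     xlab = "number of readings", ylab = "estimated event rate")
abline(h = 0.5, col = "red")
```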

Practical question: How can one calculate the number of events needed to estimate the mean value to within a maximum error ($0.5 \pm \sigma$)? (Possibly answered here.)

Theoretical question: Can this oscillation be described analytically? Can you suggest further reading?

The events are radiation counts, so they are uncorrelated; can a Poisson distribution be applied here?

Addendum: An idealized first approximation, in which every 10th reading is non-zero:

[Figure dyn_mean_reg1: running mean for perfectly regular events]

Perhaps this curve is superimposed on the more realistic example above; are there any techniques for decomposing the signal into such component functions that could be applied here?
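A short sketch of that idealized case (assuming, as in the addendum, a reading of 1 at every 10th second and 0 otherwise) gives the deterministic sawtooth that could be compared with the stochastic curve above:

```r
# Idealized, perfectly regular case: every 10th reading is 1, the rest are 0
n <- 1000
x_reg <- as.integer(seq_len(n) %% 10 == 0)
running_mean_reg <- cumsum(x_reg) / seq_len(n)

# Deterministic sawtooth decaying towards the true rate of 0.1
plot(running_mean_reg, type = "l",
     xlab = "number of readings", ylab = "estimated event rate")
abline(h = 0.1, col = "red")
```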

IljaBek
  • See my comment to the referenced question. For the theoretical question, read about Brownian motion and random walks. – whuber Aug 19 '11 at 22:53

2 Answers


This is an answer to your "theoretical" question: "Can this oscillation be described analytically? Can you suggest further reading?"

If I understand your question correctly, you are in the situation of independent, identically distributed random variables with finite variance $\sigma^2$. In this case the law of the iterated logarithm applies. It tells you that the deviation of the partial sum from its expectation, $S_n - n\mu$, keeps oscillating between $-\sigma$ and $+\sigma$ after you normalize it by dividing by $\sqrt{2n\log\log n}$; equivalently, the difference between the estimated mean and the true mean, $\bar X_n - \mu$, oscillates between $-\sigma$ and $+\sigma$ after multiplying by $\sqrt{n/(2\log\log n)}$.
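A minimal numerical check of this statement (a sketch, assuming Bernoulli readings with true rate $p = 0.5$, so $\sigma = 0.5$) might look like this:

```r
# Law of the iterated logarithm, checked by simulation:
# (xbar_n - mu) * sqrt(n / (2 * log(log(n)))) should keep oscillating
# within roughly [-sigma, +sigma] as n grows.
set.seed(2)
n     <- 1e6
p     <- 0.5                     # assumed true rate, as in the question
sigma <- sqrt(p * (1 - p))       # standard deviation of a single reading
x     <- rbinom(n, size = 1, prob = p)

idx      <- 3:n                  # log(log(n)) requires n >= 3
dev      <- cumsum(x)[idx] / idx - p
dev_norm <- dev * sqrt(idx / (2 * log(log(idx))))

plot(idx, dev_norm, type = "l", log = "x",
     xlab = "n", ylab = "normalized deviation")
abline(h = c(-sigma, sigma), col = "red")
```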

The linked Wikipedia pages are good for a first reading. Wolfram offers a demo (which did not work in my browser).

j.p.

If you are estimating an unknown probability $p$, then, assuming the i.i.d. nature of your events, the count $X$ is binomial with variance $np(1-p)$, where $n$ is the total number of events. The estimated proportion is $\hat p = X/n$ with variance $p(1-p)/n \le 0.5 \cdot 0.5 / n = 0.25/n$ for any $p$.

For a given margin of error $m$, you can say that, approximately, $\mbox{Prob}[\, |\hat p-p|>m \,] \le \mbox{Prob}[\, |Z|>m/(0.5/\sqrt{n}) \,]$, where $Z$ is the standard normal variate. (You can do exact calculations with pbinom if you really need to.) For a 5% tail probability, we have $m/(0.5/\sqrt{n})=1.96$, i.e. $n=(1.96/2m)^2$; so for 3% accuracy, we need $n=(1.96/(2\cdot0.03))^2 \approx 1067$. This also answers a seemingly unrelated question: why are polls always conducted with a sample size of about 1000? (Answer: to get this 3% accuracy with 95% confidence.)
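For reference, a short sketch of this calculation in R, together with the exact check via pbinom mentioned above (the 3% margin and 95% confidence are the numbers from the paragraph; $p = 0.5$ is assumed for the exact check):

```r
# Sample size for margin of error m at 95% confidence, using the
# worst-case standard deviation 0.5 / sqrt(n)
m <- 0.03
n <- ceiling((1.96 / (2 * m))^2)   # (1.96 / 0.06)^2 ~ 1067.1, rounded up
n

# Exact two-sided tail probability with the binomial distribution at p = 0.5:
# Prob[ |phat - p| > m ] = Prob[ X < n(p - m) ] + Prob[ X > n(p + m) ]
p <- 0.5
lower <- pbinom(floor(n * (p - m)), n, p)            # P[X <= 501]
upper <- 1 - pbinom(ceiling(n * (p + m)) - 1, n, p)  # P[X >= 567]
lower + upper                                        # roughly 0.05
```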

You can of course approach this from the point of view of Bayesian updating with, say, a Jeffreys prior $\propto [p(1-p)]^{-1/2}$ (i.e., a $B(\frac12, \frac12)$ distribution), feeding in your data until the variance of the posterior (also a Beta distribution) is small enough. The above derivation essentially assumes a flat prior, $B(1,1)$.
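A minimal sketch of that route (assuming Bernoulli readings, a Jeffreys $B(\frac12, \frac12)$ prior, and an arbitrarily chosen stopping threshold on the posterior standard deviation) could look as follows:

```r
# Sequential Beta-binomial updating with a Jeffreys prior B(1/2, 1/2).
# Stop once the posterior standard deviation falls below a chosen threshold.
set.seed(3)
a <- 0.5; b <- 0.5          # Jeffreys prior parameters
threshold <- 0.015          # stopping rule on the posterior sd (arbitrary)

repeat {
  x <- rbinom(1, size = 1, prob = 0.5)   # one new reading
  a <- a + x                             # conjugate update of the
  b <- b + (1 - x)                       # Beta(a, b) posterior
  post_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
  if (post_sd < threshold) break
}

c(readings = a + b - 1, posterior_mean = a / (a + b), posterior_sd = post_sd)
```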

StasK