Estimating total events from buckets hit

Question

I'm working on a project that will run $n=10000$ experiments. In each experiment, an unknown number $j$ of events will occur. Each event has a value $E_j$ attached to it. We expect these values to follow a normal distribution, although this has yet to be proven.

The events are measured using $k=100$ traps, which are reset at the start of each experiment. Each trap triggers if it observes an event value within its range $R_j$, where $\lfloor R_j\rfloor$ and $\lceil R_j\rceil$ denote the lower and upper bounds of that range. All traps have the same width, and the ranges are half-open and adjacent, so any event $E$ with $\lfloor R_0 \rfloor \leq E < \lceil R_{k-1}\rceil$ triggers exactly one trap. At the end of the experiment, the state of trap $j$ is $X_j=1$ if there was at least one event $E$ with $\lfloor R_j\rfloor \leq E < \lceil R_j\rceil$, and $X_j=0$ otherwise. We can only see whether a trap was triggered, not how often.

We will configure the traps so that the vast majority of events fall within the range of some trap. We verify this by adding 'catch-all' traps at both ends of the measured range, which should rarely be triggered.

Ideally, we want to estimate $j_n$ for each experiment, but we would happily combine all experiments and estimate the total $\sum_{n=0}^{9999} j_n$ instead.

An example of what my measurements look like (the lower bound of each trap is given; traps are 5 wide):

Actual things to be measured (unknown in the real experiment):
n=0: 102, 103, 110, 125             (4 events)
n=1: 103, 106, 107, 108, 124, 124   (6 events)
n=2: 105, 117, 137, 138             (4 events)

The actual output from our experiment:
     100 105 110 115 120 125 130 135
n=0:   x   -   x   -   -   x   -   - 
n=1:   x   x   -   -   x   -   -   -
n=2:   -   x   -   x   -   -   -   x

In this example/simulation, 3 traps were triggered for each of the experiments, and we had $\sum_{n=0}^2 j_n=14$ total events.
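A minimal Python sketch of the trap mechanism, reproducing the table above from the listed event values. It assumes the half-open ranges [lower, lower + width) implied by the example (e.g. 105 lands in the trap starting at 105). The function name `trap_states` and the catch-all layout (index 0 below the measured range, index k + 1 above it) are illustrative choices, not part of the original setup.

```python
import math

def trap_states(events, lowest_bound, width, k):
    """Return a 0/1 list of length k + 2 for one experiment.

    Index 0 is the lower catch-all trap, indices 1..k are the regular
    half-open traps [lowest_bound + (i-1)*width, lowest_bound + i*width),
    and index k + 1 is the upper catch-all trap. Only whether a trap was
    hit is recorded, not how often.
    """
    states = [0] * (k + 2)
    for e in events:
        i = math.floor((e - lowest_bound) / width)  # regular trap 0..k-1 if in range
        i = min(max(i, -1), k)                      # clamp out-of-range values into catch-alls
        states[i + 1] = 1
    return states

# The example from the question: 8 traps, 5 wide, starting at 100.
experiments = {
    0: [102, 103, 110, 125],
    1: [103, 106, 107, 108, 124, 124],
    2: [105, 117, 137, 138],
}
lower_bounds = [100 + 5 * i for i in range(8)]
print("     " + " ".join(f"{b:>3}" for b in lower_bounds))
for n, events in experiments.items():
    regular = trap_states(events, 100, 5, 8)[1:-1]  # drop the two catch-alls
    print(f"n={n}: " + " ".join(f"{'x' if s else '-':>3}" for s in regular))
```

Running it prints the same x/- table as in the question; the three experiments trigger 3 traps each while containing 14 events in total, which is exactly the information loss the question is about.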

This question was edited to be clearer using some formalised notation. Comments and answers may look a bit silly because of this.

Could you be a bit more specific about what you mean by *"traps for specific ranges"*? Also, what do you consider one *"set of measurements"*? — Jan Kukacka, Jun 06 '18 at 08:25
@JanKukacka Thanks for the feedback, I've improved the wording of my question, and added an example dataset. — Ondergetekende, Jun 06 '18 at 09:27
Is the problem that a triggered trap could represent more than one event? — whuber, Nov 09 '18 at 22:31

kjetil b halvorsen · Answer 1 · 2018-12-09T22:40:06.087

With the clarifications from the OP, I'll try again. There are $k=100$ traps, which can only detect whether there is an event, not the number of events. The traps are adjacent and detect events in the intervals $$ (-\infty, r_0],(r_0, r_0+\delta], (r_0+\delta,r_0+2\delta], \dotsc,(r_0+(k-2)\delta,r_0+(k-1)\delta], (r_0+(k-1)\delta,\infty) $$ and the counts in these intervals are the random variables (I drop the index for each of the 10000 experiments) $$ X_0^*, X_1^*, \dotsc, X_k^*. $$ But we do not observe $(X_j^*)_{j=0}^k$; we observe only $X_j=\min(X_j^*,1)$. The probabilities of these intervals follow from the assumption that each $E$ is normal with mean $\mu$ and variance $\sigma^2$: $$ p_0=\Phi\left(\frac{r_0-\mu}{\sigma}\right),\quad p_i=\Phi\left(\frac{r_0+i\delta-\mu}{\sigma}\right)-\Phi\left(\frac{r_0+(i-1)\delta-\mu}{\sigma}\right)\ (i=1,\dotsc,k-1),\quad p_k=1-\Phi\left(\frac{r_0+(k-1)\delta-\mu}{\sigma}\right), $$ where $\Phi$ is the standard normal cumulative distribution function.
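The interval probabilities can be computed directly. This is a sketch assuming $\mu$ and $\sigma$ are known; `std_normal_cdf` and `interval_probabilities` are hypothetical helper names, and $\Phi$ is written via `math.erf` so that only the standard library is needed. The parameter values in the usage line are made up for illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_probabilities(r0, delta, k, mu, sigma):
    """Return [p_0, ..., p_k] for the k + 1 intervals
    (-inf, r0], (r0, r0+delta], ..., (r0+(k-1)*delta, inf)."""
    cuts = [r0 + i * delta for i in range(k)]             # r_0, ..., r_0+(k-1)delta
    cdf = [std_normal_cdf((c - mu) / sigma) for c in cuts]
    probs = [cdf[0]]                                      # p_0
    probs += [cdf[i] - cdf[i - 1] for i in range(1, k)]   # p_1, ..., p_{k-1}
    probs.append(1.0 - cdf[-1])                           # p_k
    return probs

# Hypothetical parameters, for illustration only.
p = interval_probabilities(r0=100.0, delta=5.0, k=8, mu=120.0, sigma=10.0)
print([round(x, 4) for x in p])
```

The list telescopes, so its entries always sum to 1 (up to rounding), which is a useful sanity check when wiring this into a likelihood.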

The $(X_j^*)_{j=0}^k$ then have a multinomial distribution with point probabilities $$ \binom{j_n}{x_0^*, x_1^*, \dotsc, x_k^*}p_0^{x_0^*} p_1^{x_1^*}\dotsm p_k^{x_k^*}, $$ but this is not what we observe! The point probabilities of the observed variables $(X_j)_{j=0}^k$ are sums of the above probabilities over all count vectors consistent with the observed pattern, and for large $k$ this is a combinatorial nightmare. But if we can calculate or approximate them in some way, we get a likelihood function for the parameters $(j_n,\mu,\sigma)$ and can use numerical optimization. This is similar to (but more complex than) the problem of estimating an unknown binomial $N$, discussed in the question Estimating parameters for a binomial.
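For a single experiment with a given number of events $N$ (that is, $j_n$), one way to evaluate the probability of an observed pattern, with $S$ the set of triggered traps, is inclusion-exclusion over which triggered traps might be empty: $$ P(\text{exactly the traps in } S \text{ fire} \mid N)=\sum_{T\subseteq S}(-1)^{|S|-|T|}\Big(\sum_{i\in T}p_i\Big)^N. $$ This is a suggestion of mine rather than part of the answer above. It needs $2^{|S|}$ terms, so it is only practical when few traps fire per experiment. The function names are hypothetical, and the brute-force version is included only to check the formula on tiny cases.

```python
import itertools
import math

def pattern_probability(hit, probs, n_events):
    """P(exactly the traps in `hit` are triggered | n_events events),
    by inclusion-exclusion over which of the hit traps could be empty."""
    hit = sorted(hit)
    total = 0.0
    for r in range(len(hit) + 1):
        for subset in itertools.combinations(hit, r):
            mass = sum(probs[i] for i in subset)  # probability of landing in `subset`
            total += (-1) ** (len(hit) - r) * mass ** n_events
    return total

def pattern_probability_brute_force(hit, probs, n_events):
    """Same quantity, by enumerating every assignment of events to traps."""
    target = set(hit)
    total = 0.0
    for assignment in itertools.product(range(len(probs)), repeat=n_events):
        if set(assignment) == target:
            total += math.prod(probs[i] for i in assignment)
    return total

# Hypothetical trap probabilities and one observed pattern.
probs = [0.1, 0.2, 0.3, 0.4]
observed = {0, 2, 3}
for n in range(3, 11):
    print(n, pattern_probability(observed, probs, n))  # likelihood of N = n
```

With several experiments, the log-likelihood is the sum of the per-experiment log-probabilities, each with its own $j_n$.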

I see two ways forward: maybe a Poisson approximation to the multinomial (see https://math.stackexchange.com/questions/2796263/poisson-limit-theorem-for-multinomial-distribution, https://www.sciencedirect.com/science/article/pii/0167715285900136, https://www.sciencedirect.com/science/article/pii/016771529190169R, https://www.jstor.org/stable/3314676?seq=1#metadata_info_tab_contents), or maybe the EM algorithm.
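A sketch of the Poisson route, under an extra assumption not stated in the question: the number of events per experiment is Poisson with mean $\lambda$. Then (Poisson splitting) the counts $X_i^*$ are independent Poisson($\lambda p_i$), so trap $i$ fires with probability $1-e^{-\lambda p_i}$ independently of the others. With $h_i$ the number of experiments (out of $n$) in which trap $i$ fired, the log-likelihood is $\ell(\lambda)=\sum_i [h_i\log(1-e^{-\lambda p_i}) - (n-h_i)\lambda p_i]$. Its derivative is decreasing in $\lambda$, so the maximum can be found by bisection. The $p_i$ are taken as known here (for example from the formula above with assumed $\mu,\sigma$); `score` and `estimate_rate` are hypothetical names.

```python
import math

def score(lam, probs, hits, n_experiments):
    """Derivative in lambda of the Poisson-approximation log-likelihood
    sum_i [h_i * log(1 - exp(-lam*p_i)) - (n - h_i) * lam * p_i]."""
    total = 0.0
    for p, h in zip(probs, hits):
        if p <= 0:
            continue  # a trap that can never be hit carries no information
        x = lam * p
        fired = h / math.expm1(x) if x < 700 else 0.0  # avoid overflow in exp
        total += p * (fired - (n_experiments - h))
    return total

def estimate_rate(probs, hits, n_experiments, tol=1e-10):
    """Maximum-likelihood estimate of lambda, the mean number of events per
    experiment, found by bisection on the (decreasing) score function."""
    if sum(hits) == 0:
        return 0.0
    if all(h >= n_experiments for p, h in zip(probs, hits) if p > 0):
        raise ValueError("every trap fired in every experiment: no finite MLE")
    lo, hi = 0.0, 1.0
    while score(hi, probs, hits, n_experiments) > 0:
        hi *= 2.0  # expand until the root is bracketed
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid, probs, hits, n_experiments) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions the pooled estimate of $\sum_n j_n$ is $n\hat\lambda$; the individual $j_n$ are not recovered. Estimating $\mu$ and $\sigma$ jointly with $\lambda$ would need a multi-parameter optimizer, which this sketch does not attempt.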

I will look into those later, but now is bedtime ...

Original answer:

The problem statement is not entirely clear, but I will first try to introduce some notation to make it clearer; then maybe the OP can clarify.

So there are some "traps" that can be "triggered" by some "events" (or not). Let there be $k=100$ traps. Each is observed for some time (OP says 3 seconds) and is either triggered by some event or not. The number of times it is triggered is $$ X^*_{tj},\quad j=1,\dotsc,k \quad t=1, \dotsc, T (=10000?) $$ But we do not observe $X^*_{tj}$; we observe a censored version $$ X_{tj}=\begin{cases} 1, &\text{if $X^*_{tj}\ge 1$} \\ 0, &\text{otherwise} \end{cases} $$ In addition, we observe the total number of events $N_t=\sum_{j=1}^k X^*_{tj}$.
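To see the censoring concretely, here is a small simulation in the notation above (traps indexed $0,\dotsc,k-1$ instead of $1,\dotsc,k$). Everything about the generating process is hypothetical: a uniformly distributed number of events per timepoint, normally distributed event values, and the outermost traps absorbing out-of-range values. The true $N_t$ is returned only for comparison; in the real experiment it would not be available.

```python
import math
import random

def simulate(T, k, r0, delta, mu, sigma, max_events, seed=0):
    """Simulate T timepoints. Returns (X, N): X[t][j] are the censored
    trap indicators min(X*_tj, 1), N[t] is the true total N_t."""
    rng = random.Random(seed)
    X, N = [], []
    for _ in range(T):
        n_t = rng.randint(0, max_events)           # hypothetical count model
        counts = [0] * k                            # X*_tj for j = 0..k-1
        for _ in range(n_t):
            j = math.floor((rng.gauss(mu, sigma) - r0) / delta)
            counts[min(max(j, 0), k - 1)] += 1     # outer traps act as catch-alls
        X.append([min(c, 1) for c in counts])      # only the censored version is seen
        N.append(n_t)
    return X, N

X, N = simulate(T=1000, k=100, r0=0.0, delta=1.0, mu=50.0, sigma=10.0, max_events=20)
hits = [sum(row) for row in X]
print(sum(hits), sum(N))  # the number of triggered traps can only undercount N_t
```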

The goal is to estimate/predict $N_t$ for each $t$. Since the number of timepoints where we observe this is large ($T\approx 10000$ if I have understood the OP correctly), it should be possible to do something, especially if we can assume that the underlying distribution producing the events does not change with time.

I will come back and extend this, but hopefully the OP can clarify in the meantime.

$N_t$ (the total number of events) is unfortunately unknown. Ideally this is the number we're trying to estimate, although I'd settle for the sum of $N_t$ over a given time period. — Ondergetekende, Nov 14 '18 at 13:02
Thanks, I will come back tonight to update the answer with the new info in the edited question. But one question: what do you mean when you say that the events are normally distributed? Event counts are integers, so some count distribution would seem more appropriate. Or am I misunderstanding? Please clarify. — kjetil b halvorsen, Nov 14 '18 at 16:28
