
I have a set of people living in different places, and for each of them I have calculated the probability of coming to work in the event of flooding, depending on the water level and where they live. Each person works in one of several teams. The goal is to calculate how many people each team can expect to have available at different water levels.

The solution that comes to mind is to run, say, 1000 simulations for each team, look at the resulting distribution of staff availability, and build my confidence intervals from that distribution. But that is quite a lot of simulations, so I wondered whether there is another, more "analytical" way of solving this task.
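For reference, the simulation I have in mind looks roughly like the sketch below. The probabilities in `team_probs` are made up for illustration, the function names are my own, and I am assuming that people decide independently of each other.

```python
import random


def simulate_attendance(probs, n_sims=1000, seed=0):
    """Draw n_sims head counts for one team, one Bernoulli draw per person."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = []
    for _ in range(n_sims):
        # A person shows up when their uniform draw falls below p_i.
        counts.append(sum(rng.random() < p for p in probs))
    return counts


def empirical_interval(counts, level=0.95):
    """Central interval read off the sorted simulated head counts."""
    ordered = sorted(counts)
    lo_idx = int((1 - level) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical show-up probabilities for one team at one water level.
team_probs = [0.9, 0.7, 0.5, 0.3]

counts = simulate_attendance(team_probs)
mean_count = sum(counts) / len(counts)
low, high = empirical_interval(counts)
print(mean_count, (low, high))
```

The same two functions would be called once per team and per water level, each with its own list of probabilities.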

  • Hi there and welcome. To be honest, 1000 simulations per team is a pretty *small* number. A matter of milliseconds for a ("modern") computer. – Jim Jul 13 '18 at 09:05
  • Thanks, Jim. In fact, it is not about the computation time; it is more that I do not want to invent something that already exists :-) I want to make sure that I am not running simulations for a "dice throwing" situation. – Grigory Sharkov Jul 13 '18 at 10:09

2 Answers


I'm assuming that what you have here is the following model. Here I'll only handle the case of one single team.

Let $ X_i \sim \text{Bernoulli}(p_i) $, where $ X_i = 1 $ if person $ i $ shows up to work (which happens with probability $ p_i $) and $ X_i = 0 $ otherwise. I'll also assume that the $ X_i $ are independent.

Then, let $ N $ be the size of the team, and let $ Y = \sum_{i=1}^N X_i $. Then, $ Y $ is the number of people on the team that show up in case of a flood.

What you're interested in now is the probability mass function of $ Y $, i.e. $ P(Y = k) $ for $ k \in \{0, \cdots, N\} $.

If $ p_i = p $ for each individual on the team, then $ Y $ is simply a Binomial($N, p$) random variable. But it sounds like the $ p_i $'s are allowed to differ here.

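In this case, the pmf is still computable, but the naive approach is intractable for large $ N $, because $ P(Y = k) $ is a sum over all $ \binom{N}{k} $ subsets of $ k $ people. I'm not sure whether there is an efficient algorithm to compute it.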

Consider, for example, $ P(Y = 1) $. This is the probability that exactly one person shows up. We have that

\begin{equation} \begin{aligned} P(Y = 1) & = \sum_{i=1}^N P(\text{person $ i $ shows up and no one else does})\\ & = \sum_{i=1}^N p_i \prod_{j \in \{1, \cdots, N\}, j \neq i} (1 - p_j) \end{aligned} \end{equation}

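and the same logic applies to $ P(Y = k) $ for any $ k \in \{0, \cdots, N\} $: sum, over every subset of exactly $ k $ people, the probability that those people show up and no one else does.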
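For what it's worth, one candidate for avoiding the enumeration of subsets is to build the pmf one person at a time: after adding person $ i $, the count either stays the same (with probability $ 1 - p_i $) or goes up by one (with probability $ p_i $). A sum of independent, non-identical Bernoulli variables is usually called a Poisson binomial distribution. The sketch below is only an illustration under the independence assumption; the function name and the probabilities in `team_probs` are made up.

```python
def poisson_binomial_pmf(probs):
    """Return [P(Y = 0), ..., P(Y = N)] for independent Bernoulli(p_i) draws."""
    pmf = [1.0]  # with nobody on the team, Y = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)  # this person stays home
            nxt[k + 1] += mass * p    # this person shows up
        pmf = nxt
    return pmf


# Hypothetical show-up probabilities for one team at one water level.
team_probs = [0.9, 0.7, 0.5, 0.3]

pmf = poisson_binomial_pmf(team_probs)
expected_staff = sum(k * mass for k, mass in enumerate(pmf))
print(pmf[1])          # P(Y = 1), the quantity written out above
print(expected_staff)  # equals the sum of the p_i
```

This takes on the order of $ N^2 $ operations per team, and quantiles or intervals can then be read directly off the cumulative sums of `pmf`.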

For large $ N $ or small $ p_i $'s, you may want to do this computation on log-probabilities to avoid numerical underflow.
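As a rough illustration of the log-probability idea, here is $ P(Y = 1) $ computed in log space with the usual log-sum-exp trick. This is a sketch only: the function name is made up, and it assumes every $ p_i $ lies strictly between 0 and 1 so that all the logarithms are finite.

```python
import math


def log_prob_exactly_one(probs):
    """Return log P(Y = 1), assuming independence and 0 < p_i < 1 for all i."""
    # log of prod_j (1 - p_j): the probability that nobody shows up
    log_none = sum(math.log1p(-p) for p in probs)
    # Term i is log(p_i) + sum_{j != i} log(1 - p_j).
    terms = [math.log(p) - math.log1p(-p) + log_none for p in probs]
    # log-sum-exp: factor out the largest term before exponentiating
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


# Hypothetical probabilities; a direct product over big_team would underflow to 0.0.
small_team = [0.9, 0.7, 0.5, 0.3]
big_team = [0.5] * 5000

print(math.exp(log_prob_exactly_one(small_team)))
print(log_prob_exactly_one(big_team))
```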

Kevin Li
  • OK, thanks for this explanation. It seems that sampling is the method I should stick to. – Grigory Sharkov Jul 13 '18 at 16:18
  • Perhaps; although you can compute exact probabilities. It might take some more work to find a way to compute the pmf more efficiently. – Kevin Li Jul 13 '18 at 16:55

I have found a similar answer to my question. Yes, Monte Carlo is the only way.