
Let's suppose there is an event that gives a random outcome each time it happens. The set of possible outcomes is finite, but their probabilities differ, sometimes by orders of magnitude (imagine a heavily weighted die).

How does one establish an upper bound for the probability of an outcome that is not in the set of possible outcomes? (e.g. establish, with 99% confidence, that a six-sided die will not give you a 7)

Usually when you sample a random event like this you can estimate the probability of a given outcome by $P_i = \frac{n_i}{N}$, where $P_i$ is the estimated probability of the $i$-th event, $n_i$ is the number of samples where the $i$-th event happened and $N$ is the total number of samples taken.

Obviously, if the event is impossible, the estimated probability will always be zero. But as long as we have a finite number of samples, we cannot know for sure whether the event is indeed impossible or just has a really low non-zero probability. So the best we can do is to establish an upper bound with a given confidence.

My question is: how do I actually calculate that upper bound from the number of samples taken and the desired confidence? The solution is probably really simple, but so far it has eluded my attempts to find it via search (probably because I was using the wrong search terms).

uLoop
  • 1
    In our chat room people have flagged a particularly entertaining quip, ["It is a well-known fact that an event A is very unlikely when P(A) < 0."](https://chat.stackexchange.com/transcript/message/50013391#50013391), that is applicable to the first half of this post. Only when one gets to the second half does your intent become apparent. You are looking for a *confidence limit for a Binomial probability.* – whuber May 09 '19 at 18:52
  • It is probably readily apparent that I am naive about statistics terminology. But wouldn't it be a multinomial probability rather than a binomial one? At a quick glance, the binomial distribution allows only 2 possible outcomes, whereas I have finitely many. Unless you mean that the 2 possibilities are: a) the impossible outcome and b) everything else. – uLoop May 09 '19 at 19:00
  • Can't you just _look_ at the die before you roll to check if any of the faces are marked 7? – Dilip Sarwate May 09 '19 at 19:28
  • My question is generalized. I did not want to make my question too specific to my real problem. (molecular collision simulations) – uLoop May 09 '19 at 19:34
  • OK, have you considered the _[Good-Turing estimation method](https://en.wikipedia.org/wiki/Good–Turing_frequency_estimation)_ ? – Dilip Sarwate May 09 '19 at 19:38
  • 1
    Either the event is observed or it is not: that gives a Binomial distribution. – whuber May 09 '19 at 19:46
  • You might like to look at this article (https://www.ncbi.nlm.nih.gov/pubmed/6827763), which is rather entertainingly entitled "If nothing goes wrong, is everything all right?", and see if that helps you, as it is not super clear what your problem is. – mdewey May 10 '19 at 14:41
  • The duplicates are the top hits in a [search for 'rule of three binomial'](https://stats.stackexchange.com/search?q=rule+of+three+binomial+score%3A2). – whuber May 10 '19 at 18:38

1 Answer


I think I found what I was looking for, in the form of the rule of three.

It seems to give exactly what I want: an upper confidence bound on the probability of an event that was never observed, given the desired confidence level and the total number of events. It is designed for binomial problems, but since my problem can be recast as a binomial one, it seems to be applicable.

PS: It only holds accurately for large total event counts, but since I tend to have a couple thousand total events, it should be a good approximation.
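For reference, here is a minimal sketch of the calculation, assuming the $N$ samples are independent and the event was observed zero times. In that case the chance of seeing no occurrences is $(1-p)^N$, and setting it equal to $\alpha = 1 - \text{confidence}$ gives the exact bound $p \le 1 - \alpha^{1/N}$, which is approximately $-\ln(\alpha)/N$ for large $N$ (about $3/N$ at 95% confidence, hence the name). The function names below are my own, chosen for illustration.

```python
import math


def exact_upper_bound(n_samples: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on p when the event occurred 0 times in n_samples.

    Assumes independent samples. Solves (1 - p)**n_samples = 1 - confidence for p.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    # 1 - alpha**(1/n), written with expm1 to keep precision when the result is tiny
    return -math.expm1(math.log(alpha) / n_samples)


def approx_upper_bound(n_samples: int, confidence: float = 0.95) -> float:
    """Large-sample approximation -ln(alpha)/n; roughly 3/n at 95% confidence."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / n_samples


if __name__ == "__main__":
    for conf in (0.95, 0.99):
        print(conf, exact_upper_bound(2000, conf), approx_upper_bound(2000, conf))
```

At 99% confidence the approximation becomes roughly $4.6/N$, since $-\ln(0.01) \approx 4.6$. The approximate bound is always slightly larger than the exact one, so it errs on the conservative side.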

uLoop