Odds of getting certain numbers within a longer sequence

Question

TL;DR: In a set of 35 randomly-selected numbers from 1-100, how likely is it that the same number will come up four times, at any point in the sequence (specific series not required)?

So, recently in the course of a roleplaying game, we had what seemed like a highly unlikely set of rolls, but I am not sure how likely or unlikely it actually was. I know how to work out the odds of a specific sequence of the same length as the number of tries (such as getting a specific four-digit number randomly), and I have found information on getting a specific consecutive sequence in a longer set of random selections, such as in this answer:

Coin toss: Probability of a run of certain length out of a longer sequence

But what we are curious about is this:

In a set of 35 randomly-selected numbers from 1-100, how likely is it that the same number will come up four times, at any point in those rolls? That is, the four occurrences need not be consecutive; they can fall anywhere among the 35 rolls.

Apologies for the informal phrasing; I haven't studied probability in a formal setting, so I don't know the names for a lot of things.

Note that there are many posts on site that relate to the problems with defining the event of interest *post hoc* (i.e. with "hey, that's weird, what are the chances of that thing that happened?" type questions) — Glen_b, Jan 20 '20 at 09:37
See, for example, the discussion [here](https://stats.stackexchange.com/q/1424/805). Also see the links in whuber's answer [here](https://stats.meta.stackexchange.com/a/3207/805). — Glen_b, Jan 20 '20 at 09:45
The answer to the question (the 'at least four' version) is approximately 4%, but for an event specified post hoc it's not really a correct calculation. — Glen_b, Jan 20 '20 at 09:55
In light of the important cautionary comments by @Glen_b, all you need is a rough approximation--an order of magnitude would be fine. In fact, because the chance of each number is small and relatively few numbers are selected, you may treat this *multinomial* calculation as a set of 100 independent *binomial* calculations. Because the chance of observing $4$ or larger for a Binomial$(35,1/100)$ variable is $0.0004087,$ the value will be (extremely) close to $1 - (1 - 0.0004087)^{100} = 0.0401.$ (That's off by less than 3 in the last digit; the true value is $0.0403\ldots.$) — whuber, Jan 20 '20 at 14:57
BTW, a method to obtain an exact answer is described at https://stats.stackexchange.com/questions/1308. It is illustrated with a calculation where $100$ is replaced by $365$ and $35$ by $23,$ showing it can handle problems of this size. — whuber, Jan 20 '20 at 15:40
I do appreciate that the odds of a thing that has already happened happening are now 1:1, but I still found it to be interesting, thus the phrasing as if it were a prediction of future events. Thanks for the links, though; I will enjoy reading and exploring them. — Quinn, Jan 21 '20 at 11:14
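
The figures in whuber's comment can be checked numerically. The following is a minimal Python sketch, not code from the thread. The function names are hypothetical. `approx_probability` follows the independent-binomial shortcut described in the comment. `exact_probability` uses an exponential generating function, which is an assumption about a convenient route to an exact value and not necessarily the method described in the linked answer.

```python
from fractions import Fraction
from math import comb, factorial


def binomial_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


def approx_probability(n_rolls=35, n_sides=100, threshold=4):
    """Shortcut from the comment: treat the n_sides counts as independent binomials."""
    tail = binomial_tail(n_rolls, 1.0 / n_sides, threshold)
    return 1.0 - (1.0 - tail) ** n_sides


def exact_probability(n_rolls=35, n_sides=100, threshold=4):
    """Exact P(some number appears at least `threshold` times).

    The number of roll sequences in which every value appears fewer than
    `threshold` times is n_rolls! * [x^n_rolls] (sum_{k<threshold} x^k/k!)^n_sides.
    """
    base = [Fraction(1, factorial(k)) for k in range(threshold)]
    # poly holds the running product, truncated at degree n_rolls.
    poly = [Fraction(1)] + [Fraction(0)] * n_rolls
    for _ in range(n_sides):
        new = [Fraction(0)] * (n_rolls + 1)
        for i, a in enumerate(poly):
            if a == 0:
                continue
            for k, b in enumerate(base):
                if i + k > n_rolls:
                    break
                new[i + k] += a * b
        poly = new
    p_none = poly[n_rolls] * factorial(n_rolls) / Fraction(n_sides) ** n_rolls
    return float(1 - p_none)


if __name__ == "__main__":
    print("binomial tail :", binomial_tail(35, 0.01, 4))
    print("approximation :", approx_probability())
    print("exact         :", exact_probability())
```

The comment reports roughly $0.0401$ for the approximation and $0.0403\ldots$ for the true value, so the two printed probabilities should agree to about three decimal places.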

score 2 · Accepted Answer · answered Jan 20 '20 at 10:59

Computing this exactly using probabilities and combinatorics is quite hard, in particular because it involves terms like $n!$ with $n$ on the order of hundreds, so the calculations get unwieldy.

If you can settle for a Monte Carlo approximation of the probability over 1 million tries, it looks like it is around $0.04$, or $4\%$, as Glen_b said.
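
The answer does not show its simulation code, so the following is only a sketch of how such a Monte Carlo estimate might be set up in Python. The function name `simulate`, its parameters, and the fixed seed are assumptions made for illustration. Each trial draws 35 numbers uniformly from 1 to 100 and checks whether any number appears at least four times.

```python
import random
from collections import Counter


def simulate(n_trials=1_000_000, n_rolls=35, n_sides=100, threshold=4, seed=0):
    """Estimate P(some number appears at least `threshold` times in n_rolls rolls)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    sides = range(1, n_sides + 1)
    hits = 0
    for _ in range(n_trials):
        # One trial: n_rolls independent uniform draws, tallied per value.
        counts = Counter(rng.choices(sides, k=n_rolls))
        if max(counts.values()) >= threshold:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # Fewer trials than the 1 million mentioned above, to keep the run short.
    print(simulate(n_trials=100_000))
```

With 100,000 trials the standard error is roughly $0.0006$, so the estimate should land near $0.04$. Raising `n_trials` to 1 million tightens it further at the cost of a longer run.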

Davide ND

For my purposes, a Monte Carlo approximation is entirely satisfactory; I just didn't know how to set up the math. Thanks! Combined with the comments above, I think this makes for a fairly useful outcome, so I am accepting this answer. – Quinn Jan 21 '20 at 11:15
