I just came up with my own random number scaling algorithm (and I'm sure someone else has come up with it before me), and I wanted to see if any of you can find holes in it.
The idea is to take a string of random binary data (such as 01011101100010011001011110100111001) from a truly random source (http://qrng.anu.edu.au/ in this case) and use it to produce a sequence of random numbers between 0 and 71 (inclusive).
The simple way is to take 7-bit chunks, convert them to decimal, and throw out anything 72 and over. But since random bits aren't free to produce, I want to be more responsible and throw out as little data as possible.
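Just to make that baseline concrete, here's a rough Python sketch of the plain chunk method (the function name and the string-of-bits input format are just my own choices for illustration):

```python
def chunk_method(bits):
    """Plain method: split into 7-bit chunks, reject any chunk that is 72 or more."""
    numbers = []
    for i in range(0, len(bits) - 6, 7):
        value = int(bits[i:i + 7], 2)
        if value < 72:          # keep only 0..71
            numbers.append(value)
    return numbers

print(chunk_method("01011101100010011001011110100111001"))  # -> [46, 50, 57]
```

On my example string that keeps 3 of the 5 chunks and wastes 14 of the 35 bits.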
The Algorithm
Ok, I don't even know if "algorithm" is the right word; I think of it as a crawler. Since 72 is 1001000 in binary, any 7-bit number starting with 11, 101, or 1001 will be 72 or higher. So I crawl through the bits one at a time, and if the bits I've collected so far start with one of those patterns, I throw them away and continue crawling (there's a short code sketch after the walkthrough). Crawling 01011101100010011001011110100111001 looks like this:
0-1-0-1-1-1-0 = 46, Use that number!
1-1 STOP! Throw out those 2 bits and continue crawling
0-0-0-1-0-0-1 = 9, use it!
1-0-0-1 STOP! Throw out all 4 and continue
0-1-1-1-1-0-1 = 61
0-0-1-1-1-0-0 = 28
The last bit I have to throw out (or save for later). I have found this gives me one usable number per 8.7 bits on average, instead of one per 11.5 bits with the plain 7-bit chunk method.
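Here's a minimal Python sketch of the crawler as I described it (again, the names and the string-of-bits input are just illustrative):

```python
FORBIDDEN_PREFIXES = ("11", "101", "1001")  # any 7-bit number with one of these prefixes is >= 72

def crawl(bits):
    """Crawler method: abort a number as soon as its prefix guarantees a value >= 72."""
    numbers = []
    current = ""
    for bit in bits:
        current += bit
        if current in FORBIDDEN_PREFIXES:
            current = ""          # throw away the doomed partial number, keep crawling
        elif len(current) == 7:
            numbers.append(int(current, 2))
            current = ""
    return numbers                # any leftover bits in `current` are dropped (or saved)

print(crawl("01011101100010011001011110100111001"))  # -> [46, 9, 61, 28]
```

Run on the example string, this reproduces the walkthrough above: [46, 9, 61, 28], with the trailing 1 left over.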
Can anyone find any holes in my reasoning that would make this method less random (i.e. more predictable)?