1

I'm testing a Range function from big integer software libraries. The function will return an integer in the range [0,k), where k is the upper limit on the range. I'm seeing unexpected results, and I hope the CV folks can help point out my mistake.

Here's the pseudo code:

bins[65536];
for i = 0 to 1000000
{
    v = Range(0, 65536)
    bins[v] = bins[v] + 1
}

sort(bins)
low = bins[0]
high = bins[65535]

In the above, bins is just a count of the frequency. And sort is a sort of the bin based on the the frequency.

I expect to see each value (0 through 65535) about 15 times because 1000000 / 65536 ~= 15.

However, repeated runs of the program continually produce results similar to:

Low (`bins[0]`): 1 or 2
High (`bins[65535]`): 34 or 35

low is never above 2, and high is never less than 34 or so. I'm beginning to think this generator might have a bias.

Am I doing something wrong in my test? Or am I interpreting the results incorrectly?

(And sorry for yet another "Is this a uniform distribution" question).

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
  • 3
    There *is* a bias: you're not getting enough values greater than $35$! You ought to read some of our threads on testing pseudo RNGs, such as http://stats.stackexchange.com/questions/30 and http://stats.stackexchange.com/questions/92984. – whuber Sep 24 '14 at 21:04
  • Perfect, thank you @whuber. Its been nearly 20 years since I had a stat class in college. I can't begin to tell you how much I have forgotten... –  Sep 24 '14 at 21:22
  • For future visitors, this also seems relevant: [Testing randomly generated data against its intended distribution](http://stats.stackexchange.com/questions/27958/testing-randomly-generated-data-against-its-intended-distribution). –  Sep 24 '14 at 22:26

0 Answers0