35

Suppose you observe the sequence:

7, 9, 0, 5, 5, 5, 4, 8, 0, 6, 9, 5, 3, 8, 7, 8, 5, 4, 0, 0, 6, 6, 4, 5 , 3, 3, 7, 5, 9, 8, 1, 8, 6, 2, 8, 4, 6, 4, 1, 9, 9, 0, 5, 2, 2, 0, 4, 5, 2, 8 ...

What statistical tests would you apply to determine if this is truly random? FYI these are the $n$th digits of $\pi$. Thus, are digits of $\pi$ statistically random? Does this say anything about the constant $\pi$?

[Figure: histogram of the digit frequencies]

Cam.Davidson.Pilon
  • 15
    --> http://www.jstor.org/discover/10.2307/2685604?uid=3737592&uid=2129&uid=2&uid=70&uid=4&sid=21101484916817 – ocram Nov 26 '12 at 17:24
  • 1
    From the paper linked: "We thus fail to find convincing evidence against the null hypothesis that the digits of $\pi$ are adequately modeled as an iid sequence. " – Cam.Davidson.Pilon Nov 26 '12 at 17:29
  • 3
    Another one: [Refutation of claims such as ``Pi is less random than we thought''](http://interstat.statjournals.net/YEAR/2006/articles/0601001.pdf) –  Nov 26 '12 at 17:35
  • 10
    This is an interesting and maddening question. Any student that has taken a first course in measure-theoretic probability can easily prove that "almost all" real numbers are *normal*. But very few explicit examples are known, and to my (off-hand) knowledge, the matter has not been settled either way for any of the "famous" irrational mathematical constants. – cardinal Nov 26 '12 at 17:45
  • 4
    In (strict) connection with @cardinal's comment: [Normal number](http://en.wikipedia.org/wiki/Normal_number) –  Nov 26 '12 at 17:52
  • People - you made great comments - which can just as well be good answers :) – Tal Galili Nov 26 '12 at 19:21
  • @Cam.Davidson.Pilon Regarding your first comment, recall that "[absence of evidence is not evidence of absence](http://en.wikipedia.org/wiki/Argument_from_ignorance)". –  Nov 26 '12 at 19:35
  • The Bogdanoff brothers claimed on a French TV channel that someone has proved that the digits of $\pi$ are not random, and that $\pi$ existed before the Big Bang, therefore the Universe was not created at random. A reliable scientific source !! ;-) – Stéphane Laurent Nov 26 '12 at 20:06
  • 2
    I asked a similar question on Mathematics, which you might find interesting: http://math.stackexchange.com/questions/51829/distribution-of-the-digits-of-pi – Mathias Nov 28 '12 at 03:59
  • 7
    What's the graph? There are ten bars, oddly spaced, and all with values above 10%! – xan Nov 29 '12 at 22:53
  • Oops, I think I converted it to a density function (using plt.hist(normed=True) ), instead of a proper mass function. – Cam.Davidson.Pilon Nov 30 '12 at 20:53
  • If one examines $\pi$ as the solution to an infinite series expansion, it seems obvious that they are random in binary. In base ten, no additional information is added. – Carl Mar 10 '18 at 12:15
  • The point is "statistically random" is ill defined. If distribution of digit values was the sole determinant, then 123456789/9999999999 is a perfectly random number even though it is a completely repeating decimal. – AdamO Sep 09 '19 at 16:44
  • It's been 7 years since I asked this and Cameron has learned a lot. In retrospect I wish I had been more careful with my choice of words. I hope I know better now! – Cam.Davidson.Pilon Sep 09 '19 at 21:23
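
AdamO's remark about 123456789/9999999999 can be checked numerically. The following is only a minimal sketch, assuming we take 1000 digits of the repeating block 0123456789 (the variable names are illustrative, not from the thread). It shows that a perfectly uniform digit distribution can coexist with obvious serial dependence, so a frequency test alone cannot establish randomness.

```r
# 100 repetitions of the block 0123456789: 1000 digits of a repeating decimal.
periodic_digits <- rep(0:9, times = 100)

# Frequency check: every digit appears exactly 100 times, so the chi-square
# goodness-of-fit statistic against equal probabilities is 0.
freq_test <- chisq.test(as.vector(table(periodic_digits)))

# Serial-correlation check: the Box-Pierce test (default lag = 1) picks up
# the strong dependence between neighbouring digits.
serial_test <- Box.test(ts(periodic_digits))

print(freq_test$statistic)
print(serial_test$p.value)
```

The frequency test sees nothing wrong, while the serial test rejects decisively, which is why batteries of several different tests are used.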

3 Answers

16

The US National Institute of Standards and Technology (NIST) has put together a battery of tests that a (pseudo-)random number generator must pass to be considered adequate; see http://csrc.nist.gov/groups/ST/toolkit/rng/stats_tests.html. There is also the Diehard suite of tests, which overlaps somewhat with the NIST tests. The developers of the Stata statistical package report their Diehard results as part of their certification process. I imagine you could take blocks of digits of $\pi$, say groups of 15 consecutive digits (comparable to the precision of the double type), and run these batteries of tests on the numbers thus obtained.
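
The blocking idea can be sketched in R as follows. This is an illustration under stated assumptions, not the NIST or Diehard procedure: `digits_to_uniform` is a hypothetical helper, the digits are simulated stand-ins (substitute a vector of digits of $\pi$), and a single Kolmogorov-Smirnov test takes the place of a full battery.

```r
# Hypothetical helper: pack consecutive decimal digits into numbers in [0, 1).
digits_to_uniform <- function(digits, block_size = 15) {
  n_blocks <- length(digits) %/% block_size
  stopifnot(n_blocks >= 1)
  # Drop trailing digits that do not fill a complete block; column j of the
  # matrix then holds the digits of block j.
  m <- matrix(digits[seq_len(n_blocks * block_size)], nrow = block_size)
  # Weight the k-th digit of each block by 10^-k and sum within the block.
  as.vector(crossprod(m, 10^-(seq_len(block_size))))
}

# Stand-in data (an assumption): 1500 simulated digits instead of digits of pi.
set.seed(42)
stand_in_digits <- sample(0:9, 1500, replace = TRUE)

u <- digits_to_uniform(stand_in_digits)   # 100 values in [0, 1)
ks_result <- ks.test(u, "punif")          # compare with Uniform(0, 1)
print(ks_result$p.value)
```

The resulting vector `u` is the kind of input a generator test suite expects, so it could in principle be written to a file and fed to an external battery.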

StasK
6

Answering just the first of your questions: "What tests would you apply to determine if this [sequence] is truly random?"

How about treating it as a time-series, and checking for auto-correlations? Here is some R code. First some test data (first 1000 digits):

digits_string="1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989"
digits=as.numeric(unlist(strsplit(digits_string,"")))

Check the counts of each digit:

> table(digits)
digits
  0   1   2   3   4   5   6   7   8   9 
 93 116 103 102  93  97  94  95 101 106 
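
These counts can be checked formally with a chi-square goodness-of-fit test against equal probabilities. A minimal sketch, with the counts copied from the table above (the variable names are illustrative):

```r
# Digit counts for the first 1000 digits, copied from the table above.
counts <- c("0" = 93, "1" = 116, "2" = 103, "3" = 102, "4" = 93,
            "5" = 97, "6" = 94, "7" = 95, "8" = 101, "9" = 106)

# With a single vector and no other arguments, chisq.test() tests the
# counts against equal probabilities (here 1/10 per digit).
gof <- chisq.test(counts)

print(gof$statistic)  # sum((observed - 100)^2 / 100) = 4.74
print(gof$parameter)  # 9 degrees of freedom
print(gof$p.value)
```

A statistic of 4.74 on 9 degrees of freedom is unremarkable, so the digit frequencies give no evidence against uniformity.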

Then turn it into a time-series, and run the Box-Pierce test:

d=as.ts( digits )
Box.test(d)

which tells me:

X-squared = 1.2449, df = 1, p-value = 0.2645

Typically you'd want the p-value to be under 0.05 to say there are auto-correlations; at 0.2645, the test finds no evidence of any.

Run acf(d) to see the auto-correlations. I've not included an image here as it is a dull chart, though it is curious that the biggest spikes are at lags 11 and 22. Run acf(d,lag.max=40) to show that there is no peak at lag=33, and that it was just coincidence!
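
Box.test looks only at lag 1 by default. To test several lags jointly rather than eyeballing the acf chart, one option is the Ljung-Box variant with a larger `lag` argument. The sketch below is an assumption-laden illustration: `portmanteau_pvalues` is a hypothetical helper, and simulated digits stand in for the vector `digits` defined earlier.

```r
# Hypothetical helper: Ljung-Box p-values for several maximum lags.
portmanteau_pvalues <- function(x, lags = c(11, 22, 33)) {
  p <- sapply(lags, function(k) {
    Box.test(x, lag = k, type = "Ljung-Box")$p.value
  })
  names(p) <- paste0("lag_", lags)
  p
}

# Stand-in data (an assumption): replace with the real digit vector.
set.seed(7)
stand_in_digits <- sample(0:9, 1000, replace = TRUE)

pvals <- portmanteau_pvalues(stand_in_digits)
print(pvals)
```

Each p-value tests the joint null hypothesis that all auto-correlations up to the given lag are zero.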


P.S. We can gauge how well those 1000 digits of pi did by running the same test on (pseudo-)random digits.

probs=sapply(1:100,function(n){
    digits=floor(runif(1000)*10)
    bt=Box.test(ts(digits))
    bt$p.value
    })

This generates 1000 random digits, does the test, and repeats this 100 times.

> summary(probs)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.006725 0.226800 0.469300 0.467100 0.709900 0.969900 
> sd(probs)
[1] 0.2904346

So our result (p = 0.2645) was comfortably within one standard deviation of the mean, and pi quacks like a random duck. (I used set.seed(1) if you want to reproduce those exact numbers.)
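
A note on why the summary looks the way it does: under the null hypothesis a p-value is approximately Uniform(0, 1), which has mean 0.5 and standard deviation $\sqrt{1/12} \approx 0.289$, in line with the figures above. A more direct comparison is the share of simulated p-values at or below the observed one. A rough sketch, assuming 1000 replicates and an arbitrary seed (so the numbers will not match those above):

```r
set.seed(1)   # arbitrary seed, only to make this sketch reproducible
n_rep <- 1000

# Box-Pierce p-values for n_rep independent sets of 1000 simulated digits.
sim_p <- replicate(n_rep, {
  sim_digits <- sample(0:9, 1000, replace = TRUE)
  Box.test(ts(sim_digits))$p.value
})

observed_p <- 0.2645                      # p-value reported for the pi digits
share_below <- mean(sim_p <= observed_p)  # empirical quantile of observed_p

print(c(mean = mean(sim_p), sd = sd(sim_p),
        theoretical_sd = sqrt(1 / 12), share_below = share_below))
```

If the simulated p-values are close to uniform, `share_below` should land near 0.26, meaning roughly a quarter of truly random digit sequences would look "less random" than pi did on this test.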

Darren Cook
2

It's a strange question. Numbers aren't random.

As a time series of base 10 digits, $\pi$ is completely fixed.

If you are talking about randomly selecting an index for the time series, and picking that number, sure it's random. But so is the boring, rational number $0.1212121212\ldots$. In both cases, the "randomness" comes from picking things at random, like drawing names from a hat.

Perhaps what you're asking is more nuanced, as in: "If I sequentially reveal a possibly random sequence of numbers, could you tell me if it's a fixed subsequence of $\pi$? And where did it come from?" Well, first, though $\pi$ is not repeating, different random sequences will at least locally align with it for a short run. That's a number theory result, not a statistical one. As soon as the alignment breaks, you have to scan on to the next instance of alignment. Computationally, it's not tractable to align an arbitrary random sequence, because the match with $\pi$ might not occur until the $(2^{2^{2^2}}+1)$-th place. Heck, even if the sequence did align with $\pi$ somewhere, that doesn't mean it's not random. For instance, I could choose 3 at random; that doesn't mean it's the first digit of $\pi$.

AdamO
  • Exactly what "number theory result" are you referring to? AFAIK, nobody even knows whether $\pi$ is a [normal number.](https://en.wikipedia.org/wiki/Normal_number) – whuber Sep 04 '19 at 13:04
  • @whuber what I mean is that whether $\pi$ actually contains every possible subsequence of numbers is not known (correct me if I'm wrong) and that proof/finding has nothing to do with randomness/probability – AdamO Sep 04 '19 at 15:32
  • 3
    I don't really follow this answer. Yes, pi is fixed, but the series of digits can still behave like a series of random numbers. I don't see how 0.1212... represents randomness by any definition. And as you point out in your comment, whether or not pi contains some arbitrary sequence of digits has little bearing on the random nature of its digits. So why focus on that? – Nuclear Hoagie Sep 04 '19 at 15:45
  • @NuclearWang Just because the order of a sequence of digits is incomprehensible to our naive minds doesn't mean it's "as good as random". Here's an example of a non-repeating number that meets perhaps some randomness requirements but not others: 0.12112211122211112222... Nonetheless, I can grab a subset of the prior number history and predict the entire future. The same can be said of $\pi$, it just requires that I know *all* the time-series history. – AdamO Sep 04 '19 at 18:12
  • 2
    @AdamO You can only make that prediction if you know beforehand that the number you're describing is pi, which seems like cheating. The digits in 3.141592 give no indication that the next digit is 6; the only way you know that is because we're specifically describing pi. Unless you've already calculated pi to N digits, there isn't any reason to expect digit N to be any particular number. You seem to imply that there's no such thing as a random sequence of numbers, because once you write it down, it's fixed. – Nuclear Hoagie Sep 04 '19 at 18:58
  • @NuclearWang No, that is not cheating. An *actually* random process does *not* have that quality. If I knew all the weather of all the history of the known universe, I *still* couldn't perfectly predict tomorrow's weather because of its chaotic (infinite-dimensional) properties. Interestingly enough, weather is a deterministic but non-identifiable process and hence meets the key criteria of randomness. – AdamO Jun 12 '20 at 15:58
  • @NuclearWang consider additionally that we aren't discussing the decimal expansion of 1 as if it were somehow a viable random number generator. It's not "cheating" that I know it's 1, it's just not "interesting". But the same argument holds, "Just 'cuz you know the expansion is 1.0000000000000... doesn't mean the 14th digit is known to be 0!" – AdamO Jun 12 '20 at 16:00
  • You're not "predicting" anything if you already know the number is pi, you're just doing a lookup for the Nth digit. There's no property *intrinsic* to the sequence of digits that informs you about the next digit, it's only the *extrinsic* knowledge that the number is pi that gives you that information. Similarly, I could collect weather data from 2010-2019, look at the first 9 years, and perfectly "predict" 2019's weather. There's not much in the first 9 years of data that lets me do that, it's the fact that I know this is the sequence of weather in the 2010s that makes it possible. – Nuclear Hoagie Jun 12 '20 at 16:19
  • @NuclearWang it's for the same semantic reason that we refer to "random number generators" from the computer as *pseudo* random. It's not random, if you knew the key and the seed, it's entirely reproducible and any sequence is predictable henceforth. It's also the same reason that all cryptos are eventually hackable. You can't ask if something is "random" with a total lack of attention or care. – AdamO Jun 12 '20 at 16:21
  • But this brings us back to the question in the title. Even though they are perfectly predictable, the sequence of numbers given by a pseudorandom number generator are, in fact, *statistically* random (to a point), even though they are not *actually* random. Similarly, the digits of pi are fixed and known, but they have statistical properties that are indistinguishable from a true random number sequence. – Nuclear Hoagie Jun 12 '20 at 18:57