16

In probability and statistics, the concepts of "random" and "randomness" are used frequently. Often a random variable is used to model events that occur by chance.

My question regards the term "random". What is random? Does randomness really exist?

I am curious what people who have a lot of experience working with random events think and believe about randomness.

Andrew
  • Are you seeking an authoritative answer or a collection of different opinions? Although I don't think there's any question that this subject is on topic, a question has been raised concerning whether this thread should be made CW (Community Wiki), especially because few of the existing replies appear authoritative. – whuber Apr 11 '12 at 18:05
  • Much like causality, it is what you define it to be. See a possible definition here: http://en.wikipedia.org/wiki/Algorithmically_random_sequence – JohnRos Apr 11 '12 at 18:20
  • Related: https://stats.stackexchange.com/questions/549914/are-randomness-and-probability-really-logically-dependent-notions – DifferentialPleiometry Dec 17 '21 at 21:16
  • This is what differentiates paradigms based on how "random" and "probability" are interpreted. To the frequentist, an event is considered random if it was selected from a sample space. Probability summarizes the emergent pattern of many samples. To the Bayesian, an "event" is considered random if the result is not known. I put "event" in quotation marks because Bayesians apply this to unknown fixed population quantities that the frequentist would not consider an "event." To the Bayesian, probability measures the subjective belief of the experimenter. – Geoffrey Johnson Dec 17 '21 at 21:27

6 Answers

10

Here's a deflationary theory: Something is random when its behaviour is modeled formally using the machinery of probability theory, an axiomatized bit of pure mathematics. So in a sense the answer to the first question is rather trivial.

In approaching the rather less well-posed question 'does randomness really exist?' it's helpful to ask yourself whether vectors 'really' exist. And when you have a view about that, ask yourself a) whether it's surprising or not that polynomials are vectors, b) whether and how we could be wrong about that, and finally c) whether, e.g., forces in physics are the things that vectors 'are' in the sense of the question. Probably none of these questions will help much in understanding what's going on in this forum, but they will bring out the relevant issues. You might start here and then follow up the other Stanford Encyclopaedia entries on the philosophy of probability and statistics.

There is a lot of discussion there (thankfully not much found around here) about the existence and relevance of 'actual' physical randomness, usually of the quantum variety, some of which is (usefully) gestured toward by @dmckee in the comments above. There's also the idea of randomness as some sort of uncertainty. Within the minimal framework of Cox it can be reasonable to think of (suitably tidied-up) uncertainties as being isomorphic to probabilities, so such uncertainties are, by virtue of that connection, treatable as if they were random. Clearly the theory of repeated sampling also makes use of probability theory, by virtue of which its quantities are random. One or other of these frameworks covers all the relevant aspects of randomness that I've ever seen in these forums.

There are legitimate disagreements about what should and should not be modeled as random, which you can find under the banners Bayesian and Frequentist, but these positions only suggest, and do not fully determine, the meaning of the randomness involved; they fix just its scope.

conjugateprior
  • +1 for introducing many thoughtful concepts into the discussion. I would like to suggest it may help to maintain a sharper distinction between randomness and uncertainty: one leads to the other, but not *vice versa*, yet many people (obviously not you!) exhibit some confusion about the difference. We know that not all uncertainty comes from randomness, nor is all that is arbitrary or variable necessarily "random" in the technical sense employed in statistical practice. – whuber Apr 11 '12 at 15:22
  • I guess you're identifying random with sampling variability, which is obviously fine. I was trying to separate three things: the probability theory, the things that vary in repeated sampling, and uncertainty about stuff. (A strong and controversial connection claimed for the connections between them that might interest you is Lewis's 'Principal Principle' from 'A Subjectivist’s Guide to Objective Chance'.) – conjugateprior Apr 11 '12 at 23:45
  • Please don't read that much into my comment: I had no intention of identifying randomness with sampling variability! I just wanted to call (positive) attention to some of the points you make. To agree or disagree with them would require a lengthy detailed analysis. (To get a sense of the kind of analysis involved, the article at http://plato.stanford.edu/entries/chance-randomness/#4 is of interest. But please don't assume that I hold with all the assertions in that article just because I am drawing attention to it!) – whuber Apr 12 '12 at 13:32
7

If we assume that we are living in a deterministic universe (everything that happens is predetermined, and given the exact same situation, the exact same things will happen), then there is no "random" at all.

In this case, "randomness" is merely used to represent what might happen given our limited knowledge. If we had perfect knowledge of a system, nothing would be random.

Andrew
  • "If we had perfect knowledge of a system, nothing would be random."... Very philosophical... So, the concept of randomness is only a useful approximation to the unobservable components of a system? – Macro Apr 11 '12 at 12:07
  • Quantum mechanics is very clear on this (now that repeated tests of Bell's inequality have been done): the world either *really does* have randomness in it *or* is constructed in such a way that you *really* can not have sufficiently complete knowledge to predict everything going forward. – dmckee --- ex-moderator kitten Apr 11 '12 at 14:26
  • (Deterministic) Newtonian mechanics is also clear on this: random phenomena arise even in classical physical systems. Invoking determinism is interesting, and helps us understand better what ought to count as "random," but ultimately is tangential to discussions of randomness in statistical practice or theory. – whuber Apr 11 '12 at 15:15
  • Well put @dmckee. I'll point out that, while most people believe Quantum Mechanics states without doubt that the world is non-deterministic, this is not actually true - that is just one **interpretation** of quantum mechanics (which happens to be the most popular), but there are [other, deterministic interpretations out there](http://physics.stackexchange.com/questions/18586/deterministic-quantum-mechanics/18598#18598). – BlueRaja - Danny Pflughoeft Apr 11 '12 at 16:21
  • @BlueRaja-DannyPflughoeft: Pay attention to what I wrote: either there is non-determinism or there is non-local information and you can not have complete knowledge. There is no point in bringing the interpretation of quantum mechanics into the discussion because the situation is independent of which interpretation you choose. – dmckee --- ex-moderator kitten Apr 11 '12 at 16:26
  • @dmckee: ...Right, I was agreeing with you *(hence the "well put")*. I was only adding a sidenote for others, as believing that *"quantum mechanics says the world is non-deterministic"* is a common and often-stated misunderstanding. – BlueRaja - Danny Pflughoeft Apr 11 '12 at 17:05
  • @dmckee---ex-moderatorkitten See [Rethinking Superdeterminism](https://arxiv.org/abs/1912.06462) and [Superdeterminism: A Guide for the Perplexed](https://arxiv.org/abs/2010.01324) on the untested assumptions of EPR experiments and Bell's inequality or CHSH's inequality. – DifferentialPleiometry Dec 17 '21 at 21:22
3

My definition of random would be unpredictable, i.e. you can never know with 100% certainty the outcome of an event, although you might be able to put a bound on the range of possibilities. A simple example would be rolling a fair die: you can never know exactly which number will come up on each roll, but you do know it will be one of the numbers 1 through 6.
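As a toy illustration (a sketch of my own, using Python's built-in pseudorandom generator rather than any physically random source), each simulated roll is unpredictable in advance yet provably bounded:

```python
import random

# A toy sketch: each roll of a simulated fair die is unpredictable in advance,
# but the range of possible outcomes is known exactly.
rolls = [random.randint(1, 6) for _ in range(10)]
print(rolls)                             # no way to predict this list beforehand
assert all(1 <= r <= 6 for r in rolls)   # ...yet every outcome lies in 1..6
```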

babelproofreader
  • "Unpredictable" makes intuitive sense, but doesn't it need some refinement? If I am ignorant of the machinery of the heavens, then the phases of Venus will be unpredictable to me. Does that make the workings of the solar system "random"? (You could make a case either way, and in so doing, you would clarify what you really mean by "unpredictable.") – whuber Apr 11 '12 at 15:27
  • This would imply that randomness is "subjective", since one's ability to predict the future varies with knowledge and tools. This would be closer to the Bayesian viewpoint. – Memming Apr 11 '12 at 16:14
  • If one isn't ignorant of the machinery, if in fact one has 100% knowledge of how the machinery works but this still isn't sufficient to accurately predict outcomes, then this gap or inability to forecast is unpredictability or randomness. Just as Popper said that nothing is actually true but only accepted as true until falsified, babelproofreader says randomness is true, absolute unpredictability, and no model, even a 100% infallibly accurate one, is actually good enough to predict randomness. This gap between reality and perfect knowledge of the "system" behind it is randomness. – babelproofreader Apr 11 '12 at 21:48
3

You can get a really nice definition of randomness that reflects our intuition of "unpredictability" by employing some basic concepts from information theory.

The high-level idea is to develop a concept of "compression", using some fixed "compression language". This can be accomplished nicely in terms of Kolmogorov complexity. Basically, given a language used to describe things (such as English, French, etc.), we can talk about compressing a source of bits. For example, if my bit source always dumps out 01010101..., where a 0 is always followed by a 1 and vice versa, then we can "compress" an arbitrarily-sized subsequence of this string: "Write '01' 20 times" is shorter than "0101010101010101010101010101010101010101". Let's define $L$ so that $L(0101010101010101010101010101010101010101) =$ "Write '01' 20 times".
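As a rough empirical analogue (a sketch of my own; a general-purpose compressor stands in for a formal description language like $L$), a patterned bit string compresses dramatically while pseudorandom bytes barely compress at all:

```python
import os
import zlib

# A rough sketch: zlib plays the role of the "compression language".
patterned = b"01" * 5000         # the "Write '01' 5000 times" pattern, 10,000 bytes
random_ish = os.urandom(10_000)  # 10,000 pseudorandom bytes from the OS

print(len(zlib.compress(patterned)))   # tiny: the pattern is highly compressible
print(len(zlib.compress(random_ish)))  # ~10,000: essentially incompressible
```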

Given this notion of compression, we can define a Bernoulli Random Variable to be an infinite sequence of bits $S$ so that, given ANY finite language $L$ (i.e. $L$ has a longest word), there exists an $N$ such that for all $n > N$, the length of $L(s_1s_2s_3...s_n)$ is greater than or equal to the length of $s_1s_2s_3...s_n$ (i.e. $s_1s_2...s_n$ is uncompressible in $L$). Note that this is a bit weird, because the sequence is only Bernoulli "in the limit" in some sense - as we sample more and more bits, it becomes increasingly difficult to describe them without simply "listing them off".

Basically, this means that, given the tools we have at our disposal, we can't "predetermine" what the next bits in the sequence will be, unless our language is so large that we can simply include the entire sequence up to that point in our language. For example, my language could include a one-off definition that "burbawubba := 001010100001010100100101010001010101001". Then, I could compress "001010100001010100100101010001010101001" in my language by simply saying "burbawubba". However, if my language is finite, I eventually have to run out of one-off encodings like this.

We can define many more distributions of random variables in terms of our Bernoulli random variable.
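For instance (a minimal sketch of my own; Python's PRNG stands in for a true Bernoulli bit source), a Binomial draw is just a sum of bits, and a Uniform[0, 1) draw reads bits as a binary expansion:

```python
import random

def bit() -> int:
    """One Bernoulli(1/2) bit; a PRNG stands in for a true random source."""
    return random.getrandbits(1)

# Binomial(10, 1/2): the sum of 10 Bernoulli bits.
binomial_draw = sum(bit() for _ in range(10))

# Uniform[0, 1): interpret 53 bits as the binary expansion 0.b1b2...b53.
uniform_draw = sum(bit() * 2.0 ** -(k + 1) for k in range(53))

print(binomial_draw, uniform_draw)
```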

Now, of course, whether an actual physical source of bits satisfying this property exists in the real world is another question entirely. Still, we can achieve a sort of "essentially random" that is satisfactory in all the ways we would like by simply limiting the size of our language to some reasonable number. If we limit our language to English, then we can simply go out as far as the requisite $N$, at which point the bits in the sequence essentially have no substantial relationship to anything that you can put into words. This is usually good enough for most applications: your sequence of coin flips can't possibly be described in terms of factors concerning the demographics or health history of the group you're sampling for medical research. So, in that sense, it's "essentially random".

Him
1

I tend to prefer a probabilistic interpretation of randomness. An event is random if gaining any additional information does not help you predict its outcome. That is, the event is unconditionally random. Notationally:

$p(A \mid B) = p(A) \quad \forall B$

To put it in concrete terms: if you believe that a die roll ($A$) is truly random, then knowing the exact physical state of the die as it is thrown ($B$) confers no additional predictive power on the outcome of the roll.
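As a quick numerical sanity check (a sketch of my own, not part of the original answer; `rolls` and `coins` are hypothetical stand-ins for $A$ and an unrelated $B$), conditioning on uninformative side information leaves the empirical frequency essentially unchanged:

```python
import random

random.seed(0)
n = 100_000
rolls = [random.randint(1, 6) for _ in range(n)]   # the event A: a die roll
coins = [random.getrandbits(1) for _ in range(n)]  # unrelated side information B

# Unconditional vs. conditional empirical frequency of A = 6.
p_a = sum(r == 6 for r in rolls) / n
b_is_1 = [r for r, c in zip(rolls, coins) if c == 1]
p_a_given_b = sum(r == 6 for r in b_is_1) / len(b_is_1)

print(round(p_a, 3), round(p_a_given_b, 3))  # nearly equal: B has no predictive power
```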

Lucas
  • This is an intriguing approach, but doesn't it get things reversed? Once we are certain about an event, no additional amount of information helps us predict it any better. When an event is random--say, whether $Y\gt 0$ for a bivariate normal variable $(X,Y)$--then additional information, such as the value of $X$ in this case, usually *does* "confer additional predictive power" by allowing us to replace $\Pr(Y\gt 0)$ by $\Pr(Y\gt 0|X)$. – whuber Apr 11 '12 at 16:28
  • No, the notation is a shorthand where $p(Y)$ should be expanded as $p(Y=y)$. After the event has occurred, you know it with certainty, i.e. $p(Y|Y=y, B)$ is 1 for $Y=y$ and 0 otherwise. And, yes, knowing $B$ (or $X$) is usually predictive, but then $A$ wouldn't be truly random. – Lucas Apr 11 '12 at 16:37
  • Therefore, randomness is only in the future. Once the event has occurred, we know its value and it is no longer random... even if it were random before. – Andrew Apr 11 '12 at 17:26
  • @Andrew: This is probably pedagogical, but it's the process of generating the event that is random, not the event itself. The event is just a thing. – Lucas Apr 11 '12 at 17:38
  • A section in the [Wikipedia article on randomness](http://en.wikipedia.org/wiki/Randomness#Randomness_versus_unpredictability) might help clarify how predictability and randomness differ. – whuber Apr 11 '12 at 18:08
  • At first blush, I like this, but saying something can only be "truly random" if no existing variable can have any predictive power seems a bit extreme. Am I understanding you correctly? Imagine a textbook regression model actually existed, where $y=\beta_0+\beta_1x+\mathcal{N}(0,1)$ perfectly described the entirety of the situation, would you say that $y$ was random? I think I would. – gung - Reinstate Monica Apr 11 '12 at 22:27
  • @gung: I totally agree with your comment and am not trying to say anything too controversial (I think). If you have your model where $B = {\beta_0, \beta_1}$ and $p(y|x, B) = \beta_0 + \beta_1 x + {\cal N}(0, 1)$, then all I'm saying is that any *other* information $C$ does not change the distribution, i.e. $p(y|x,B,C) = p(y|x,B)$, so, by my definition (and yours), $y$ is truly random. – Lucas Apr 12 '12 at 15:32
  • Thanks, that helps a lot. But I wonder if we still need to add something. My ability to translate b/t an idea & its mathematical representation isn't always what I might like it to be, so maybe I'm misreading this, but what if we wrote, $p(A|B)=p(A) \forall B=p(A)=1$? To me, that appears consistent w/ the answer, but I *wouldn't* call it 'random'. Am I missing something? Would you call it random? ([funny example](http://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon/424#424)) – gung - Reinstate Monica Apr 12 '12 at 16:04
  • @gung: Obligatory XKCD noted. Remember that $p(A)$ is shorthand for $p(A = a)$ where $a$ is in some domain. Assuming we're talking about *discrete* distributions, $p(A = a) = 1$ can only be true if there is exactly one possible value in the domain. Technically, this is random, but it is degenerate. – Lucas Apr 12 '12 at 17:34
  • If you get into logic, computability, and martingales, there are a lot of new issues here. For example, you may have an event that could be predicted, but that prediction is not computable, i.e. there is no computer program that could compute this prediction. For example, you may have a Martin-Lof random real number, which is a particular infinite sequence of 0s and 1s, yet no computer program can have an edge when playing against it, trying to guess its digits significantly more often than in 50% of the cases --- yet mathematically the sequence is fixed. – osa Jun 28 '15 at 04:04
  • @gung-ReinstateMonica In your regression example, I would say that $y$ is comprised of deterministic and random elements, and is thus not *purely* a random variable. This would be caveated with an assumption that this was a valid model of the data generating process for $y$. If I eliminated the "purely" portion of the term "purely random", I would be comfortable in labeling $y$ a random variable. – Alexis Dec 17 '21 at 21:16
0

One of my favorite interpretations is the sampling-based vs. design-based uncertainty described in this paper.

  • Sampling uncertainty comes from the fact that a researcher collects a survey sample at random from the population. Randomness of estimators such as the mean comes from the fact that the researcher could have collected many different subsets of the data. Ex ante, it is unclear which individuals will be chosen for the survey.
  • Design-based uncertainty arises in randomized controlled trials because the researchers randomly allocate some individuals to the treatment group and others to the control group. For each individual there is one potential outcome if she receives the treatment and another if she doesn't. The difference between the two groups is random because different sets of people could have been assigned to either group.

In both cases "randomness" is used to model hypothetical scenarios where different people would be chosen to be sampled/treated, even if the underlying outcomes of the "superpopulation" are fixed. This thought process is extremely important, because analysts want to draw conclusions that are not particularly sensitive to the sample chosen.
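A small simulation may make the two sources concrete (a sketch of my own, not code from the paper; the population values and the treatment effect of 5 are invented for illustration):

```python
import random
import statistics

random.seed(1)

# A fixed "superpopulation": the outcomes themselves never change.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Sampling-based uncertainty: many hypothetical surveys of the same population.
survey_means = [statistics.mean(random.sample(population, 100)) for _ in range(1_000)]
print("spread of survey means:", round(statistics.stdev(survey_means), 2))

# Design-based uncertainty: fixed potential outcomes, random treatment assignment.
y0 = population[:200]      # each unit's outcome if untreated (fixed)
y1 = [y + 5 for y in y0]   # each unit's outcome if treated (fixed effect of 5)
estimates = []
for _ in range(1_000):
    treated = set(random.sample(range(200), 100))
    t = statistics.mean(y1[i] for i in treated)
    c = statistics.mean(y0[i] for i in range(200) if i not in treated)
    estimates.append(t - c)
print("mean estimate:", round(statistics.mean(estimates), 2))      # close to 5
print("spread from assignment alone:", round(statistics.stdev(estimates), 2))
```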

ecnmetrician
  • Could you explain how these thoughts speak to the question of what randomness *is*? The interpretations you offer don't appear to distinguish randomness from uncertainty, probability, variability, hypothetical alternatives ("could have been assigned") and many other related by different concepts. – whuber Dec 18 '21 at 15:40