9

Is $\text{Unif}(-\infty, \infty)$ a valid distribution?

I'm trying to capture the idea of a completely random number (where every number has an equal chance of being chosen), but I'm not sure if that idea can be captured using a valid statistical distribution.

I'm thinking there may be some way to do this via a mapping. For example, it looks like we can map numbers from the range $(-1, 3)$ to all of the reals via the function $f(x) = - \log (\frac{4}{x+1} - 1)$ which has a domain of $(-1, 3)$ and a range of $(-\infty, \infty)$.

So, could $\text{Unif}(-\infty, \infty)$ be defined using a mapping somehow from $\text{Unif}(-1, 3)$? If so, how would you define this mapping?

Pro Q
  • There is [no uniform distribution over the real line.](https://math.stackexchange.com/a/14809/14893) Transforming a uniform variate over a finite interval into a real valued variate means the density of the transform is no longer constant. – Xi'an Aug 26 '21 at 06:31
  • @Pro To be uniform there must be some constant $c>0$ that the density is equal to on its support. – Glen_b Aug 26 '21 at 07:02

3 Answers

10

The uniform distribution has a finite range $-\infty < a < b < \infty$, and its probability density function is $p(x) = \frac{1}{b - a}$ for $x \in (a, b)$. If you let the range grow without bound, the density $\frac{1}{b - a} \to 0$, so in the limit the "density" would be identically zero and could not integrate to unity. Put differently, to be uniform the density would need to equal some constant $c > 0$ on its support, but every such constant corresponds to a finite support of width $b - a = \tfrac{1}{c}$; on an infinite support, a constant density integrates to either zero (if $c = 0$) or infinity (if $c > 0$), never to one. So there can be no uniform distribution with infinite support: it is not a proper distribution.

As for your idea with the mapping, notice that such a transformation would not lead to uniformly distributed values. Try the following numerical experiment.

# Simulate from Unif(-1, 3) and apply the proposed mapping to the real line
x <- runif(1e6, -1, 3)
y <- -log(4/(x + 1) - 1)
# Histogram of the transformed values: far from flat
hist(y, breaks = 100)

As you can see, the result looks nothing like a uniform distribution.

Histogram showing a distribution with a peak at zero and exponentially decaying tails on both sides.
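In fact, the shape is no accident: since $(X+1)/4 \sim \text{Unif}(0, 1)$, the mapping is exactly the logit of a standard uniform variate, so the transformed values follow a standard logistic distribution. A minimal check of this (a sketch, reusing `y` from the code above):

hist(y, breaks = 100, freq = FALSE)  # plot on the density scale
curve(dlogis(x), add = TRUE, col = "red", lwd = 2)  # overlay the standard logistic density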

Tim
  • Does this mean that the common statement "maximum likelihood estimation is equivalent to maximum a posteriori estimation with a uniform prior" is false for continuous distributions? – mhdadk Sep 18 '21 at 00:24
  • @mhdadk It's rather irrelevant to the question, but answering your comment: they are equivalent if you use the flat prior, an improper distribution with infinite support. It's not a proper distribution, but can technically be used for optimization. – Tim Sep 18 '21 at 06:35
6

The main answer by Tim tells you about the uniform distribution as conceived in the traditional framework of probability theory, explaining the issue through the limiting behaviour of density functions. In this answer I will give a more fundamental explanation that goes back to the underlying axioms of probability within the standard framework, and I will also explain how one might go about obtaining the distribution of interest within alternative probability frameworks. My own view is that there are generalised frameworks for probability that are reasonable extensions of the standard framework and can be used in this case, so I would go so far as to say that the uniform distribution on the reals is a valid distribution.


Why can't we have a uniform distribution on the reals (within the standard framework)?

Firstly, let us note the mathematical rules of the standard framework of probability theory (i.e., representing probability as a probability measure satisfying the Kolmogorov axioms). In this framework, probabilities of events are represented by real numbers, and the probability measure must obey three axioms: (1) non-negativity; (2) norming (i.e., unit probability on the sample space); and (3) countable additivity.

Within this framework, it is not possible to obtain a uniform random variable on the real numbers. Within this set of axioms, the problem comes from the fact that we can partition the set of real numbers into a countably infinite set of bounded parts that have equal width. For example, we can write the real numbers as the following union of disjoint sets:

$$\mathbb{R} = \bigcup_{a \in \mathbb{Z}} [a, a+1).$$

Consequently, if the norming axiom and the countable additivity axiom both hold, then we must have:

$$\begin{align} 1 &= \mathbb{P}(X \in \mathbb{R}) \\[6pt] &= \mathbb{P} \Bigg( X \in \bigcup_{a \in \mathbb{Z}} [a, a+1) \Bigg) \\[6pt] &= \sum_{a \in \mathbb{Z}} \mathbb{P} ( a \leqslant X < a+1). \\[6pt] \end{align}$$

Now, under uniformity, we would want the probability $p \equiv \mathbb{P} ( a \leqslant X < a+1)$ to be a fixed value that does not depend on $a$. This means that we must have $\sum_{a \in \mathbb{Z}} p = 1$, and there is no real number $p \in \mathbb{R}$ that satisfies this equation. Another way to look at this: if we set $p=0$ and apply countable additivity then we get $\mathbb{P}(X \in \mathbb{R}) = 0$, and if we set $p>0$ and apply countable additivity then we get $\mathbb{P}(X \in \mathbb{R}) = \infty$. Either way, we break the norming axiom.
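As a numerical illustration of this partition argument (a sketch, using a standard normal in place of the impossible uniform distribution): for any proper distribution the unit-interval probabilities must sum to one, so they cannot all equal the same constant $p$.

# P(a <= X < a+1) for X ~ N(0, 1), over intervals carrying essentially all the mass
p <- sapply(-50:49, function(a) pnorm(a + 1) - pnorm(a))
sum(p)    # ~1: the unit intervals partition the real line
range(p)  # the terms vary with a; no single constant value could sum to 1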


Operational difficulties with the uniform distribution over the reals

Before examining alternative probability frameworks, it is also worth noting some operational difficulties that would apply to the uniform distribution over the reals even if we could define it validly. One of the requirements of the distribution is that:

$$\mathbb{P}(X \in \mathcal{A} | X \in \mathcal{B}) = \frac{|\mathcal{A} \cap \mathcal{B}|}{|\mathcal{B}|}.$$

(We use $| \ \cdot \ |$ to denote the Lebesgue measure of a set.) Consequently, for any $0<a<b$ we have:

$$\mathbb{P}(|X| \leqslant a) \leqslant \mathbb{P}(|X| \leqslant a | |X| \leqslant b) = \frac{a}{b}.$$

Since this inequality holds for any $b > a$, we can take $b = a/\epsilon$ (which becomes arbitrarily large as $\epsilon$ shrinks), so we have:

$$\mathbb{P}(|X| \leqslant a) \leqslant \epsilon \quad \quad \quad \text{for all } \epsilon>0.$$

If the probability is a real value then this implies that $\mathbb{P}(|X| \leqslant a)=0$ for all $a>0$, but even if we use an alternative framework allowing infinitesimals (see below), we can still say that this probability is smaller than any positive real number. Essentially this means that under the uniform distribution over the reals, for any specified real number, we will "almost surely" get a value that is larger in magnitude. Intuitively, this means that the uniform distribution over the reals will always give values that are "infinitely large" in a certain sense.
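A finite-support analogue makes this concrete (a sketch, using $\text{Unif}(-b, b)$ as a stand-in for the limiting case):

a <- 1
for (b in 10^(1:6)) {
  # P(|X| <= a) under Unif(-b, b) equals a/b, which vanishes as b grows
  print(punif(a, -b, b) - punif(-a, -b, b))
}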

This requirement of the distribution means that there are constructive problems when dealing with this distribution. Even if we work within a probability framework where this distribution is valid, it will be "non-constructive" in the sense that we will be unable to create a computational facility that can generate numbers from the distribution.


What alternative probability frameworks can we use to get around this?

In order to allow a uniform distribution on the real numbers, we obviously need to relax one or more of the rules of the standard probability framework. There are a number of ways we could do this which would allow a uniform distribution on the reals, but they all have some other potential drawbacks. Here are some of the possibilities for how we might generalise the standard framework.

Allow infinitesimal probabilities: One possibility is to relax the requirement that a probability must be a real value, and instead extend this to allow infinitesimals. If we allow this, we then set $dp \equiv \mathbb{P} ( a \leqslant X < a+1)$ to be an infinitesimal value that satisfies the requirements $\sum_{a \in \mathbb{Z}} dp = 1$ and $dp \geqslant 0$. With this extension we can keep all three of the probability axioms in the standard theory, with the non-negativity axiom suitably extended to recognise non-negative infinitesimal numbers.

Probability frameworks allowing infinitesimal probabilities exist and have been examined in the philosophical and statistical literature (see e.g., Barrett 2010, Hofweber 2014, and Benci, Horsten and Wenmackers 2018). There are also broader non-standard frameworks of measure theory that allow infinitesimal measures, and infinitesimal probability can be regarded as a part of that broader theory.

In my view, this is quite a reasonable extension to standard probability theory, and its only real drawback is that it is more complicated and requires users to learn about infinitesimals. Since I regard this as a perfectly legitimate extension of probability theory, I would go so far as to say that the uniform distribution on the real numbers does exist, since the user can adopt this broader framework to use that distribution.

Allow "distributions" that are actually limits of sequences/classes of distributions: Another possibility for dealing with the uniform distribution on the real numbers is to define it via a limit of a sequence/class of distributions. Traditionally, this is done by looking at a sequence of normal distributions with zero mean and increasing variance, and taking the uniform distribution to be the limit of this sequence of distributions as the variance approaches infinity (see related answer here). (There are many other ways you could define the distribution as a limit.)

This method extends our conception of what constitutes a "distribution" and a corresponding "random variable", but it can be framed in a way that is internally consistent and constitutes a valid extension of probability theory. By broadening the conception of a "random variable" and its "distribution", this approach also allows us to preserve all the standard axioms. In the above treatment, we would create a sequence of standard probability distributions $F_1,F_2,F_3,...$ where the values $p_n(a) = \mathbb{P} ( a \leqslant X < a+1 | X \sim F_n)$ depend on $a$, but where the ratios of these terms all converge to unity in the limit $n \rightarrow \infty$.
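A quick numerical sketch of this convergence (assuming the normal-sequence construction above, with $F_n$ a normal distribution with zero mean and growing standard deviation $s$):

a <- 5
for (s in c(1, 10, 100, 1000)) {
  p0 <- pnorm(1, 0, s) - pnorm(0, 0, s)      # P(0 <= X < 1) under N(0, s^2)
  pa <- pnorm(a + 1, 0, s) - pnorm(a, 0, s)  # P(5 <= X < 6) under N(0, s^2)
  print(pa / p0)                             # the ratio approaches 1 as s grows
}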

The advantage of this approach is that it allows us to preserve the standard probability axioms (just like in infinitesimal probability frameworks). The main disadvantage is that it leads to some tricky issues involving limits, particularly with regard to the interchange of limits in various equations involving non-standard distributions.

Replace the countable additivity with finite additivity: Another possibility is that we could scrap the countable additivity axiom and use the (weaker) finite additivity axiom instead. In this case we can set $p = \mathbb{P} ( a \leqslant X < a+1) = 0$ for all $a \in \mathbb{Z}$ and still have $\mathbb{P}(X \in \mathbb{R}) = 1$ to satisfy the norming axiom. (In this framework, the mathematical equations in the problem above do not apply since countable additivity no longer holds.)

Various probability theorists (notably Bruno de Finetti) have worked within the framework of this broader set of probability axioms and some still argue that it is superior to the standard framework. The main disadvantage of this broader framework of probability is that a lot of limiting results in probability are no longer valid, which wipes away a lot of useful asymptotic theory that is available in the standard framework.

Ben
2

Tim's answer is fantastic, and I wanted to go into more depth about the results of their answer in a way that won't fit into a comment.

Let's walk through the mapping described in the question. We define the following process: pick a random number between $-1$ and $3$ (i.e., sample a number from $\text{Unif}(-1, 3)$). Then, use the function $f(x) = - \log (\frac{4}{x+1} - 1)$ to convert your number between $-1$ and $3$ to a number between $-\infty$ and $\infty$ (which is the range of $f\,$).

Note that $f$ is a one-to-one function (since it passes the horizontal line test), which means that after the conversion, every single number has an equal chance of being chosen.

So then why isn't this a uniform distribution over all of the real numbers? If every number has an equal chance of being chosen, doesn't that make it a uniform distribution? No!

The reason is that continuous distributions don't indicate the probability of a particular value being chosen; they indicate the probability that a value within a particular range will be chosen.

In our case, it is true that every individual number has an equal chance of being chosen. However, for some numbers there is a higher chance of choosing a value close to that number than for others, and that is what the distribution reflects. So the distribution is not uniform.

Overall, a uniform distribution does not mean that every number has an equal chance of being chosen. Instead, it means that no matter what range you pick, the probability of a sampled value falling within that range only depends on the size of the range. (Not the specific endpoints of the range.)
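A quick simulation makes this concrete (a sketch, reusing the mapping from the question): two windows of the same width end up with very different probabilities.

x <- runif(1e6, -1, 3)
y <- -log(4/(x + 1) - 1)
mean(y > 0 & y < 1)  # roughly 0.23: values near 0 are common
mean(y > 5 & y < 6)  # roughly 0.004: values near 5 are much rarer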

EDIT: After writing this answer, I came across a recently published Numberphile video (technically, Numberphile2) titled "More on Bertrand's Paradox (with 3blue1brown) - Numberphile", which gives a better intuition about this exact question. In the video, the idea that "the probability of a sampled value falling within that range only depends on the size of the range" is described as "translational symmetry."

Based on the video, it is likely not correct to say that a "uniform distribution" must have translational symmetry. However, it does seem to be the case that the most common way of making a vague notion of a uniform distribution precise is to restrict the distribution to have translational symmetry. (For example, one might be willing to call the distribution based on $f(x) = -\log(\frac{4}{x+1} - 1)$ a "uniform distribution." However, as clearly shown by the graph in Tim's answer, that distribution is not translationally symmetric: if we look at the graph through a window and slide that window from side to side, the graph looks different depending on where the window is.)

Pro Q