1

Suppose I am given a geometric distribution that looks like this:

[image of the distribution]

Here p = 0.4 and x ranges from 0 to infinity. What does it mean when one says to generate a random number (or random variable) using this distribution?

mdewey
afsara_ben
  • What programming language are you using? Do you know how to generate a number from the distribution you give in your example? –  Sep 17 '20 at 10:27
  • I am using Python, and yes, I can call the `scipy.stats.geom.rvs` function to generate a random number, but I wanted to understand what it means conceptually. – afsara_ben Sep 17 '20 at 10:34
  • A random number/variable is picked from within a range, say 1 to 1000, but what does it mean to pick a number from a given distribution? – afsara_ben Sep 17 '20 at 10:35
  • In the terminology at https://stats.stackexchange.com/a/54894/919, generating a random number amounts to drawing one ticket from a box. – whuber Sep 17 '20 at 14:31

2 Answers

7

In general, generating a random number from a probability distribution means transforming random numbers (typically uniform ones) so that the results follow the distribution.

Perhaps the most generic way to do so is called inverse transform sampling:

  1. Generate a uniform random number in [0, 1].
  2. Run the quantile function (also known as the inverse CDF or the PPF) on the uniform random number.
  3. The result is a random number that fits the distribution.
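As a minimal sketch of these three steps, the following Python code samples from an exponential distribution, chosen here only because its quantile function has a simple closed form, -ln(1 - u)/rate. The function name `sample_exponential` and its `rate` parameter are illustrative and do not come from any library.

```python
import math
import random

def sample_exponential(rate, rng=random):
    """Draw one exponential variate by inverse transform sampling."""
    u = rng.random()                  # step 1: uniform number in [0, 1)
    return -math.log(1.0 - u) / rate  # step 2: apply the quantile function

# Step 3: the results follow the exponential distribution, so their
# average should be close to the theoretical mean 1 / rate.
rng = random.Random(12345)
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 0.5
```

Drawing u from [0, 1) rather than [0, 1] keeps 1 - u strictly positive, so the logarithm is always defined.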

However, this technique can't be used in practice for all distributions. The main reason is that, for many distributions, the quantile function has no closed form or is expensive to calculate. Thus, for many distributions, other techniques are used. They include rejection sampling, direct transformations, etc.

In the case of the geometric distribution, there are at least two ways to generate numbers that follow it. One way is a direct transformation:

  1. Set x to 0.
  2. With probability p, return x.
  3. Add 1 to x and go to step 2.
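The steps above can be sketched in Python as follows. This is only an illustration: the function name `sample_geometric_trials` is made up, and it assumes the convention in which the result counts the failures before the first success. Comparing the observed frequencies with the theoretical probabilities p(1 - p)^k shows what it means for the numbers to "follow" the distribution.

```python
import random
from collections import Counter

def sample_geometric_trials(p, rng=random):
    """Count failures before the first success (values 0, 1, 2, ...)."""
    x = 0                        # step 1
    while rng.random() >= p:     # step 2: a success occurs with probability p
        x += 1                   # step 3: a failure, so count it and try again
    return x

p = 0.4
rng = random.Random(2020)
n = 100_000
counts = Counter(sample_geometric_trials(p, rng) for _ in range(n))
for k in range(5):
    # empirical frequency next to the theoretical probability p * (1 - p)**k
    print(k, counts[k] / n, p * (1 - p) ** k)
```

With p = 0.4, roughly 40% of the draws should be 0, roughly 24% should be 1, and so on.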

A geometric random number can also be found by inverse transform sampling, described below.

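  1. Generate a uniform random number in [0, 1), call it u.
  2. Run the quantile function, which is floor(log(1 - u)/log(1 - p)) when the distribution counts the failures before the first success (so that values start at 0, as in the procedure above).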
  3. The result is a geometric random number.
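A minimal Python sketch of this approach follows. It assumes 0 < p < 1 and the convention that counts failures before the first success (values 0, 1, 2, ...), for which the quantile function reduces to floor(log(1 - u)/log(1 - p)). The function name `sample_geometric_inverse` is hypothetical.

```python
import math
import random

def sample_geometric_inverse(p, rng=random):
    """Geometric variate (failures before first success) via the quantile function."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is never zero
    return math.floor(math.log(1.0 - u) / math.log(1.0 - p))

p = 0.4
rng = random.Random(7)
draws = [sample_geometric_inverse(p, rng) for _ in range(100_000)]
print(draws.count(0) / len(draws))  # close to p = 0.4
print(sum(draws) / len(draws))      # close to (1 - p) / p = 1.5
```

Unlike the loop over trials, this version uses exactly one uniform number per draw, at the cost of two logarithms and some floating-point rounding.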

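Other ways to generate geometric random numbers are available. The choice of algorithm depends on many things, including efficiency, simplicity, and accuracy. (Note that the geometric distribution is defined differently in different works: some count the failures before the first success, so values start at 0, while others count the trials up to and including the first success, so values start at 1.) The same applies to other probability distributions.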

The 1986 book Non-Uniform Random Variate Generation by Luc Devroye goes into random generation from various distributions in detail. See also my article on randomization and sampling methods.

Peter O.
  • Although this answer describes various *procedures* to generate random numbers, it does not answer the question concerning what this process really *means.* – whuber Sep 17 '20 at 14:30
  • @whuber: I answered this question while it was on _Stack Overflow_, a programming-oriented site; thus, it reflected what I believed the asker had in mind at the time I wrote it: what it means to code the random generation of the geometric and other distributions. – Peter O. Sep 17 '20 at 15:19
  • Thank you for that clarification, @Peter. In light of that context, here's my +1 to you. – whuber Sep 17 '20 at 17:08
-1

This is a deep question, covered quite nicely in Sivia & Skilling’s textbook.

Briefly, a draw from a distribution $p(x)$ means that a procedure returns a number for which $p(x)$ describes to the best of our knowledge what that number might be. In other words, we cannot do better than using $p(x)$ for predicting $x$ with the knowledge that we have.

Knowledge here is crucial. If you know, for example, the seed and algorithm that a random number generator uses, the draws are completely predictable and you can do better than $p(x)$. Similarly, if you’ve tested the random number generator extensively, you may know better what it might return.
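A small illustration of the point about seeds, assuming Python's standard `random` module (a Mersenne Twister generator): two generators created with the same seed produce exactly the same "random" draws, so anyone who knows the seed can predict every value.

```python
import random

# Two generators using the same algorithm and the same seed.
first = random.Random(42)
second = random.Random(42)

a = [first.random() for _ in range(5)]
b = [second.random() for _ in range(5)]

print(a == b)  # True: knowing the seed makes every draw predictable
```

Without knowledge of the seed, the uniform distribution on [0, 1) remains the best available description of each of these values.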

innisfree
  • The vagueness of this answer is potentially confusing: what exactly does $p(x)$ refer to and how would it be employed for predicting random values? How do you address the apparent contradiction that the *expectation* of a random variable is its best linear predictor? – whuber Mar 02 '22 at 17:00
  • You can fill in the gaps – innisfree Mar 03 '22 at 12:42