In most programming languages, the usual way of working with random variables is to instantiate a random number generator that outputs a stream of pseudo-random numbers; samples from the other distributions are then computed from that stream.
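For example, a minimal sketch of what I mean (exponential samples obtained from the uniform stream via the inverse-transform method; the names are just illustrative):

import numpy as np

u = np.random.rand(3)            # raw pseudo-random uniforms from the generator
exp_samples = -np.log(1.0 - u)   # inverse CDF of Exp(1) turns them into exponential draws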
My questions are:
- Why does sampling random variables sequentially alter the original sequence?
To illustrate the point, this behaviour can be reproduced with the following code in Python:
import numpy as np

seed = 0          # any fixed seed reproduces the behaviour
sample_size = 5   # increase this to see the proportion converge

np.random.seed(seed)
a = []
for _ in range(sample_size):
    a.append(np.random.rand())

np.random.seed(seed)
b = []
for _ in range(sample_size):
    b.append(np.random.rand())
    np.random.normal()  # auxiliary draw from another distribution

print(a)
print(b)
print(np.isin(b, a).mean())
As one can see in the code, drawing normally distributed samples in between alters the sequence of uniformly distributed samples. Moreover, the proportion of elements that b shares with a tends to about 0.44 as the sample size increases, for some reason.
This leads to a second question:
- Where does this 0.44 come from? Why does it differ depending on which distribution is used as the auxiliary one? (0.5 for exponential, 0.20 for beta, etc.) A sketch for estimating these proportions is included below.
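In case it helps, here is a sketch of how these proportions can be estimated for different auxiliary distributions (common_proportion is just a helper written for illustration, and the beta shape parameters below are arbitrary):

import numpy as np

def common_proportion(aux_draw, sample_size=100_000, seed=0):
    # Fraction of the uniforms in b that also appear in a when aux_draw()
    # is interleaved between consecutive uniform draws.
    np.random.seed(seed)
    a = [np.random.rand() for _ in range(sample_size)]
    np.random.seed(seed)
    b = []
    for _ in range(sample_size):
        b.append(np.random.rand())
        aux_draw()  # auxiliary draw from another distribution
    return np.isin(b, a).mean()

print(common_proportion(np.random.normal))                  # ~0.44, as reported above
print(common_proportion(np.random.exponential))             # ~0.5
print(common_proportion(lambda: np.random.beta(2.0, 2.0)))  # ~0.20 (shape parameters chosen arbitrarily)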
EDIT: The question was too general at the beginning, so I decided to split it into two in order to select a proper answer. The follow-up question is available here.