7

How do I generate $n$ identically distributed but not independent normal random numbers such that their sum falls within a prespecified interval $[a,b]$ with probability $p$?

(This question is motivated by generating a random walk that ends up at a prespecified point: Random process not so random after all (deterministic). Since a continuous random variable has zero probability of reaching an exact number, we do the second best thing and ask for an entire interval to end up in.)


EDIT: Generating samples from singular Gaussian distribution has been proposed as a duplicate, which in turn is closed as a duplicate of Generate normally distributed random numbers with non positive-definite covariance matrix. I agree that both of these are helpful. However, the point of the current question (more specifically, of the answer) is to first figure out that we can use a multivariate normal distribution to address the question, and second, what kind of covariance matrix works. How to sample from a distribution with that covariance is a third step, where the linked threads are helpful.

Stephan Kolassa
  • 95,027
  • 13
  • 197
  • 357
  • 2
    also see the literature on [Brownian bridges](https://en.wikipedia.org/wiki/Brownian_bridge) ? – Ben Bolker Jul 25 '18 at 14:45
  • @BenBolker: that sounds like a much better idea than mine. Would you be interested in writing up an answer? – Stephan Kolassa Jul 25 '18 at 14:56
  • maybe I'll get to it, but anyone else who is reading this should feel free to jump in and write up an answer. I don't mind. – Ben Bolker Jul 25 '18 at 16:36
  • FWIW Brownian bridges are a more specific case - the sum is deterministically, exactly equal to a specified value $T_1$, not "within a given range with a given probability" ... arguably a better answer to the original question, but not to your generalization ... – Ben Bolker Jul 25 '18 at 16:38
  • @BenBolker: would it make sense to nominate the original question for reopening so you can post your BB there? – Stephan Kolassa Jul 25 '18 at 16:39
  • Sure. Again, I may or may not be able to get to answering quickly ... – Ben Bolker Jul 25 '18 at 16:40
  • 1
    (Sorry, of course I mean "BB-BB".) – Stephan Kolassa Jul 25 '18 at 16:40
  • @BenBolker: I have nominated the original question for reopening, and also flagged it for the mods, in case my explanatory comment disappears in the sea of comments. Looking forward to your answer! – Stephan Kolassa Jul 25 '18 at 16:43
  • @BenBolker, I've reopened the thread. Please provide your answer at your convenience. – gung - Reinstate Monica Jul 25 '18 at 16:57
  • done. ......... – Ben Bolker Jul 25 '18 at 17:58
  • 1
    Possible duplicate of [Generating samples from singular Gaussian distribution](https://stats.stackexchange.com/questions/159313/generating-samples-from-singular-gaussian-distribution) – Xi'an Jul 26 '18 at 14:44
  • Change the basis to one that includes the sum. The resulting distribution is still multivariate Normal. [Sample the sum from the truncated (univariate) Normal distribution.](https://stats.stackexchange.com/questions/238602/sampling-from-truncated-normal) Invert the change of basis. – whuber Jul 30 '18 at 15:15

1 Answers1

7

We will generate multivariate normals $X\sim MN(\mu, \Sigma)$ with $\mu\in\mathbb{R}^n$ and $\Sigma\in\mathbb{R}^{n\times n}$ such that their sum satisfies our condition. Let $Z=X_1 + \dots + X_n$.

As a common mean, we choose

$$ \mu_1 = \dots = \mu_n = \frac{a+b}{2n}. $$

In order that $Z\in[a,b]$ with probability $p$, its standard deviation should fulfill

$$ \sigma_Z = \frac{b-a}{q_\alpha}, $$

where $q_\alpha$ is the standard normal quantile to the level $\alpha$, here $\alpha=1-\frac{1-p}{2}$.

We now need to specify $\Sigma$. We have a lot of leeway here. Let us assume that we want each $X_i$'s variance to be $\sigma^2$ and the covariance be $\text{cov}(X_i,X_j)=\tau$ for $i\neq j$. The key to creating a "good" $\Sigma$ is this previous answer by probabilityislogic. It yields that the sum of our $X_i$s has variance

$$ n\sigma^2 + n(n-1)\tau $$

so we need that

$$ n\sigma^2 + n(n-1)\tau = \frac{b-a}{q_\alpha}.$$

We also need to ensure that $\Sigma$ is positive definite, but this is not overly hard. The easiest way to do this is to ensure that all entries in $\Sigma$ are positive, e.g., by setting

$$ \sigma^2 := \frac{\sigma_Z^2}{2n}, \quad \tau := \frac{\sigma_Z^2}{2n(n-1)}, $$

but this gives very small values and very boring cumulative sums and trajectories:

boring

Less boring is to set

$$ \sigma^2 := 1, \quad \tau := \frac{1}{n-1}\big(\frac{\sigma_Z^2}{n}-\sigma^2\big), $$

which yields much more interesting trajectories:

interesting

Note that setting this does indeed yield a valid covariance matrix, because $\Sigma$ is then of the form $\Sigma_{ij} = m(i-j)$, namely

$$ m(0) = \sigma^2, \quad m(j) = \tau\text{ for }j>0, $$

and we have that

$$ \sum_{j>0} |m(j)| = (n-1)|\tau| = \big|\frac{\sigma_Z^2}{n}-\sigma^2\big| = \big|\frac{\sigma_Z^2}{n}-1\big| < 1 = \sigma^2 = m(0), $$

which is a sufficient condition for $\Sigma$ to be strictly positive definite by Wikipedia (Point 7 under "Further Properties").

R code below, but first, please go and upvote probabilityislogic's answer.

n_steps <- 1000
target_min <- 1.99
target_max <- 2.01
target_prob <- 0.99

target_mean <- mean(c(target_min,target_max))
target_sd <- (target_max-target_mean)/qnorm(p=1-(1-target_prob)/2)

mm <- rep(target_mean/n_steps,n_steps)

# boring setting:
# sigma_sq <- target_sd^2/(2*n_steps)
# tau <- target_sd^2/(2*n_steps*(n_steps-1))

sigma_sq <- 1
tau <- (target_sd^2/n_steps-sigma_sq)/(n_steps-1)

CC <- matrix(tau,nrow=n_steps,ncol=n_steps)
diag(CC) <- sigma_sq

library(MASS)
foo <- mvrnorm(1,mu=mm,Sigma=CC)
sum(foo)

plot(cumsum(foo),type="l",xlab="",ylab="")
abline(h=target_mean,lty=2)
Stephan Kolassa
  • 95,027
  • 13
  • 197
  • 357