1

I need to draw random integers from a truncated discrete Gaussian distribution. That is, the probability of integer $i$ is:

$$p_i \propto \exp\left( -\frac{(i-\mu)^2}{2\sigma^2} \right)$$

for $a \le i \le b$, and $p_i = 0$ otherwise. The missing proportionality constant is obtained from the normalization condition, $\sum_i p_i = 1$. Here $\mu$ and $\sigma$ are real parameters.

Is there a way (package, algorithm, trick, anything) to sample this distribution?

becko
  • 3,298
  • 1
  • 19
  • 36
  • 2
    This is clearly on topic under the [help/on-topic]. I have re-opened. – Glen_b Oct 25 '17 at 00:23
  • If the region to sample is not low probability tail region, the simplest approach is to sample from an untruncated normal and throw away the samples that are out of bounds. When the valid region is a high probability region this should work fine, otherwise you will be throwing away a lot of samples and algorithmic time performance may be bad. https://stats.stackexchange.com/questions/238778/how-do-you-sample-mathcal-n-mu-sigma2-from-a-range – Cagdas Ozgenc Oct 25 '17 at 06:23
  • 1
    For the untruncated case you could use accept/reject ("rejection method"); and it turns out to be surprisingly straightforward - if I have this right you can do it by simply rounding a normal with the same mean and variance as the proposal. You need to scale the pmf of that rounded normal so that it's a proper envelope, but I believe the smallest ratio always occurs at round$(\mu)$, so that's straightforward. It would be possible to reduce the rejection rate further by manipulating the proposal slightly but it's unlikely to be worth it. – Glen_b Oct 25 '17 at 08:06
  • Since generating a truncated Gaussian is quite simple (generate a uniform $U$ on $(F(a), F(b))$ and take $F^{-1}(U)$), that should combine with a slight modification of the untruncated scheme above (you need adapt the envelope-scaling-factor $c$ -- in a simple way) to produce a highly efficient method. – Glen_b Oct 25 '17 at 08:10
  • @Xi'an Yes, a distribution on the integers. What is confusing on the question? – becko Oct 25 '17 at 08:16
  • @Glen_b: since the support is made of integers, even accept-reject sounds like overkill. Unless $b$ is much larger than $a$ _and_ they are relatively close to the mean $\mu$. – Xi'an Oct 25 '17 at 09:09
  • @Xi'an yes, indeed for most such cases you can just compute the probabilities and call `sample` -- if numerical issues (like underflow say) are not going to be at issue, you don't actually even need to normalize them (`sample(a:b,n,p=dnorm(a:b,mu,sigma),replace=TRUE)`). – Glen_b Oct 25 '17 at 09:16
  • @CagdasOzgenc: the "truncated Normal distribution" is a misnomer in this question, as this is a discrete distribution with probabilities looking like normal pdfs _at integers_. Hence truncated Normal algorithms do not apply. – Xi'an Oct 25 '17 at 15:54

1 Answers1

2

I have trouble with seeing where the difficulty stands in simulating from this integer supported distribution. An R code like

probz=dnorm(a:b,mu,sigma)/sum(dnorm(a:b,mu,sigma))

would work for central values of (a,b), while extreme values would require a bit more caution, e.g.

refz=max(dnorm(a:b,mu,sigma,log=TRUE))
probz=dnorm(a:b,mu,sigma)-refz
probz=exp(probz)/sum(exp(probz))

Then generating by

sample(a:b,n,prob=probz,rep=TRUE)

delivers n iid replicas from this Gaussian-like distribution. Note that if a>mu and b>a are large [in orders of sigma] compared with the mean mu, this distribution will be undistinguishable from a point mass at a.

Xi'an
  • 90,397
  • 9
  • 157
  • 575
  • Regarding your last sentence, if $\sigma$ is not much smaller than 1, then the probability of $a+1$, $a+2$, and so on for at least a few successive integer values will also be significant – becko Oct 25 '17 at 08:36
  • @becko: if $\sigma=1$, $\mu=0$, $a=100 – Xi'an Oct 25 '17 at 08:40