
Suppose I have $n$ independent normal random variables

$$X_1 \sim \mathrm{N}(\mu_1, \sigma_1^2)\\X_2 \sim \mathrm{N}(\mu_2, \sigma_2^2)\\\vdots\\X_n \sim \mathrm{N}(\mu_n, \sigma_n^2)$$

and $Y=X_1+X_2+\cdots+X_n$. How would I characterize the density of $Y$ if each $X_i$ is truncated to within $(\mu_i - 2\sigma_i, \mu_i + 2\sigma_i)$? In other words, I'm sampling from $n$ independent normal distributions, discarding samples not within $2\sigma_i$ of the corresponding mean, and summing them.

Right now, I'm doing this with the R code below:

library(truncnorm)

x_mu <- c(12, 18, 7)
x_sd <- c(1.5, 2, 0.8)
a <- x_mu - 2 * x_sd
b <- x_mu + 2 * x_sd

samples <- sapply(1:3, function(i) {
  return(rtruncnorm(100000, a[i], b[i], x_mu[i], x_sd[i]))
})

y <- rowSums(samples)

Is there any method for generating the density of $Y$ directly?

Devin
  • Your question implies you *know* all the $\sigma_i$. Is that really the case or are you *estimating* them? There's a huge difference! Out of curiosity, why are you throwing away such data? Depending on your objectives, I suspect there exist (much) better procedures. – whuber Jul 03 '14 at 21:14
  • I do know all of the means and SDs for my data, yes. – Devin Jul 03 '14 at 21:41
  • 7
    I believe that you could characterize it as "a mess". This paper, http://www.jstor.org/stable/2236545 , examines the matter, with more scientific rigor. – Alecos Papadopoulos Jul 04 '14 at 03:27
  • 2
    Outside approximation via CLT, this is relatively tricky. I guess if $n$ is small enough you could try numerical convolution. – Glen_b Jul 05 '14 at 07:04
  • Even with the CLT it's tricky, since the SDs of the terms in the sum are different. – Aksakal Jan 21 '15 at 13:48
  • @Glen_b This isn't something I've had to deal with before, but doesn't the [convolution theorem](http://en.wikipedia.org/wiki/Convolution_theorem) allow relatively fast numerical convolution? What kind of magnitude would $n$ have to have before it became problematic? – Silverfish Jan 21 '15 at 15:15
  • 2
    @Silverfish Depending on implementation, platform and how fine a grid is tolerable, hundreds should be fine (perhaps more); besides speed, though, with enough terms you have to be much more careful about details of implementation or a number of numerical issues can start to crop up. – Glen_b Jan 21 '15 at 15:29
  • You say "truncated", which seems to imply that observations below the lower limit, say, are replaced by that limit, but then you say "discarding samples", which is different. So what do you really mean? – kjetil b halvorsen Sep 14 '15 at 14:57

2 Answers


You could use a saddlepoint approximation for the sum of truncated normals. I will not give the details now; you can look at my answer to General sum of Gamma distributions for hints. What we need is the moment-generating function of a truncated normal, which is easy to find. I will do it here for a standard normal truncated at $\pm 2$, which has density $$ f(x) =\begin{cases} \frac1{C} \phi(x), & |x| \le 2 \\ 0, & |x| > 2 \end{cases} $$ where $C=\Phi(2) - \Phi(-2)$ and $\phi(x), \Phi(x)$ are the density and cdf of a standard normal, respectively.

The moment-generating function can be calculated as $$ \DeclareMathOperator{\E}{\mathbb{E}} M(t) = \E e^{tX}=\frac1{C}\int_{-2}^2 e^{tx} \phi(x)\; dx=\frac1{C}e^{\frac12 t^2} [\Phi(2-t)-\Phi(-2-t) ] $$ and then we can apply the saddlepoint approximation.
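To make this concrete, here is a minimal Python sketch (mine, not part of the original answer) of the resulting saddlepoint approximation, using the parameters from the question; it rescales the MGF above via $X_i = \mu_i + \sigma_i Z_i$, where $Z_i$ is standard normal truncated at $\pm 2$. The function names, bisection bounds, and finite-difference step sizes are ad hoc choices of mine.

```python
import math

# Parameters from the question: X_i ~ N(mu_i, sd_i^2) truncated to mu_i +/- 2*sd_i
MU = [12.0, 18.0, 7.0]
SD = [1.5, 2.0, 0.8]
SQRT2 = math.sqrt(2.0)

C = math.erf(2.0 / SQRT2)  # Phi(2) - Phi(-2), the truncation mass (~0.9545)

def trunc_mass(u):
    # Phi(2 - u) - Phi(-2 - u); even in u, computed with erfc for tail stability
    u = abs(u)
    return 0.5 * (math.erfc((u - 2.0) / SQRT2) - math.erfc((u + 2.0) / SQRT2))

def K(t):
    # cumulant generating function of Y: sum over i of mu_i*t + log M_Z(sd_i*t)
    return sum(m * t + 0.5 * (s * t) ** 2 + math.log(trunc_mass(s * t)) - math.log(C)
               for m, s in zip(MU, SD))

def K1(t, h=1e-5):  # K'(t) by central difference
    return (K(t + h) - K(t - h)) / (2.0 * h)

def K2(t, h=1e-4):  # K''(t) by central difference
    return (K(t + h) - 2.0 * K(t) + K(t - h)) / (h * h)

def sp_density(x, lo=-6.0, hi=6.0):
    # Solve the saddlepoint equation K'(s) = x by bisection (K' is increasing),
    # then apply f(x) ~ exp(K(s) - s*x) / sqrt(2*pi*K''(s)).
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return math.exp(K(s) - s * x) / math.sqrt(2.0 * math.pi * K2(s))
```

For production use one would differentiate $K$ analytically rather than numerically, and renormalize the approximate density so it integrates to one.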

kjetil b halvorsen

I'm curious why, but yes, there is a simple way to generate the pdf of this sum of distributions:

## install.packages("truncnorm")
## install.packages("caTools")
library(truncnorm)

x.mu <- c(12, 18, 7)
x.sd <- c(1.5, 2, 0.8)
x.a <- x.mu - 2*x.sd
x.b <- x.mu + 2*x.sd

dmulti <- function(x, a, b, mu, sd)
  rowSums(
    sapply(1:length(mu),
           function(idx)
             dtruncnorm(x, a=a[idx], b=b[idx], mean=mu[idx], sd=sd[idx])))/length(mu)
pmulti <- function(q, a, b, mu, sd)
  rowSums(
    sapply(1:length(mu),
           function(idx)
             ptruncnorm(q, a=a[idx], b=b[idx], mean=mu[idx], sd=sd[idx])))/length(mu)

pointrange <- range(c(x.a, x.b))
pointseq <- seq(pointrange[1], pointrange[2], length.out=100)
## Plot the probability density function
plot(pointseq, dmulti(pointseq, x.a, x.b, x.mu, x.sd),
     type="l")

## Plot the cumulative distribution function
plot(pointseq, pmulti(pointseq, x.a, x.b, x.mu, x.sd),
     type="l")
Bill Denney
  • If I read this code correctly, you seem to be implementing something like a *mixture* rather than a summation. The plot this code produces is woefully incorrect. It's not even a valid probability density function! – whuber Nov 03 '14 at 18:57
  • @whuber, thanks for the catch. I normalized the pdf and added the cdf. – Bill Denney Nov 08 '14 at 12:38
  • 3
    Thank you. However, the basic error persists: you are computing a *mixture* distribution rather than the sum. – whuber Nov 08 '14 at 13:17
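The distinction whuber raises can be made concrete: the density of the sum is the *convolution* of the component densities, not their average. Here is a minimal Python sketch (mine, not from the thread) of the numerical convolution Glen_b suggested in the comments, using the question's parameters; the grid spacing `H` and helper names are my own choices.

```python
import math

# Question's parameters: X_i ~ N(mu_i, sd_i^2) truncated to mu_i +/- 2*sd_i
MU = [12.0, 18.0, 7.0]
SD = [1.5, 2.0, 0.8]
H = 0.02  # grid spacing

def dtrunc(x, mu, sd):
    # density of N(mu, sd^2) truncated to (mu - 2*sd, mu + 2*sd)
    if abs(x - mu) >= 2.0 * sd:
        return 0.0
    z = (x - mu) / sd
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    mass = math.erf(2.0 / math.sqrt(2.0))  # Phi(2) - Phi(-2)
    return phi / (sd * mass)

def grid_pmf(mu, sd):
    # discretize one component: weight f(a + k*H)*H at each grid point
    a = mu - 2.0 * sd
    n = int(round(4.0 * sd / H)) + 1
    return a, [dtrunc(a + k * H, mu, sd) * H for k in range(n)]

def convolve(p, q):
    # direct discrete convolution, O(len(p) * len(q))
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

origin, pmf = grid_pmf(MU[0], SD[0])
for mu, sd in zip(MU[1:], SD[1:]):
    a, g = grid_pmf(mu, sd)
    origin += a
    pmf = convolve(pmf, g)

# density of Y on the grid: point k corresponds to y = origin + k*H
dens = [p / H for p in pmf]
```

The resulting grid values approximate the density of $Y$ on its support $(\sum_i(\mu_i - 2\sigma_i),\ \sum_i(\mu_i + 2\sigma_i))$; a mixture, by contrast, would place mass on the union of the individual supports.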