
Suppose I have $n$ independent normal random variables

$$X_1 \sim \mathrm{N}(\mu_1, \sigma_1^2)\\X_2 \sim \mathrm{N}(\mu_2, \sigma_2^2)\\\vdots\\X_n \sim \mathrm{N}(\mu_n, \sigma_n^2)$$

and $Y=X_1+X_2+\cdots+X_n$. How would I characterize the density of $Y$ if each $X_i$ is truncated to within $(\mu_i - 2\sigma_i, \mu_i + 2\sigma_i)$? In other words, I'm sampling from $n$ independent normal distributions, discarding samples not within $2\sigma_i$ of the corresponding mean, and summing them.

Right now, I'm doing this with the R code below:

library(truncnorm)

x_mu <- c(12, 18, 7)
x_sd <- c(1.5, 2, 0.8)
a <- x_mu - 2 * x_sd
b <- x_mu + 2 * x_sd

samples <- sapply(1:3, function(i) {
  return(rtruncnorm(100000, a[i], b[i], x_mu[i], x_sd[i]))
})

y <- rowSums(samples)

Is there any method for generating the density of $Y$ directly?

Devin
  • Your question implies you *know* all the $\sigma_i$. Is that really the case or are you *estimating* them? There's a huge difference! Out of curiosity, why are you throwing away such data? Depending on your objectives, I suspect there exist (much) better procedures. – whuber Jul 03 '14 at 21:14
  • I do know all of the means and SDs for my data, yes. – Devin Jul 03 '14 at 21:41
  • 7
    I believe that you could characterize it as "a mess". This paper, http://www.jstor.org/stable/2236545 , examines the matter, with more scientific rigor. – Alecos Papadopoulos Jul 04 '14 at 03:27
  • 2
    Outside approximation via CLT, this is relatively tricky. I guess if $n$ is small enough you could try numerical convolution. – Glen_b Jul 05 '14 at 07:04
  • Even with the CLT it's tricky, since the SDs of the terms in the sum are different. – Aksakal Jan 21 '15 at 13:48
  • @Glen_b This isn't something I've had to deal with before, but doesn't the [convolution theorem](http://en.wikipedia.org/wiki/Convolution_theorem) allow relatively fast numerical convolution? What kind of magnitude would $n$ have to have before it became problematic? – Silverfish Jan 21 '15 at 15:15
  • 2
    @Silverfish Depending on implementation, platform and how fine a grid is tolerable, hundreds should be fine (perhaps more); besides speed, though, with enough terms you have to be much more careful about details of implementation or a number of numerical issues can start to crop up. – Glen_b Jan 21 '15 at 15:29
  • You say "truncated", which seems to imply that observations below the lower limit, say, are replaced by that limit, but then you say "discarding samples", which is different. So what do you really mean? – kjetil b halvorsen Sep 14 '15 at 14:57

2 Answers


You could use a saddlepoint approximation for the sum of truncated normals. I will not give the details now; you can look at my answer to General sum of Gamma distributions for hints. What we need is the moment-generating function of a truncated normal, which is easy to find. I will do it here for a standard normal truncated at $\pm 2$, which has density $$ f(x) =\begin{cases} \frac1{C} \phi(x), & |x| \le 2 \\ 0, & |x| > 2 \end{cases} $$ where $C=\Phi(2) - \Phi(-2)$ and $\phi(x), \Phi(x)$ are the density and cdf of a standard normal, respectively.

The moment-generating function can be calculated as $$ \DeclareMathOperator{\E}{\mathbb{E}} M(t) = \E e^{tX}=\frac1{C}\int_{-2}^2 e^{tx} \phi(x)\; dx=\frac1{C}e^{\frac12 t^2} [\Phi(2-t)-\Phi(-2-t) ] $$ and then we can apply the saddlepoint approximation.
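To make this concrete, here is a minimal Python sketch (mine, not part of the original answer) of the resulting saddlepoint approximation, using the parameters from the question; it rescales the MGF above via $X_i = \mu_i + \sigma_i Z_i$, where $Z_i$ is standard normal truncated at $\pm 2$. The function names, bisection bounds, and finite-difference step sizes are ad hoc choices of mine.

```python
import math

# Parameters from the question: X_i ~ N(mu_i, sd_i^2) truncated to mu_i +/- 2*sd_i
MU = [12.0, 18.0, 7.0]
SD = [1.5, 2.0, 0.8]
SQRT2 = math.sqrt(2.0)

C = math.erf(2.0 / SQRT2)  # Phi(2) - Phi(-2), the truncation mass (~0.9545)

def trunc_mass(u):
    # Phi(2 - u) - Phi(-2 - u); even in u, computed with erfc for tail stability
    u = abs(u)
    return 0.5 * (math.erfc((u - 2.0) / SQRT2) - math.erfc((u + 2.0) / SQRT2))

def K(t):
    # cumulant generating function of Y: sum over i of mu_i*t + log M_Z(sd_i*t)
    return sum(m * t + 0.5 * (s * t) ** 2 + math.log(trunc_mass(s * t)) - math.log(C)
               for m, s in zip(MU, SD))

def K1(t, h=1e-5):  # K'(t) by central difference
    return (K(t + h) - K(t - h)) / (2.0 * h)

def K2(t, h=1e-4):  # K''(t) by central difference
    return (K(t + h) - 2.0 * K(t) + K(t - h)) / (h * h)

def sp_density(x, lo=-6.0, hi=6.0):
    # Solve the saddlepoint equation K'(s) = x by bisection (K' is increasing),
    # then apply f(x) ~ exp(K(s) - s*x) / sqrt(2*pi*K''(s)).
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return math.exp(K(s) - s * x) / math.sqrt(2.0 * math.pi * K2(s))
```

For production use one would differentiate $K$ analytically rather than numerically, and renormalize the approximate density so it integrates to one.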

kjetil b halvorsen

I'm curious why, but yes, there is a simple way to generate the pdf of this sum of distributions:

## install.packages("truncnorm")
## install.packages("caTools")
library(truncnorm)

x.mu <- c(12, 18, 7)
x.sd <- c(1.5, 2, 0.8)
x.a <- x.mu - 2*x.sd
x.b <- x.mu + 2*x.sd

dmulti <- function(x, a, b, mu, sd)
  rowSums(
    sapply(1:length(mu),
           function(idx)
             dtruncnorm(x, a=a[idx], b=b[idx], mean=mu[idx], sd=sd[idx])))/length(mu)
pmulti <- function(q, a, b, mu, sd)
  rowSums(
    sapply(1:length(mu),
           function(idx)
             ptruncnorm(q, a=a[idx], b=b[idx], mean=mu[idx], sd=sd[idx])))/length(mu)

pointrange <- range(c(x.a, x.b))
pointseq <- seq(pointrange[1], pointrange[2], length.out=100)
## Plot the probability density function
plot(pointseq, dmulti(pointseq, x.a, x.b, x.mu, x.sd),
     type="l")

## Plot the cumulative distribution function
plot(pointseq, pmulti(pointseq, x.a, x.b, x.mu, x.sd),
     type="l")
Bill Denney
  • If I read this code correctly, you seem to be implementing something like a *mixture* rather than a summation. The plot this code produces is woefully incorrect. It's not even a valid probability density function! – whuber Nov 03 '14 at 18:57
  • @whuber, thanks for the catch. I normalized the pdf and added the cdf. – Bill Denney Nov 08 '14 at 12:38
  • 3
    Thank you. However, the basic error persists: you are computing a *mixture* distribution rather than the sum. – whuber Nov 08 '14 at 13:17
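The distinction whuber raises can be made concrete: the density of the sum is the *convolution* of the component densities, not their average. Here is a minimal Python sketch (mine, not from the thread) of the numerical convolution Glen_b suggested in the comments, using the question's parameters; the grid spacing `H` and helper names are my own choices.

```python
import math

# Question's parameters: X_i ~ N(mu_i, sd_i^2) truncated to mu_i +/- 2*sd_i
MU = [12.0, 18.0, 7.0]
SD = [1.5, 2.0, 0.8]
H = 0.02  # grid spacing

def dtrunc(x, mu, sd):
    # density of N(mu, sd^2) truncated to (mu - 2*sd, mu + 2*sd)
    if abs(x - mu) >= 2.0 * sd:
        return 0.0
    z = (x - mu) / sd
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    mass = math.erf(2.0 / math.sqrt(2.0))  # Phi(2) - Phi(-2)
    return phi / (sd * mass)

def grid_pmf(mu, sd):
    # discretize one component: weight f(a + k*H)*H at each grid point
    a = mu - 2.0 * sd
    n = int(round(4.0 * sd / H)) + 1
    return a, [dtrunc(a + k * H, mu, sd) * H for k in range(n)]

def convolve(p, q):
    # direct discrete convolution, O(len(p) * len(q))
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

origin, pmf = grid_pmf(MU[0], SD[0])
for mu, sd in zip(MU[1:], SD[1:]):
    a, g = grid_pmf(mu, sd)
    origin += a
    pmf = convolve(pmf, g)

# density of Y on the grid: point k corresponds to y = origin + k*H
dens = [p / H for p in pmf]
```

The resulting grid values approximate the density of $Y$ on its support $(\sum_i(\mu_i - 2\sigma_i),\ \sum_i(\mu_i + 2\sigma_i))$; a mixture, by contrast, would place mass on the union of the individual supports.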