6

Suppose I have some independent Poisson-distributed random variables $X_1 \ldots X_N$ with parameters $\lambda_1 \ldots \lambda_N$. These can be thought of as processes where each arrival/event generates a loss of $a_1 \ldots a_N$. What I'm ultimately looking for is an upper bound $X^*$ for the loss per unit time (hour) to some confidence level $\alpha$, i.e. an $X^*$ such that

$$P\left(\sum_{i=1}^N a_iX_i \leq X^*\right) \geq 1-\alpha$$

I am mainly interested in (efficient to calculate) approximations, since I don't expect an explicit solution to exist.
As far as I understand an explicit calculation of the probability alone would involve $N$-fold nested sums (of which one can be turned into a Gamma function). My parameters are $N \in \{5 \ldots 10\}$, $\lambda_i \in [1 , 20]$, so this generates awfully many terms.

Is there a better way than Monte-Carlo simulation?
Is approximation of $X_i$ as Gaussian distributed and the following calculations for the standard deviation of the (weighted) sum sensible? Maybe with corrections?

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
B Fuchs
  • 203
  • 1
  • 4

1 Answers1

6

We can use the saddlepoint approximation. I will follow closely my answer to Generic sum of Gamma random variables . For the saddlepoint approximation I will follow Ronald W Butler: "Saddlepoint approximations with applications" (Cambridge UP). See also the post How does saddlepoint approximation work?

Let $X_1, \dots, X_n$ be independent Poisson random variables with parameters $\lambda_1, \dots, \lambda_n$. Let $a_1, \dots, a_n$ be positive real numbers. We define the random variable $X=\sum_{i=1}^n a_i X_i$ and want an approximation for the distribution of $X$. When the weights $a_i$ are integers and $n$ is not to large, we can use numerical convolution. For the general case the saddlepoint approximation gives a good approximation for the density (probability) function.

The cumulant generating function for $X_i$ is given by $K_i(s) = \lambda_i (e^s - 1)$, $s \in (-\infty, +\infty)$. The cumulant generating function of $X$ is then $$ K(s) = \sum_{i=1}^n K_i(a_i s) = \sum_{i=1}^n \lambda_i (e^{a_i s} - 1) $$ We will need the first two derivatives, given by $$ K'(s) = \sum \lambda_i a_i e^{a_i s} \\ K''(s) = \sum \lambda_i a_i^2 e^{a_i s} $$ The saddlepoint equation is given by $$ K'(\hat{s})=x $$ which defines $\hat{s}=\hat{s}_x$ implicitly as a function of $x$.

The saddlepoint density function (for $x>0$) is now given by $$ \hat{f}(x) = \frac1{\sqrt{2\pi K''(\hat{s})}} \exp\left(K \hat{s} - \hat{s} x\right) $$ and the probability that $X=0$ is given (exactly) by $$ \hat{f}(0) = \exp(-\sum \lambda_i) $$ An implementation in R is below:

Saddlepoint approximation for a weighted sum of independent Poisson Random variables:

# Needs R 3.1.0 or newer (for extra argument of uniroot)
make_cumgenfun  <-  function(lambda, a) {
    # we return list(lambda, a, K, K',  K'')
    n  <- length(lambda)
    m  <- length(a)
    stopifnot( n==m, lambda>0,  a>0)
    return( list(lambda=lambda, a=a,
                  Vectorize(function(s) {sum(lambda * (exp(a*s)-1))} ),
                  Vectorize(function(s) {sum(lambda * a * exp(a*s))} ),
                  Vectorize(function(s) {sum(lambda * a* a *exp(a*s))} )))
}
# Probability that X=0:

P0 <- exp(-sum(lambda))

Functions to get expectation and variance of X:

Ef  <-  function(lambda, a) sum(lambda*a)
Vf  <-  function(lambda, a) sum(lambda*a*a)

solve_speq  <-  function(x, cumgenfun) {
    # Returns saddlepoint!
    lambda  <-  cumgenfun[[1]]
    a       <-  cumgenfun[[2]]
    Kd      <-  cumgenfun[[4]]
    uniroot(function(s) Kd(s)-x, lower=-100,  upper=+100,
            extendInt="yes")$root
}

# For an example,  define
set.seed(1234)
lambda <- 1:10
a      <- runif(10, 0.5, 3)
E  <- Ef(lambda, a)
V  <- Vf(lambda, a)


# Now,  a function giving the (uncorrected ) saddlepoint density. We include the special case for x=0
    make_fhat  <-  function(lambda, a) {
        cgf1  <-  make_cumgenfun(lambda, a)
        K  <- cgf1[[3]]
        Kd <- cgf1[[4]]
        Kdd <- cgf1[[5]]
        # function finding fhat for one specific x:
        fhat0  <- function(x) if (x==0) P0 else {
            # solve saddlepoint equation:
            s  <-  solve_speq(x, cgf1)
            # calculating saddlepointdensity value:
            (1/sqrt(2*pi*Kdd(s)))*exp(K(s)-s*x)}
        #Returning a vectorized version:
        return(Vectorize(fhat0))
    } # end make_fhat

and running this code in R:

> fhat  <-  make_fhat(lambda, a)  
> E
[1] 94.72556
> V
[1] 185.3017
> sqrt(V)
[1] 13.61256
> fhat(0)
[1] 1.299581e-24
> fhat(94)
[1] 0.02938575
> fhat(107)
[1] 0.01861648
> integrate(fhat, lower=0, upper=Inf)
1.001878 with absolute error < 3.6e-05
> 

The last integration can be used to correct fhat to get integral 1 (not shown here).

Finally, we can get a plot of the approximate density:

> plot(fhat,  from=60,  to=130)

enter image description here

Now, you can yourself compare this with a normal approximation and with simulations. It should be quite accurate!

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
  • Thanks for the answer and the code, it seems to work. However I hadn't heard about saddle point estimations before and therefore I don't fully understand how it works. You produce a generic distribution with the same cumulants? – B Fuchs Nov 25 '15 at 17:32
  • The saddlepoint-approximation needs the moment generating function to exist. And yes, it builds on the cumulants---all of them, not only the first two as the central limit theorem does. You should have a look at the referenced book by Butler! – kjetil b halvorsen Nov 25 '15 at 17:35
  • What I meant to say is that it would take me more time (some days - weeks) to read up on this than I can spare right now. Thanks for the reference anyway! Btw in my case a Gaussian approximation is quite close to the saddlepoint estimation (are the higher cumulants of a Poisson close to those of a Gaussian?) – B Fuchs Nov 26 '15 at 11:13
  • Nice. Perhaps that "How does the saddlepoint approximation work?" question is worth having on site. $\qquad$ \[In the end I [did](http://stats.stackexchange.com/questions/183313/bound-for-weighted-sum-of-poisson-random-variables#comment348601_183355) ask it.] – Glen_b Jan 20 '16 at 00:33