I assume that there are n exponential distribution that $x_i$ ~ $Exp(\lambda_i)$, i=1..n, and I want to calculate the cumulative distribution of $S=x_1+x_2+...+x_n$, the convolution of n exponential distribution. I used the formulation that I read in a paper given by, $$1-\sum_{i=1}^n (\prod \limits_{j=1,j\neq i}^n \frac{\lambda_j}{\lambda_j-\lambda_i})e^{-\lambda_i t}$$ but when I set $\lambda_i = 0.01i+2$, and $n=20$, then when t=0, the result is not equal to 0.
-
2This site is not the place for debugging computer code--especially not nearly illegible and uncommented code of unspecified type. If you have a statistical question, please ask it. Maybe start by seeing if your equation works for the convolution of two exponential distributions, and ask about that. – BruceET Mar 04 '22 at 03:30
-
Actually I think there is no bug on me code, I just want to show the result calculated. – Ellen1230 Mar 04 '22 at 03:55
-
1As it says: "If the question is actually a statistical topic disguised as a coding question, then OP should edit the question to clarify this. After the statistical content has been clarified, the question is eligible for reopening." – BruceET Mar 04 '22 at 08:25
-
This is a partial fractions expansion, as described at https://stats.stackexchange.com/a/72486/919. It does not hold generally (for instance, it is constantly equal to $1$ when $n=1,$ which is incorrect; and it is undefined if any pair of the $\lambda_i$ are equal). When it does apply *theoretically*, the calculation is numerically unstable for small $t$ because it involves a great deal of near-cancellation of large terms. Thus, *this formula is unsuitable for general calculation.* You need a different algorithm. One attractive possibility is to invert the characteristic function. – whuber Mar 04 '22 at 14:08
-
1BTW, because questions about algorithms for computing quantities of statistical interest are on topic here, I have voted to reopen this thread. (+1) – whuber Mar 04 '22 at 14:09
-
@Ellen1230 could you supply a link to the original article. I can not directly make the connection with Whuber's answer elsewhere and when I add a factor (n-1)(n) to get it within the range of 0 and 1, then I still get large differences everywhere and not only when $t=0$. – Sextus Empiricus Mar 04 '22 at 15:23
-
@Sextus Here is a compact `R` implementation of the formula. `f – whuber Mar 04 '22 at 16:11
-
@whuber with $n=6$ I indeed get a CDF in the range from zero to one. But it is still a lot different from the simulations and alternative computation. (sidenote: I believe that the `margin` parameter of the `apply` function might need to be 2 instead of 1, but it doesn't matter for the issue, both cases are different) – Sextus Empiricus Mar 04 '22 at 17:31
-
@Sextus Yes, it's a lot different because your $\lambda$ appears to be the *reciprocal* of the $\lambda$ in the formula. I say "appears" because the formula in this question clearly isn't quite correct. For instance, it reduces to a constant $1$ when $n=1.$ I just don't feel like checking it ;-). – whuber Mar 04 '22 at 17:36
-
I found a reference here https://en.wikipedia.org/wiki/Hypoexponential_distribution but I do not see why it is different from my answer. There is no case of $\lambda_i = \lambda_j$. – Sextus Empiricus Mar 04 '22 at 18:29
1 Answers
You can alternatively model this as an order distribution. There is a link between sums of exponentials and order distributions of exponentials.
Since you are looking for the special case of $\lambda_i = 0.01i+2$ and $n=20$ this link works. The sum is similar to the sampling of $m = 220$ exponential distributions with $\lambda = 0.01$ and taking the 20-th smallest sample.
Then we can model the CDF as a beta distribution transformed by the exponential quantile function.
So if $X \sim Beta(20,220+1-20)$ then $Y = -100 \cdot \ln(1-X)$ is distributed as the sum of $20$ exponential distributed variables with $\lambda_i = 0.01i+2$
The code below demonstrates this:
set.seed(1)
### theoretic computation
k = 8
l = seq(2.01,2+0.01*k,0.01)
xs = seq(0,k,0.1)
pe = pexp(xs,0.01) # cdf of exponential
pb = pbeta(pe,k,200+k+1-k) # cdf of beta based on p from cdf of cdf of exponential
### theoretic computations 2
g <- Vectorize(function(t, lambda) { 1 - apply(outer(lambda, lambda, function(x,y) ifelse(x==y, 1, y/(y-x))), 1, prod) %*% exp(-lambda * t) }, "t")
### theoretic computations 2 (log values)
h = function(t) {
l = seq(2.01,2+0.01*k,0.01)
p = rep(0,k)
sgn = rep(0,k)
for (j in 1:k) {
p[j] = 0
sgn[j] = 1
for (i in 1:k) {
if (j != i) {
p[j] = p[j] + log(abs(l[i]/(l[j]-l[i])))
sgn[j] = sgn[j] * sign(l[i]/(l[j]-l[i]))
}
}
}
1-sum(sgn*exp(p+(-l*t)))
}
h = Vectorize(h)
### simulation of summing 20 exponential distributed variables
sim = function() {
l = seq(2.01,2+0.01*k,0.01)
sum(rexp(k,l))
}
n = 10^3
y = replicate(n, sim())
y = y[order(y)]
ps = c(1:n)/n
plot(y,ps, type = "l", ylim = c(0,1),
xlab = "y", ylab = "p(X<=x)", col = 1, lty = 1,
main = "comparing computed probability with simulation")
lines(xs, pb, lty = 2)
lines(xs, h(xs), col = 2, lty = 2)
lines(xs, g(xs, l), col = 3, lty = 2)
This code has your formula included as well (not plotted in the figure here). I used two versions. The function g
is based on Whuber's formula (it works for low odd numbers $n$). The function h
computes logarithms to make the computation more stable, but we still get a computation with large numbers. When $n = 20$ and we compute $P(X\leq 0)$ then we get the sum of numbers in the order around $10^{30}$:
> sgn*exp(p+(-l*t))
[1] -1.184378e+27 2.239177e+28 -2.005332e+29
[4] 1.130785e+30 -4.501074e+30 1.343767e+31
[7] -3.120310e+31 5.767002e+31 -8.609112e+31
[10] 1.047214e+32 -1.042251e+32 8.487285e+31
[13] -5.631626e+31 3.018244e+31 -1.287517e+31
[16] 4.271853e+30 -1.063042e+30 1.867351e+29
[19] -2.065360e+28 1.082091e+27

- 43,080
- 1
- 72
- 161