
Let's assume $N$ independent random variables $X_1, \ldots, X_N$ for which the quantiles at some specific level $\alpha$ are known through estimation from data: $\alpha = P(X_1 < q_1)$, ..., $\alpha = P(X_N < q_N)$. Now define the random variable $Z$ as the sum $Z = \sum_{i=1}^N X_i$. Is there a way to compute the quantile of the sum at level $\alpha$, that is, $q_Z$ in $\alpha = P(Z < q_Z)$?

I think this is easy in particular cases, such as when every $X_i$ follows a Gaussian distribution, but I'm not so sure for the case where the distributions of the $X_i$ are unknown. Any ideas?
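To make the Gaussian case concrete: for independent $X_i \sim N(\mu_i, \sigma_i^2)$, the sum is itself Gaussian, $Z \sim N(\sum_i \mu_i, \sum_i \sigma_i^2)$, so its quantile has a closed form; note that it is *not* the sum of the individual quantiles. A minimal sketch with hypothetical parameters:

```python
from statistics import NormalDist

# Hypothetical parameters: three independent Gaussians X_i ~ N(mu_i, sigma_i^2).
alpha = 0.05
mus = [1.0, -2.0, 0.5]
sigmas = [1.0, 3.0, 0.5]

z_alpha = NormalDist().inv_cdf(alpha)  # standard normal alpha-quantile

# Individual alpha-quantiles q_i = mu_i + z_alpha * sigma_i.
q_i = [m + z_alpha * s for m, s in zip(mus, sigmas)]

# Exact quantile of the sum: Z ~ N(sum(mus), sum(sigmas^2)).
q_Z = sum(mus) + z_alpha * sum(s**2 for s in sigmas) ** 0.5

print(sum(q_i))  # naive sum of quantiles: about -7.90
print(q_Z)       # true quantile of the sum: about -5.77
```

Because $z_\alpha < 0$ here and $\sqrt{\sum_i \sigma_i^2} < \sum_i \sigma_i$, the true quantile of the sum is less extreme than the sum of the individual quantiles.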

albarji
    are these $q_i$ estimated from data or theoretically known? – chuse Jan 26 '15 at 15:35
  • This is not possible without making specific assumptions about the distributions of the $X_i$. Do you have a family of distributions in mind? – whuber Jan 26 '15 at 17:53
  • @chuse the $q_i$ are estimated from data, as the distribution of the $X_i$ is not known but samples are available. I have updated the question with this fact. – albarji Jan 27 '15 at 08:51
  • @whuber I have no prior knowledge about the family of distributions the $X_i$ might be following, though data samples are available. Would assuming a family of distributions (aside from Gaussian) make this easier? – albarji Jan 27 '15 at 08:54

1 Answer


$q_Z$ could be anything.


To understand this situation, let us make a preliminary simplification. By working with $Y_i = X_i - q_i$ we obtain a more uniform characterization

$$\alpha = \Pr(X_i \le q_i) = \Pr(Y_i \le 0).$$

That is, each $Y_i$ has the same probability of being negative. Because

$$W = \sum_i Y_i = \sum_i X_i - \sum_i q_i = Z - \sum_i q_i,$$

the defining equation for $q_Z$ is equivalent to

$$\alpha = \Pr(Z \le q_Z) = \Pr(Z - \sum_i q_i \le q_Z - \sum_i q_i) = \Pr(W \le q_W)$$

with $q_Z = q_W + \sum_i q_i$.
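This relation is just a shift of the whole distribution, which a quick simulation sketch confirms (the exponential rates below are hypothetical; any distribution would do, since the identity is pure algebra):

```python
import random

random.seed(0)
alpha, n_draws = 0.25, 50_000

# Hypothetical example: three independent exponential X_i.
rates = [1.0, 2.0, 0.5]

def empirical_quantile(xs, a):
    """Order-statistic estimate of the a-quantile."""
    return sorted(xs)[int(a * len(xs))]

samples = [[random.expovariate(r) for _ in range(n_draws)] for r in rates]
q = [empirical_quantile(s, alpha) for s in samples]  # the known q_i

Z = [sum(draws) for draws in zip(*samples)]  # draws of the sum
W = [z - sum(q) for z in Z]                  # W = Z - sum_i q_i

q_Z = empirical_quantile(Z, alpha)
q_W = empirical_quantile(W, alpha)
print(abs(q_Z - (q_W + sum(q))))  # 0 up to floating-point rounding
```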


What are the possible values of $q_W$? Consider the case where the $Y_i$ all have the same two-point distribution, taking a negative value $y_{-}$ with probability $\alpha$ and a positive value $y_{+}$ with probability $1-\alpha$. The possible values of the sum $W$ are limited to $ky_{-} + (n-k)y_{+}$ for $k=0, 1, \ldots, n$. Each of these occurs with probability

$${\Pr}_W(ky_{-} + (n-k)y_{+}) = \binom{n}{k}\alpha^k(1-\alpha)^{n-k}.$$

The extremes can be found by

  1. Choosing $y_{-}$ and $y_{+}$ so that $y_{-} + (n-1)y_{+} \lt 0$; $y_{-}=-n$ and $y_{+}=1$ will accomplish this. This guarantees that $W$ is negative except when all the $Y_i$ are positive, so the chance that $W$ is negative equals $1 - (1-\alpha)^n$. This exceeds $\alpha$ when $n\gt 1$, implying the $\alpha$ quantile of $W$ must be strictly negative.

  2. Choosing $y_{-}$ and $y_{+}$ so that $(n-1) y_{-} + y_{+} \gt 0$; $y_{-}=-1$ and $y_{+}=n$ will accomplish this. This guarantees that $W$ is negative only when all the $Y_i$ are negative, so the chance that $W$ is negative equals $\alpha^n$. This is less than $\alpha$ when $n\gt 1$, implying the $\alpha$ quantile of $W$ must be strictly positive.

This shows that the $\alpha$ quantile of $W$ could be either negative or positive, but is not zero. How large could it be? It has to equal some integral linear combination of $y_{-}$ and $y_{+}$. Making both these values integers assures all the possible values of $W$ are integral. Upon scaling $y_{\pm}$ by an arbitrary positive number $s$, we can guarantee that all integral linear combinations of $y_{-}$ and $y_{+}$ are integral multiples of $s$. Since $q_W \ne 0$, it must be at least $s$ in size. Consequently, the possible values of $q_W$ (and hence of $q_Z$) are unlimited, no matter what $n\gt 1$ may equal.
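The two-point construction above is easy to check numerically. A minimal sketch (the helper names are mine, for illustration): for $n = 5$ and $\alpha = 0.1$, construction 1 yields a negative $\alpha$-quantile and construction 2 a positive one.

```python
from math import comb

def pmf_W(n, alpha, y_neg, y_pos):
    """Distribution of W = sum of n iid two-point Y_i, where each Y_i
    equals y_neg with probability alpha and y_pos with probability 1 - alpha."""
    return {k * y_neg + (n - k) * y_pos: comb(n, k) * alpha**k * (1 - alpha)**(n - k)
            for k in range(n + 1)}

def quantile(pmf, alpha):
    """Smallest value q with P(W <= q) >= alpha."""
    total = 0.0
    for v in sorted(pmf):
        total += pmf[v]
        if total >= alpha:
            return v

n, alpha = 5, 0.1
# Construction 1: y_- = -n, y_+ = 1  ->  negative alpha-quantile.
q1 = quantile(pmf_W(n, alpha, -n, 1), alpha)
# Construction 2: y_- = -1, y_+ = n  ->  positive alpha-quantile.
q2 = quantile(pmf_W(n, alpha, -1, n), alpha)
print(q1, q2)  # -1 19
```

Rescaling $y_{\pm}$ by any $s \gt 0$ multiplies both quantiles by $s$, which is exactly the "unlimited size" argument above.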


The only way to derive any information about $q_Z$ would be to impose specific, strong constraints on the distributions of the $X_i$, in order to rule out the kind of unbalanced distributions used to derive this negative result.

whuber
  • Thanks a lot @whuber, for the explanation and the illustrative example. Even though the answer is negative, I can't say it was unexpected. I will now try to find out which family of distributions suits my data and see whether with that I can work out the quantiles of the sum. – albarji Jan 28 '15 at 07:49
  • What if the variables were 100% correlated instead? Under a Gaussian law, the quantile of the sum would then equal the sum of the quantiles - is this true in general for all laws, or for some family of laws (alpha-stable?), or is the Gaussian an exception? Thank you – Confounded Oct 05 '20 at 12:16
  • @Confounded When all the variables are correlated, they are almost surely the *same* variable. The question becomes one of how to combine estimates of quantiles based on estimates from several samples. The interesting case concerns when those samples are independent. A great deal can be said about that even in very general cases (such as when no distributional assumptions are made) starting by generalizing methods to find [confidence intervals for the median](https://stats.stackexchange.com/questions/122001). – whuber Oct 05 '20 at 13:46