
Let $X_1,X_2,\dotsc,X_n$ be $n$ random variables, where each $X_i$, $i=1,\dotsc,n$, has density function $f_i(x)=\lambda_{i1} g_1(x)+\dotsb+\lambda_{im} g_m(x)$, where $g_j$, $j=1,\dotsc,m$, are normal density functions and $\sum_{j=1}^{m}\lambda_{ij}=1$, $\lambda_{ij}>0$; i.e., each $X_i$ follows a finite normal mixture distribution.

Now, what is the density of $X_1+X_2+\dotsb+X_n$? Would it make the question simpler if we considered only $m=2$?

Jingjings
    The distribution of the sum is a normal mixture with a number of components of order $m^n$ – Xi'an May 13 '21 at 06:52

2 Answers


I assume you meant that the $X_i$ are independent and that the weights do not depend on $i$, so we have an iid sum $X_1+X_2+\dotsb+X_n$ where each summand has a normal mixture distribution with weights $\lambda_1, \dotsc, \lambda_m$ and components $\mathcal{N}(\mu_j, \sigma^2_j)$.

It might help to look at a very simple example to see the structure better, so take $n=2, m=2$ with weights and component parameters $$ \lambda_1=0.95, \mu_1=0, \sigma^2_1=1 \\ \lambda_2=0.05, \mu_2=2, \sigma^2_2=4. $$ Then we have four possibilities for the sum, say, $$ X_{11}+X_{21} \\ X_{12}+X_{21} \\ X_{11}+X_{22} \\ X_{12}+X_{22}, $$ but observe that the two middle lines give rise to the same sum distribution, as both use each mixture component once. So a binomial structure arises, or in the general case with $m>2$ a multinomial structure.

Let us describe that in general: let $S_j$ be the number of times mixture component $j$ is chosen for the sum. Then $(S_1, \dotsc, S_m)$ has a multinomial distribution with parameters $(n, \lambda_1, \dotsc, \lambda_m)$. The resulting sum distribution is a finite normal mixture, with the multinomial probabilities as weights and component distributions $\mathcal{N}(\sum_{j=1}^m s_j \mu_j, \sum_{j=1}^m s_j \sigma^2_j)$. The number of components grows rather fast, so for many purposes approximations will help. One idea is to look into saddlepoint approximations, see How does saddlepoint approximation work?
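To make this concrete, here is a minimal Python sketch (NumPy and SciPy assumed; the function names are just for illustration) that enumerates the multinomial count vectors $(s_1, \dotsc, s_m)$, computes their weights, and evaluates the exact mixture density of the sum for the $n=2, m=2$ example above:

```python
from itertools import combinations_with_replacement
from math import factorial

import numpy as np
from scipy import stats

def sum_mixture_components(n, weights, means, variances):
    """Enumerate the components of the distribution of the sum.

    Returns a list of (weight, mean, variance) triples, one for each
    multinomial count vector (s_1, ..., s_m) with s_1 + ... + s_m = n.
    """
    m = len(weights)
    components = []
    for combo in combinations_with_replacement(range(m), n):
        counts = np.bincount(combo, minlength=m)            # (s_1, ..., s_m)
        prob = (factorial(n) / np.prod([factorial(c) for c in counts])
                * np.prod(np.asarray(weights) ** counts))   # multinomial weight
        mean = np.dot(counts, means)                         # sum_j s_j mu_j
        var = np.dot(counts, variances)                      # sum_j s_j sigma_j^2
        components.append((prob, mean, var))
    return components

def sum_density(x, components):
    """Evaluate the mixture density of the sum at x."""
    return sum(w * stats.norm.pdf(x, loc=mu, scale=np.sqrt(var))
               for w, mu, var in components)

comps = sum_mixture_components(2, [0.95, 0.05], [0.0, 2.0], [1.0, 4.0])
print(comps)                    # three components with weights 0.9025, 0.095, 0.0025
print(sum_density(0.5, comps))  # density of X_1 + X_2 at 0.5
```

For the example above this gives three components with weights $0.9025, 0.095, 0.0025$, which is exactly the binomial structure described earlier.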

kjetil b halvorsen

You can get the answer for this kind of problem fairly easily by writing your mixture random variables as sums of random variables, each multiplied by indicators for the outcomes of a categorical random variable. To do this, assume (as in the other answer) that the weights $\boldsymbol{\lambda} = (\lambda_1,\dotsc,\lambda_m)$ are the same for every $i$, let $S_1,\dotsc,S_n \sim \text{IID Categorical}(\boldsymbol{\lambda})$, and write your mixture random variables as:

$$X_i = \sum_{s=1}^m G_{i,s} \cdot \mathbb{I}(S_i = s) \quad \quad \quad G_{i,s} \sim g_s.$$
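This representation also gives a direct recipe for simulating the $X_i$: draw the label $S_i$ first, then the corresponding normal component. A minimal Python sketch (NumPy assumed; the parameter values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

lam   = np.array([0.95, 0.05])   # mixture weights (lambda)
mu    = np.array([0.0, 2.0])     # component means
sigma = np.array([1.0, 2.0])     # component standard deviations

def sample_mixture(n):
    """Sample X_1, ..., X_n, each from the normal mixture via a categorical label."""
    S = rng.choice(len(lam), size=n, p=lam)   # S_i ~ Categorical(lambda)
    return rng.normal(mu[S], sigma[S])        # X_i | S_i = s  ~  g_s

print(sample_mixture(10))
```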

Now, taking a sum of your mixture random variables gives:

$$\sum_{i=1}^n X_i = \sum_{i=1}^n \sum_{s=1}^m G_{i,s} \cdot \mathbb{I}(S_i = s) = \sum_{s=1}^m \Bigg( \sum_{i=1}^n G_{i,s} \cdot \mathbb{I}(S_i = s) \Bigg).$$

Each term in the brackets is a sum of $N_s \equiv \sum_{i=1}^n \mathbb{I}(S_i = s)$ IID random variables with density $g_s$. Noting that $\mathbf{N} \equiv (N_1,...,N_m) \sim \text{Mu}(n, \boldsymbol{\lambda})$ and letting $g_s^n$ denote the $n$-fold convolution of the density $g_s$, we can then write:

$$\sum_{i=1}^n X_i = \sum_{s=1}^m H_s(N_s) \quad \quad \quad H_s(n) \sim g_s^{n}.$$
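Read as a simulation recipe, this says: draw $\mathbf{N}$ from the multinomial, then draw each $H_s(N_s)$ by summing $N_s$ draws from $g_s$. A short Python sketch (NumPy assumed; parameter values again arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

lam   = np.array([0.95, 0.05])   # lambda
mu    = np.array([0.0, 2.0])     # means of g_1, g_2
sigma = np.array([1.0, 2.0])     # standard deviations of g_1, g_2

def sample_sum(n):
    """One draw of X_1 + ... + X_n via the mixture-of-convolutions form."""
    N = rng.multinomial(n, lam)   # (N_1, ..., N_m) ~ Mu(n, lambda)
    # H_s(N_s) is a sum of N_s iid draws from g_s; for normal g_s this is
    # the same as a single N(N_s * mu_s, N_s * sigma_s^2) draw.
    return sum(rng.normal(mu[s], sigma[s], size=N[s]).sum()
               for s in range(len(lam)))

print(sample_sum(10))
```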

So, we can see that the sum of the mixture random variables is itself a sum of $m$ random variables, where each random variable $H_s(N_s)$ is drawn from the $N_s$-fold convolution of $g_s$. In your case you are using the normal distribution, so you have $H_s(n) \sim \text{N}(n \mu_s, n \sigma_s^2)$. Since sums of independent normal random variables are normal random variables, the sum is normal conditional on $\mathbf{N}$; marginally it is a finite normal mixture, with one component for each possible value of the count vector $\mathbf{N}$. Its overall mean and variance are easy to obtain: using the laws of iterated expectation and variance you have:

$$\begin{align} \mathbb{E} \Bigg( \sum_{i=1}^n X_i \Bigg) &= \mathbb{E} \Bigg( \mathbb{E} \Bigg( \sum_{i=1}^n X_i \Bigg| \mathbf{N} \Bigg) \Bigg) \\[6pt] &= \mathbb{E} \Bigg( \mathbb{E} \Bigg( \sum_{s=1}^m H_s(N_s) \Bigg| \mathbf{N} \Bigg) \Bigg) \\[6pt] &= \mathbb{E} \Bigg( \sum_{s=1}^m N_s \mu_s \Bigg) \\[6pt] &= \sum_{s=1}^m \mathbb{E} ( N_s ) \mu_s \\[6pt] &= n \sum_{s=1}^m \lambda_s \mu_s, \\[6pt] \mathbb{V} \Bigg( \sum_{i=1}^n X_i \Bigg) &= \mathbb{V} \Bigg( \mathbb{E} \Bigg( \sum_{i=1}^n X_i \Bigg| \mathbf{N} \Bigg) \Bigg) + \mathbb{E} \Bigg( \mathbb{V} \Bigg( \sum_{i=1}^n X_i \Bigg| \mathbf{N} \Bigg) \Bigg) \\[6pt] &= \mathbb{V} \Bigg( \sum_{s=1}^m N_s \mu_s \Bigg) + \mathbb{E} \Bigg( \sum_{s=1}^m N_s \sigma_s^2 \Bigg) \\[6pt] &= \sum_{s=1}^m \sum_{t=1}^m \mu_s \mu_t \text{Cov}(N_s, N_t) + \sum_{s=1}^m \mathbb{E}(N_s) \sigma_s^2 \\[6pt] &= n \Bigg[ \sum_{s=1}^m \lambda_s \mu_s^2 - \Bigg( \sum_{s=1}^m \lambda_s \mu_s \Bigg)^2 \Bigg] + n \sum_{s=1}^m \lambda_s \sigma_s^2 \\[6pt] &= n \Bigg[ \sum_{s=1}^m \lambda_s (\mu_s^2 + \sigma_s^2) - \Bigg( \sum_{s=1}^m \lambda_s \mu_s \Bigg)^2 \Bigg], \\[6pt] \end{align}$$

where the variance step uses the multinomial covariances $\text{Cov}(N_s, N_t) = n \lambda_s (\mathbb{I}(s=t) - \lambda_t)$; the counts $N_s$ are negatively correlated, so the cross terms do not vanish.

Thus, conditional on the multinomial count vector you have the final result:

$$\sum_{i=1}^n X_i \,\Bigg|\, \mathbf{N} \sim \text{N} \Bigg( \sum_{s=1}^m N_s \mu_s, \ \sum_{s=1}^m N_s \sigma_s^2 \Bigg), \quad \quad \quad \mathbf{N} \sim \text{Mu}(n, \boldsymbol{\lambda}),$$

so the marginal distribution of the sum is a finite normal mixture (as in the other answer), with mean $n \sum_{s=1}^m \lambda_s \mu_s$ and variance $n \big[ \sum_{s=1}^m \lambda_s (\mu_s^2 + \sigma_s^2) - \big( \sum_{s=1}^m \lambda_s \mu_s \big)^2 \big]$.
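As a quick sanity check, here is a small Monte Carlo comparison of these moment formulas against direct simulation (Python with NumPy assumed; the parameter values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)

n     = 10
lam   = np.array([0.6, 0.3, 0.1])    # lambda
mu    = np.array([-1.0, 0.0, 3.0])   # component means
sigma = np.array([0.5, 1.0, 2.0])    # component standard deviations

# Simulate many replications of X_1 + ... + X_n directly from the mixture.
S = rng.choice(len(lam), size=(200_000, n), p=lam)   # component labels
X = rng.normal(mu[S], sigma[S])                       # mixture draws
sums = X.sum(axis=1)

mean_theory = n * np.sum(lam * mu)
var_theory  = n * (np.sum(lam * (mu**2 + sigma**2)) - np.sum(lam * mu)**2)

print(sums.mean(), mean_theory)   # should agree up to Monte Carlo error
print(sums.var(),  var_theory)
```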

Ben