
Imagine that

$$ X_1,\dots,X_k \sim \mathrm{Dirichlet}(\alpha_1,\dots,\alpha_k) $$

Since $x_i \in (0,1)$ for each $i$ and $\sum_{i=1}^k x_i = 1$, the $x_i$'s satisfy the first two axioms of probability, and the Dirichlet distribution can be (and is) used as a "distribution over distributions". Intuitively it should follow that

$$ X_1,\dots,X_{k-2},X_{k-1}+X_k \sim \mathrm{Dirichlet}(\alpha_1,\dots,\alpha_{k-2}, \alpha_{k-1}+\alpha_k) $$

since the properties of the $x_i$'s would not change and the total "mass" of the $\alpha_i$'s would not change.

But its probability density function is

$$ f(x_1,\dots,x_k) \propto \prod_{i=1}^k x_i^{\alpha_i - 1}$$

and

$$ x_{k-1}^{\alpha_{k-1} - 1} \times x_k^{\alpha_k - 1} \ne (x_{k-1} + x_k)^{\alpha_{k-1} + \alpha_k - 1} $$

So merging random variables of a Dirichlet distribution does not seem to lead to a Dirichlet distribution over $k-1$ variables. What does it lead to?

Tim
  • Wouldn't the inequality have to be $x_{k-1}^{\alpha_{k-1}-1} \times x_k^{\alpha_k-1} \neq (x_{k-1}+x_k)^{\alpha_{k-1}+\alpha_k - 2}$? – JAD Nov 09 '16 at 09:24
  • @JarkoDubbeldam yes, it seems you're right, fixed it. – Tim Nov 09 '16 at 09:26
  • Actually, I think the exponent should be only ${}-1$, as if it were plugging $\alpha_{k-1} + \alpha_k$ into the distribution as one parameter. – JAD Nov 09 '16 at 09:28
  • One more bit of nitpicking: You turned the exponent into $\alpha_{k-1}-\alpha_k - 1$, instead of $\alpha_{k-1}+\alpha_k - 1$. – JAD Nov 09 '16 at 09:42
  • @JarkoDubbeldam it seems I should have had a coffee before posting this question :) – Tim Nov 09 '16 at 09:44
  • I am intrigued by this question, but I cannot reproduce the problem. That is to say, I tried simulating the data in R by summing two variables and drawing from the distribution with the sum of their $\alpha$ and they turned out to be similarly distributed. I can share the code if you want and would take this to chat, but I don't know how. – JAD Nov 09 '16 at 16:29

2 Answers


It is a Dirichlet distribution having the expected parameters.

To see this, note that the vector-valued random variable $\mathbf{X}=(X_1, X_2, \ldots, X_k)$ has the same distribution as the variable

$$\frac{1}{\sum_{j=1}^k Y_j}\left(Y_1, Y_2, \ldots, Y_k\right)$$

where the $Y_i \sim \Gamma(\alpha_i)$ are independently Gamma distributed. Write $Y_i^\prime=Y_i$ for $i=1, 2, \ldots, k-2$ and $Y_{k-1}^\prime = Y_{k-1}+Y_k$. The sum of all the $Y_i$ equals the sum of all the $Y_i^\prime$, and the distribution of $Y_{k-1}^\prime=Y_{k-1}+Y_k$ is $\Gamma(\alpha_{k-1}+ \alpha_k)$, a standard fact about sums of independent Gamma variables with a common scale (easily checked with moment generating functions). Thus

$$X_{k-1} + X_k = \frac{1}{\sum_{j=1}^k Y_j} Y_{k-1} + \frac{1}{\sum_{j=1}^k Y_j} Y_{k} = \frac{1}{\sum_{j=1}^{k-1} Y_j^\prime} Y_{k-1}^\prime$$

and, for $i < k-1$,

$$X_i = \frac{1}{\sum_{j=1}^k Y_j} Y_{i} = \frac{1}{\sum_{j=1}^{k-1} Y_j^\prime} Y_{i}^\prime.$$

Therefore $\mathbf{X}^\prime=(X_1, X_2, \ldots, X_{k-2}, X_{k-1}+X_k)$ has the same distribution as

$$\frac{1}{\sum_{j=1}^{k-1} Y_j^\prime}\left(Y_1^\prime, Y_2^\prime, \ldots, Y_{k-1}^\prime\right).$$

This demonstrates that $\mathbf{X}^\prime$ has a Dirichlet$(\alpha_1, \alpha_2, \ldots, \alpha_{k-2}, \alpha_{k-1}+\alpha_k)$ distribution, QED.


The fault in the argument in the question lies in confusing the arithmetic sum of values $x_{k-1}+x_k$ with the sum of random variables $X_{k-1}+X_k$. The latter is performed with a convolution, of course.
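As a quick numerical check of this construction, here is a minimal R sketch (the parameter values and seed are arbitrary, chosen only for illustration). It builds both vectors from independent Gammas, exactly as above, and compares the merged coordinate against a direct draw:

set.seed(1)
n     <- 1e5
alpha <- c(2, 3, 5)

# Dirichlet draws via the Gamma construction used in the proof
Y <- sapply(alpha, function(a) rgamma(n, shape = a))
X <- Y / rowSums(Y)

# merge the last two coordinates ...
merged <- X[, 2] + X[, 3]

# ... and draw the corresponding coordinate directly, using the summed shape 3 + 5
Yp     <- cbind(rgamma(n, shape = 2), rgamma(n, shape = 3 + 5))
direct <- (Yp / rowSums(Yp))[, 2]

# the quantiles should agree up to simulation noise
qqplot(merged, direct)
abline(0, 1, col = 'blue')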

whuber
  • Thanks. While doing some search on it I actually found [your answer about sum of gamma distributions](http://stats.stackexchange.com/a/72486/35989) but somehow missed the very first sentence mentioning the facts in your answer above. – Tim Nov 09 '16 at 21:07

As I mentioned in the comments, everything I have tried suggests that what you propose actually does work.

As you mentioned, it makes intuitive sense that this should work: if $X_i$ represents a posterior draw of the probability $p_i$ of event $i$ occurring, you should indeed be able to sum several $X_i$'s to get the probability of any one of those events occurring, provided the events cannot happen at the same time. Since this is a multinomial setting, the events are mutually exclusive, so we're good.


So let me show my simulations:

library('gtools')

K <- 10 
alpha <- rpois(K, 50) # randomly generated alphas, just cause
k <- 2 # the number of alphas we are summing together

sim <- rdirichlet(10000, alpha)
plot(density(rowSums(sim[, 1:k]))) # the density of the summed variable

lines(density(rdirichlet(10000, c(sum(alpha[1:k]), alpha[-(1:k)]))[,1]), col = 'blue') 
# the density of the variable drawn from the Dirichlet distribution with summed alphas

Let's start with $\alpha = \{10, 10, 10\}$. Summing $\alpha_1$ and $\alpha_2$ should get us $\mathrm{Dir}(2, \{20, 10\})$:

[Figure: black = simulated sum, blue = $\mathrm{Dir}(2, (20, 10))$]

These marginal densities look pretty similar. According to Wikipedia and a lecture I found on the internet (through Wikipedia), the marginal distribution of $X_i$ in the Dirichlet distribution is as follows:

$$X_i \sim \mathrm{Beta}\left(\alpha_i,\ \sum_{k=1}^K \alpha_k - \alpha_i\right)$$

This relies on the same principle: summing together all the $\alpha$'s other than $\alpha_i$, which turns the corresponding multinomial into a binomial distribution with the outcomes $i$ and $\textrm{not-}i$. And indeed, if we overlay the marginal distribution we would expect for the sum on the density in the previous picture, we see that it looks the same:

[Figure: the implied Beta marginal overlaid on the simulated density]
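For completeness, a minimal sketch of how such an overlay can be drawn with gtools::rdirichlet, as in the code above (the seed and sample size are illustrative):

library('gtools')

set.seed(42)
alpha <- c(10, 10, 10)
sim   <- rdirichlet(10000, alpha)

# density of the summed components X1 + X2 ...
plot(density(rowSums(sim[, 1:2])))

# ... with the implied Beta(20, 10) marginal drawn on top
curve(dbeta(x, sum(alpha[1:2]), sum(alpha) - sum(alpha[1:2])),
      col = 'blue', add = TRUE)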

So theoretically, we should be able to take a Dirichlet distribution with a high $K$, sum all components but one together, and end up with a Beta distribution. Heck, let's try:

[Figure: 99 $\alpha_i \sim \mathrm{Pois}(50)$ and one $\alpha = 1000$, with the randomly generated $\alpha$'s summed together]

To show that it also works for the joint densities, this is an example with $K=4$ and $\alpha = \{10,10,10,10\}$:

[Figure: joint densities for $K=4$ and $\alpha = \{10,10,10,10\}$]

And this is with $K=3$ and $\alpha=\{20,10,10\}$:

[Figure: joint densities for $K=3$ and $\alpha = \{20,10,10\}$]

$$\ddot{\smile}$$


So where does the confusion about $x_1^{\alpha_1-1}x_2^{\alpha_2-1}\neq (x_1 + x_2)^{\alpha_1+\alpha_2 -1 }$ come from?

Because when we sum $x_1$ and $x_2$ together, we don't just care about the density at the particular pair $(x_1, x_2)$, but about every combination of two values that sums to $x_1+x_2$. I am not that strong in integration, so I'll not try and burn myself with that, but I really suggest reading this (pages 3-4) for more information.
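For readers who want the missing step, here is a sketch of that integration (a standard marginalization, not spelled out in the original post). Hold $x_1, \dots, x_{k-2}$ fixed, write $s = x_{k-1}+x_k$, and integrate the Dirichlet kernel over all splits $t = x_{k-1} \in (0, s)$; the substitution $t = su$ gives

$$ \int_0^s t^{\alpha_{k-1}-1}(s-t)^{\alpha_k-1}\,dt = s^{\alpha_{k-1}+\alpha_k-1}\int_0^1 u^{\alpha_{k-1}-1}(1-u)^{\alpha_k-1}\,du = B(\alpha_{k-1},\alpha_k)\,s^{\alpha_{k-1}+\alpha_k-1}, $$

so the density of $(x_1,\dots,x_{k-2},s)$ is proportional to $\prod_{i=1}^{k-2} x_i^{\alpha_i-1}\, s^{\alpha_{k-1}+\alpha_k-1}$, which is exactly the Dirichlet$(\alpha_1,\dots,\alpha_{k-2},\alpha_{k-1}+\alpha_k)$ kernel.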

EDIT:

As @whuber correctly remarked, here is an example with low alphas, $K=4$, summing the first two $X_1$ and $X_2$:

[Figure: low-$\alpha$ example with $K=4$, summing $X_1$ and $X_2$]
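A sketch of such a low-$\alpha$ comparison (the exact $\alpha$ values behind the figure are not given, so the ones below are illustrative; gtools is loaded as above):

set.seed(7)
alpha <- c(0.5, 0.7, 0.9, 0.3)
sim   <- rdirichlet(10000, alpha)

plot(density(rowSums(sim[, 1:2])))  # simulated X1 + X2

lines(density(rdirichlet(10000, c(sum(alpha[1:2]), alpha[3:4]))[, 1]),
      col = 'blue')                 # direct draw with summed alphas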

JAD
  • This is a nice idea for testing the situation. However, because the Dirichlet marginals are very close to Normal for large $\alpha_i$, choosing large values is not much of a test. You ought to have carried it out with *small* values of the $\alpha_i$. Values less than $1$ are especially interesting due to the extreme skewness of their marginals. – whuber Nov 09 '16 at 20:41
  • @whuber you're right. Your answer has a better explanation for why this does work (+1). I added an example for a lower alpha, for completeness. – JAD Nov 09 '16 at 20:56