
I have derived the mean and variance of $\mu_j$ for the Dirichlet distribution $\text{Dir}(\mu_1, \cdots, \mu_K|\alpha_1, \cdots, \alpha_K)$.

The Wikipedia article https://en.wikipedia.org/wiki/Dirichlet_distribution also shows that

$$\mathbb{E}\left[\ln [\mu_j]\right] = \psi(\alpha_j) - \psi(\alpha_0)$$

where

  • $\alpha_0 = \sum_{k=1}^K \alpha_k$, and
  • $\psi(\alpha) = \frac{d}{d \alpha} \ln \Gamma(\alpha)$, the digamma function.
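For reference, the identity is easy to check numerically. Below is a minimal Monte Carlo sanity check (a sketch assuming `numpy` and `scipy` are available; the specific $\boldsymbol{\alpha}$ values are arbitrary):

```python
import numpy as np
from scipy.special import digamma

# Arbitrary example parameters; any positive values work.
alpha = np.array([2.0, 3.0, 5.0])
alpha0 = alpha.sum()

rng = np.random.default_rng(seed=0)
samples = rng.dirichlet(alpha, size=1_000_000)  # shape (1_000_000, 3)

mc_estimate = np.log(samples).mean(axis=0)      # Monte Carlo estimate of E[ln mu_j]
closed_form = digamma(alpha) - digamma(alpha0)  # psi(alpha_j) - psi(alpha_0)

print(mc_estimate)  # both lines print approximately [-1.829, -1.329, -0.746]
print(closed_form)
```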

Can anyone provide hints or suggestions on how $\mathbb{E}\left[\ln \mu_j\right]$ can be derived?

zyxue

  • The component $\mu_j$ is distributed as a Beta $B(\alpha_j,\alpha_0-\alpha_j)$. – Xi'an Aug 17 '20 at 07:10
  • Using the Beta distribution of $\mu_j$, see https://stats.stackexchange.com/q/241993/119261. Another possible way is to use facts of exponential family as done here: https://stats.stackexchange.com/a/371031/119261. – StubbornAtom Aug 17 '20 at 07:22

2 Answers

2

\begin{align}
\mathbb{E}[\ln \mu_j]
&= \int_0^1 \ln \mu_j \,\text{Dir}(\boldsymbol{\mu}|\boldsymbol{\alpha})\, d\mu_j \\
&= \int_0^1 \ln \mu_j \,\text{Beta}(\alpha_j, \alpha_0 - \alpha_j)\, d\mu_j \\
&= \int_0^1 \ln \mu_j \,\frac{1}{\text{B}(\alpha_j, \alpha_0 - \alpha_j)} \mu_j^{\alpha_j - 1} (1 - \mu_j)^{\alpha_0 - \alpha_j - 1}\, d\mu_j \\
&= \frac{1}{\text{B}(\alpha_j, \alpha_0 - \alpha_j)} \int_0^1 \frac{d\, \mu_j^{\alpha_j - 1}}{d \alpha_j}(1 - \mu_j)^{\alpha_0 - \alpha_j - 1}\, d\mu_j \\
&= \frac{1}{\text{B}(\alpha_j, \alpha_0 - \alpha_j)} \frac{d}{d \alpha_j} \int_0^1 \mu_j^{\alpha_j - 1} (1 - \mu_j)^{\alpha_0 - \alpha_j - 1}\, d\mu_j \\
&= \frac{1}{\text{B}(\alpha_j, \alpha_0 - \alpha_j)} \frac{d\, \text{B}(\alpha_j, \alpha_0 - \alpha_j)}{d \alpha_j} \\
&= \frac{d}{d \alpha_j} \ln \text{B}(\alpha_j, \alpha_0 - \alpha_j) \\
&= \frac{d}{d \alpha_j} \ln \frac{\Gamma(\alpha_j) \Gamma(\alpha_0 - \alpha_j)}{\Gamma(\alpha_0)} \\
&= \frac{d}{d \alpha_j} \bigg( \ln \Gamma(\alpha_j) + \ln \Gamma(\alpha_0 - \alpha_j) - \ln \Gamma(\alpha_0) \bigg) \\
&= \frac{d}{d \alpha_j} \ln \Gamma(\alpha_j) - \frac{d}{d \alpha_j}\ln \Gamma(\alpha_0) \\
&= \frac{d}{d \alpha_j} \ln \Gamma(\alpha_j) - \frac{d}{d \alpha_0}\ln \Gamma(\alpha_0) \\
&= \psi(\alpha_j) - \psi(\alpha_0)
\end{align}

Note:

  • In the 4th equality, we used the fact that $\frac{d}{d\alpha_j} \mu_j^{\alpha_j - 1} = \mu_j^{\alpha_j - 1} \ln \mu_j$, which follows from $\frac{d}{dx} a^x = a^x \ln a$.
  • In the 4th-to-last and 2nd-to-last equalities, when differentiating with respect to $\alpha_j$, the difference $\alpha_0 - \alpha_j$ is treated as a constant, while $\alpha_0$ itself is NOT, so
    • $\frac{d}{d\alpha_j} \ln \Gamma(\alpha_0 - \alpha_j) = 0$ (4th-to-last equality), and
    • since $\alpha_0 = \alpha_j + (\alpha_0 - \alpha_j)$ with the second term constant, $d\alpha_0 = d\alpha_j$, hence $\frac{d}{d\alpha_j}\ln \Gamma(\alpha_0) = \frac{d}{d\alpha_0}\ln \Gamma(\alpha_0)$ (2nd-to-last equality).
  • $\psi(x) \equiv \frac{d}{dx} \ln \Gamma(x) $ is called the digamma function.
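As a quick numerical check of the end result, one can evaluate the integral from the 3rd equality by quadrature and compare it with $\psi(\alpha_j) - \psi(\alpha_0)$. A minimal sketch with `scipy` (the choice $\alpha_j = 2$, $\alpha_0 = 10$ is arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma
from scipy.stats import beta

# mu_j ~ Beta(a, b) with a = alpha_j and b = alpha_0 - alpha_j.
# The chosen values (alpha_j = 2, alpha_0 = 10) are arbitrary.
a, b = 2.0, 8.0

# Left-hand side: the integral in the 3rd equality, evaluated numerically.
# quad copes with the integrable log singularity at 0.
lhs, _ = quad(lambda x: np.log(x) * beta.pdf(x, a, b), 0, 1)

# Right-hand side: psi(alpha_j) - psi(alpha_0).
rhs = digamma(a) - digamma(a + b)

print(lhs, rhs)  # both approximately -1.82897
```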
zyxue

-1

The answer from @zyxue is fantastic.

However, it omits how the 1st equation, $\mathbb{E}[\ln\mu_j]=\int_0^1 \ln \mu_j \text{Dir}(\boldsymbol{\mu}|\boldsymbol{\alpha}) d\mu_j$, is obtained. This answer supplements @zyxue's answer.

According to the definition of expectation, $\mathbb{E}[X]=\int_\mathbb{R}xf(x)dx$. Then
$$
\begin{align}
\mathbb{E}[\ln\mu_j]
&=\int\ln\mu_j\,f(\boldsymbol{\mu})\,d\boldsymbol{\mu}\\
&=\int\dots\int\ln\mu_j\,f(\boldsymbol{\mu})\,d\mu_1\dots d\mu_K\\
&=\int\dots\int\int\int\dots\int\ln\mu_j\,f(\boldsymbol{\mu})\,d\mu_1\dots d\mu_{j-1}\,d\mu_j\,d\mu_{j+1}\dots d\mu_K
\end{align}
$$
Move the integral over $\mu_j$ to the outermost position and set its interval to $[0,1]$:
$$
\mathbb{E}[\ln\mu_j]
=\int_0^1\int_0^{1-\sum_{k=1}^{K-1}\mu_k}\dots\int_0^{1-\sum_{k=1}^{j}\mu_k}\int_0^{1-\mu_j-\sum_{k=1}^{j-2}\mu_k}\dots\int_0^{1-\mu_j}\ln\mu_j\,f(\boldsymbol{\mu})\,d\mu_1\dots d\mu_{j-1}\,d\mu_{j+1}\dots d\mu_K\,d\mu_j
$$
Since $\ln\mu_j$ does not depend on $\mu_1,\dots,\mu_{j-1},\mu_{j+1},\dots,\mu_K$, it can be treated as a constant with respect to the inner integrals and pulled out to the outermost integral:
$$
\begin{align}
\mathbb{E}[\ln\mu_j]
&=\int_0^1\ln\mu_j\int_0^{1-\sum_{k=1}^{K-1}\mu_k}\dots\int_0^{1-\sum_{k=1}^{j}\mu_k}\int_0^{1-\mu_j-\sum_{k=1}^{j-2}\mu_k}\dots\int_0^{1-\mu_j}f(\boldsymbol{\mu})\,d\mu_1\dots d\mu_{j-1}\,d\mu_{j+1}\dots d\mu_K\,d\mu_j\\
&=\int_0^1\ln\mu_j\,g(\mu_j)\,d\mu_j
\end{align}
$$
where $g(\mu_j)=\int_0^{1-\sum_{k=1}^{K-1}\mu_k}\dots\int_0^{1-\sum_{k=1}^{j}\mu_k}\int_0^{1-\mu_j-\sum_{k=1}^{j-2}\mu_k}\dots\int_0^{1-\mu_j}f(\boldsymbol{\mu})\,d\mu_1\dots d\mu_{j-1}\,d\mu_{j+1}\dots d\mu_K$.

Observe that $g(\mu_j)$ is the marginal density of the variable $M_j$:
$$
\begin{align}
g(\mu_j)&=\int_0^{1-\sum_{k=1}^{K-1}\mu_k}\dots\int_0^{1-\sum_{k=1}^{j}\mu_k}\int_0^{1-\mu_j-\sum_{k=1}^{j-2}\mu_k}\dots\int_0^{1-\mu_j}f(\boldsymbol{\mu})\,d\mu_1\dots d\mu_{j-1}\,d\mu_{j+1}\dots d\mu_K\\
&=f_{M_j}(\mu_j)
\end{align}
$$
As we know, $\boldsymbol{\mu}$ follows the Dirichlet distribution $\text{Dir}(\boldsymbol{\mu}|\boldsymbol{\alpha})$, which is a multivariate generalization of the beta distribution. It follows that $f_{M_j}(\mu_j)$ is the probability density function of the beta distribution $\text{Beta}(\alpha_j, \alpha_0 - \alpha_j)$, where $\alpha_0=\sum_{k=1}^{K}\alpha_k$. Then we can write $\mathbb{E}[\ln\mu_j]$ as follows:
$$
\begin{align}
\mathbb{E}[\ln\mu_j]
&=\int\ln\mu_j\,f(\boldsymbol{\mu})\,d\boldsymbol{\mu}\\
&=\int_0^1\ln\mu_j\,g(\mu_j)\,d\mu_j\\
&=\int_0^1\ln\mu_j\,f_{M_j}(\mu_j)\,d\mu_j\\
&=\int_0^1 \ln \mu_j \,\text{Beta}(\alpha_j, \alpha_0 - \alpha_j)\, d\mu_j
\end{align}
$$
which is consistent with the 2nd equation in @zyxue's answer.
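The claim that the marginal of $\mu_j$ is $\text{Beta}(\alpha_j, \alpha_0 - \alpha_j)$ can also be checked empirically. Below is a minimal sketch using `scipy.stats` (the $\boldsymbol{\alpha}$ values and component index are arbitrary choices):

```python
import numpy as np
from scipy import stats

alpha = np.array([2.0, 3.0, 5.0])  # arbitrary example parameters
j = 0                              # component to inspect
a, b = alpha[j], alpha.sum() - alpha[j]

rng = np.random.default_rng(seed=1)
mu_j = rng.dirichlet(alpha, size=100_000)[:, j]  # samples of the j-th component

# Kolmogorov-Smirnov test of the sampled component against Beta(a, b);
# a large p-value means the sample is consistent with the claimed marginal.
print(stats.kstest(mu_j, "beta", args=(a, b)).pvalue)

# The first moment matches as well: Beta mean a / (a + b) vs. sample mean.
print(mu_j.mean(), a / (a + b))
```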

For a formal derivation of the marginal distribution of the Dirichlet distribution, please refer to the answer to the question Find marginal distribution of $K$-variate Dirichlet.

chengxiz

  • Why is there anything to show? The first equation is usually taken as a *definition* of expectation. When it's not, it is justified by [LOTUS](https://stats.stackexchange.com/search?q=LOTUS) – whuber Aug 19 '21 at 16:08
  • If it is the definition of expectation, and if it is with respect to the joint Dirichlet distribution (as shown in the first equation), then the differential should be the vector $d\boldsymbol{\mu}$ rather than $d\mu_j$. – chengxiz Aug 20 '21 at 06:54
  • That is correct--but one rarely bothers to observe this, because the first step is to integrate over all the other variables, thereby reducing the problem to one involving the marginal distribution. Thus, your answer could be reduced to its last line, which is a link. – whuber Aug 20 '21 at 11:02
  • I am happy we agreed that your previous comment was wrong. And even though I spent so many words in my answer articulating where this marginal distribution comes from, you, as an experienced teacher, still failed to observe that at first glance. That's exactly why I would rather insist on my style than reduce it to one link. But your conclusion is correct this time anyway. – chengxiz Aug 20 '21 at 11:32
  • We made no such agreement. One thing I, as a teacher and expositor, look for in good answers is *clarity.* Excessive and unnecessary detail usually detract from that. – whuber Aug 20 '21 at 11:34
  • Yes we did. You admitted your first comment was wrong, which I pointed out. That's called an agreement. Whether the first equation is mathematically WRONG is objectively discussable. That's the reason why I wrote my answer. As for your personal taste in clarity, why should I care? – chengxiz Aug 20 '21 at 15:13
  • When somebody tells you explicitly they disagree with your representation of their own words, then insisting that they are wrong is neither a constructive way to continue the conversation nor is it in good faith. My statements are here for the record and I see no need to continue to elaborate on them further. – whuber Aug 20 '21 at 15:53
  • Great illustration of avoiding the important and dwelling on the trivial. – chengxiz Aug 20 '21 at 17:08
  • I agree with @whuber that the first equation is taken from the definition of expectation. To get the expectation of $\ln \mu_j$, all other $\mu_k$s in $\text{Dir}(\boldsymbol{\mu}|\boldsymbol{\alpha})$ are treated as constants. – zyxue Aug 21 '21 at 19:55
  • I have explicitly clarified that, by the definition of expectation, the probability density and the variable of integration should be consistent. The Dirichlet distribution is a multivariate distribution by definition. – chengxiz Aug 22 '21 at 03:13