
In his paper on expectation propagation for the generative aspect model, Minka uses a Taylor series to estimate the topic parameters $p(w\mid a)$ (eq. 31).

I am a little confused by the last equation. He approximates the expectation of a function with a second-order Taylor expansion (eq. 40), where $Var(\lambda)$ is the covariance matrix of $\lambda$:

\begin{equation} \mathbb{E}\left[f(\boldsymbol\lambda)\right] \approx f(\mathbb{E}\left[\boldsymbol\lambda\right]) + \frac{1}{2} Tr\left(f''(\mathbb{E}\left[\boldsymbol\lambda\right]) Var(\boldsymbol\lambda)\right) \end{equation}
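Here is a minimal Python sketch of eq. 40. It assumes the Hessian at the mean and the covariance matrix are passed in directly as nested lists. The names `trace_of_product` and `second_order_mean` are my own and do not come from the paper. For a quadratic $f$ the second-order expansion is exact, which makes it easy to check:

```python
def trace_of_product(a, b):
    """Return Tr(A B) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def second_order_mean(f, hessian, mean, cov):
    """Approximate E[f(lambda)] by f(E[lambda]) + 0.5 * Tr(f''(E[lambda]) Var(lambda))."""
    return f(mean) + 0.5 * trace_of_product(hessian(mean), cov)


# Example: f(x, y) = x * y has the constant Hessian [[0, 1], [1, 0]],
# so the expansion is exact: E[XY] = E[X] E[Y] + Cov(X, Y).
f = lambda v: v[0] * v[1]
hess = lambda v: [[0.0, 1.0], [1.0, 0.0]]
mean = [2.0, 3.0]
cov = [[1.0, 0.5], [0.5, 2.0]]

approx = second_order_mean(f, hess, mean, cov)
print(approx)  # 6.5 = 2 * 3 + Cov(X, Y)
```

All of the correction in this example comes from the off-diagonal entry $Cov(X,Y)=0.5$.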

However, in another post I found the following formula for the multivariate Taylor expansion:

\begin{equation} \mathbb{E}[f(\lambda)] \approx f(\mathbb{E}\lambda) + \frac{1}{2} \sum_{i=1}^n H_f(\mathbb{E}\lambda)_{ii} Var(\lambda_i). \end{equation}

The only difference is that in Minka's approximation the trace is taken over the product of the Hessian and the full covariance matrix, which involves the interaction terms $Cov(\lambda_i,\lambda_j)$. Michał Stolarczyk's formula in the Stats Exchange post instead pairs only the diagonal of the Hessian with the diagonal of the covariance matrix, i.e. there are no interaction terms.
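To see exactly which terms the diagonal-only formula drops, you can split $Tr(H\Sigma)$ for a symmetric covariance $\Sigma$ into a diagonal part, $\sum_i H_{ii}\Sigma_{ii}$, and an interaction part, $\sum_{i\neq j} H_{ij}\Sigma_{ij}$. The sketch below uses arbitrary 3×3 matrices for illustration. The helper names are my own.

```python
def full_trace_term(h, cov):
    """Tr(H Sigma): uses every entry, including Cov(lambda_i, lambda_j)."""
    n = len(h)
    return sum(h[i][j] * cov[j][i] for i in range(n) for j in range(n))


def diagonal_term(h, cov):
    """sum_i H_ii Var(lambda_i): the diagonal-only version."""
    return sum(h[i][i] * cov[i][i] for i in range(len(h)))


def interaction_term(h, cov):
    """sum_{i != j} H_ij Cov(lambda_i, lambda_j): the part the diagonal version drops."""
    n = len(h)
    return sum(h[i][j] * cov[i][j] for i in range(n) for j in range(n) if i != j)


# Illustrative symmetric Hessian and covariance (arbitrary numbers).
h = [[-2.0, 1.0, 0.0], [1.0, -1.0, 0.5], [0.0, 0.5, -3.0]]
cov = [[0.5, -0.1, 0.0], [-0.1, 0.4, -0.2], [0.0, -0.2, 0.3]]

full = full_trace_term(h, cov)
diag = diagonal_term(h, cov)
inter = interaction_term(h, cov)
print(full, diag, inter)
```

The two formulas agree only when the interaction part is zero.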

Using the interaction terms of the covariance matrix, I recover Minka's expression (eq. 33):

\begin{equation} S_{ia} = \frac{\sum_bp(w\mid b)^2m_{iab}}{(\sum_bp(w\mid b)m_{iab})^2}-1 \end{equation}

However, using Michał's expression leads me to the following:

\begin{equation} S_{ia} = \frac{\sum_bp(w\mid b)^2m_{iab}-\sum_bp(w\mid b)^2m_{iab}^2}{(\sum_bp(w\mid b)m_{iab})^2} \end{equation}
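Here is a numerical check of where both expressions can come from. It rests on assumptions of my own that are not stated above. First, the covariance of the weights is proportional to $\mathrm{diag}(m) - m m^\top$, which is the Dirichlet form with its $1/(s+1)$ scale dropped. Second, $S_{ia}$ is the normalized quadratic form $p^\top \Sigma p / (p^\top m)^2$ with $p_b = p(w\mid b)$. Under these assumptions, the full covariance reproduces eq. 33 and its diagonal alone reproduces the second expression. The values of `p` and `m` are made up.

```python
def quad_form(p, cov):
    """Return p^T Sigma p."""
    n = len(p)
    return sum(p[b] * cov[b][c] * p[c] for b in range(n) for c in range(n))


def dirichlet_like_cov(m):
    """Assumed covariance shape diag(m) - m m^T (Dirichlet, scale 1/(s+1) dropped)."""
    n = len(m)
    return [[(m[b] if b == c else 0.0) - m[b] * m[c] for c in range(n)] for b in range(n)]


p = [0.2, 0.05, 0.6]   # hypothetical p(w | b) for one word w
m = [0.5, 0.3, 0.2]    # hypothetical m_iab, summing to 1
n = len(m)

cov = dirichlet_like_cov(m)
cov_diag = [[cov[b][c] if b == c else 0.0 for c in range(n)] for b in range(n)]
denom = sum(pb * mb for pb, mb in zip(p, m)) ** 2

s_full = quad_form(p, cov) / denom        # full covariance
s_diag = quad_form(p, cov_diag) / denom   # diagonal of the covariance only

# Closed forms from the question.
s_minka = sum(pb**2 * mb for pb, mb in zip(p, m)) / denom - 1
s_michal = (sum(pb**2 * mb for pb, mb in zip(p, m))
            - sum(pb**2 * mb**2 for pb, mb in zip(p, m))) / denom

print(s_full, s_minka)
print(s_diag, s_michal)
```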

Minka's result uses the interaction terms, and the two results differ only through the cross terms in the following identity:

\begin{equation} (\sum_bp(w\mid b)m_{iab})^2=\sum_bp(w\mid b)^2m_{iab}^2 + \sum_{b\neq c}p(w\mid b)\,p(w\mid c)\,m_{iab}\,m_{iac} \end{equation}
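This is a small numerical check of the identity and of what it implies for the two $S_{ia}$ expressions. Subtracting them leaves exactly the normalized cross terms: $S^{\text{Minka}} - S^{\text{diag}} = -\sum_{b\neq c} p(w\mid b)\,p(w\mid c)\,m_{iab}\,m_{iac} \,/\, (\sum_b p(w\mid b)\,m_{iab})^2$. The values are arbitrary.

```python
p = [0.2, 0.05, 0.6]   # arbitrary p(w | b)
m = [0.5, 0.3, 0.2]    # arbitrary m_iab
n = len(p)

linear = sum(p[b] * m[b] for b in range(n))
squares = sum(p[b] ** 2 * m[b] ** 2 for b in range(n))
cross = sum(p[b] * p[c] * m[b] * m[c] for b in range(n) for c in range(n) if b != c)

# Identity: (sum_b p_b m_b)^2 = sum_b p_b^2 m_b^2 + sum_{b != c} p_b p_c m_b m_c
assert abs(linear ** 2 - (squares + cross)) < 1e-12

second = sum(p[b] ** 2 * m[b] for b in range(n))
s_minka = second / linear ** 2 - 1
s_diag = (second - squares) / linear ** 2

# The gap between the two expressions is the negated, normalized cross terms.
gap = s_minka - s_diag
print(gap, -cross / linear ** 2)
```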

Michał's derivation also makes sense to me, though, so I am confused about the multivariate Taylor expansion for the moments of functions of random variables. Which one is correct, and when should I use each?

c.uent

1 Answer


Michał Stolarczyk's answer started by deriving the same formula that I used, but then he simplified the formula to take advantage of the independence of variables in that specific question. When the variables are independent, the off-diagonal terms of the covariance matrix are zero.

You should only use his final formula when the off-diagonal terms of the covariance matrix are zero. Otherwise use his intermediate formula, which is the same as my formula.
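Here is a rough empirical illustration of this advice. I chose the function $f(x,y)=xy$, whose Hessian has only off-diagonal entries, and drew correlated Gaussian samples with Python's `random` module. The helper `sample_mean_xy` is hypothetical. The full-covariance formula should match the Monte Carlo mean, while the diagonal-only formula misses $Cov(X,Y)$.

```python
import random


def sample_mean_xy(mu_x, mu_y, sd_x, sd_y, rho, n=100_000, seed=0):
    """Monte Carlo estimate of E[XY] for a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu_x + sd_x * z1
        # Mixing z1 and z2 this way gives Corr(X, Y) = rho.
        y = mu_y + sd_y * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
        total += x * y
    return total / n


mu_x, mu_y, sd_x, sd_y, rho = 1.0, 2.0, 1.0, 1.5, 0.8
cov_xy = rho * sd_x * sd_y

# f(x, y) = x * y has f'' = [[0, 1], [1, 0]], so Tr(H Sigma) = 2 * Cov(X, Y).
full = mu_x * mu_y + 0.5 * (2 * cov_xy)
# The diagonal of f'' is zero, so the diagonal-only formula adds no correction.
diag_only = mu_x * mu_y
mc = sample_mean_xy(mu_x, mu_y, sd_x, sd_y, rho)
print(full, diag_only, mc)
```

With `rho = 0.0` the variables are independent, `cov_xy` becomes zero, and the two formulas give the same value, as the answer says.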

Tom Minka
wow, thanks so much for your answer :) Now it makes more sense to me... At the time, I was using a Generalized Dirichlet distribution. PS: I am excited to see what you have been working on lately. I saw some of your lectures on automatic differentiation. – c.uent May 29 '20 at 17:57