I am reading a paper that talks about minimising the KL divergence from an arbitrary distribution to the exponential family. So, given a distribution $p$, we want to compute its best approximation within the exponential family.
So we have the following, where $p(x)$ is the given distribution and $q_{\theta}(x)$ is a distribution in the exponential family:
$$ \begin{split} f(\theta) &= \textrm{KL} (p \| q_{\theta}) = \left\langle \log\left(\frac{p(x)}{q_{\theta}(x)}\right) \right\rangle_{p(x)} \\ &= \langle \log (p(x)) \rangle_{p(x)} + \langle \log(Z(\theta))\rangle_{p(x)} - \langle \theta ^{T} \phi(x) \rangle_{p(x)} \\ &= \langle \log (p(x)) \rangle_{p(x)} + \log(Z(\theta)) - \theta ^{T} \langle \phi(x) \rangle_{p(x)} \end{split} $$
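To convince myself the decomposition is right, I checked it numerically on a toy example (the setup below is my own, not from the paper): a one-parameter exponential family on a finite support with sufficient statistic $\phi(x) = x$, so $q_{\theta}(x) = e^{\theta x}/Z(\theta)$.

```python
import numpy as np

# Toy setup (my own choice, not from the paper): finite support,
# phi(x) = x, q_theta(x) = exp(theta * x) / Z(theta).
xs = np.arange(5)                              # support {0, 1, 2, 3, 4}
p = np.array([0.1, 0.3, 0.2, 0.25, 0.15])      # arbitrary given distribution p(x)

theta = 0.4
Z = np.exp(theta * xs).sum()                   # partition function Z(theta)
q = np.exp(theta * xs) / Z                     # q_theta(x)

# KL(p || q_theta) computed directly from its definition
kl_direct = np.sum(p * np.log(p / q))

# KL via the decomposition: <log p(x)>_p + log Z(theta) - theta * <phi(x)>_p
kl_decomp = np.sum(p * np.log(p)) + np.log(Z) - theta * np.sum(p * xs)

print(np.allclose(kl_direct, kl_decomp))       # the two expressions agree
```

The two values match for any $\theta$ I tried, so the algebra above checks out; my question is about the step itself, not its correctness.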
In the last term of the final line, $\theta^{T}$ is taken out of the expectation operator and treated as a constant, which confuses me. If I understand correctly, $\theta$ comprises the parameters of the approximating distribution, for example the mean and variance if the distribution is Gaussian. So how can these parameters be treated as constants, considering that they also depend on $x$?