I am reading a paper that talks about minimising the KL divergence from an arbitrary distribution to the exponential family. So, given a distribution $p$, we want to compute its best approximation within the exponential family.
So we have the following, where $p(x)$ is the given distribution and $q_{\theta}(x)$ is a distribution in the exponential family:
$$ \begin{split} f(\theta) &= \textrm{KL} (p \| q_{\theta}) = \left\langle \log\left(\frac{p(x)}{q_{\theta}(x)}\right) \right\rangle_{p(x)} \\ &= \langle \log (p(x)) \rangle_{p(x)} + \langle \log(Z(\theta))\rangle_{p(x)} - \langle \theta ^{T} \phi(x) \rangle_{p(x)} \\ &= \langle \log (p(x)) \rangle_{p(x)} + \log(Z(\theta)) - \theta ^{T} \langle \phi(x) \rangle_{p(x)} \end{split} $$
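To convince myself the decomposition is right, I checked it numerically on a toy example (the setup below is my own, not from the paper): a one-parameter exponential family on a finite support with sufficient statistic $\phi(x) = x$, so $q_{\theta}(x) = e^{\theta x}/Z(\theta)$.

```python
import numpy as np

# Toy setup (my own choice, not from the paper): finite support,
# phi(x) = x, q_theta(x) = exp(theta * x) / Z(theta).
xs = np.arange(5)                              # support {0, 1, 2, 3, 4}
p = np.array([0.1, 0.3, 0.2, 0.25, 0.15])      # arbitrary given distribution p(x)

theta = 0.4
Z = np.exp(theta * xs).sum()                   # partition function Z(theta)
q = np.exp(theta * xs) / Z                     # q_theta(x)

# KL(p || q_theta) computed directly from its definition
kl_direct = np.sum(p * np.log(p / q))

# KL via the decomposition: <log p(x)>_p + log Z(theta) - theta * <phi(x)>_p
kl_decomp = np.sum(p * np.log(p)) + np.log(Z) - theta * np.sum(p * xs)

print(np.allclose(kl_direct, kl_decomp))       # the two expressions agree
```

The two values match for any $\theta$ I tried, so the algebra above checks out; my question is about the step itself, not its correctness.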
In the last term of the final line, $\theta^{T}$ is taken out of the expectation operator and treated as a constant, which confuses me. If I understand correctly, $\theta$ comprises the parameters of the approximating distribution, for example the mean and variance if the distribution is Gaussian. So how can these parameters be treated as constants, considering that they also depend on $x$?