One guess is that $p_{\text{data}}$ means the empirical distribution. It hardly matters, because the expectation of the constant $-\log 2$ is $-\log 2$, the distribution do not matter!
– kjetil b halvorsenSep 20 '17 at 19:30
In the description of the EM algorithm on Wikipedia, people use to do the same stupid things: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm. Here they write $E_{Z|X}[f(Z,X)]$ and they actually mean is the (factorization of the) conditional expectation $E[f(Z,X)|X=x]$ and I assume that they want to express something similar here... maybe it is $E[\text{log}p_{\theta}(X|Z) | X=x]$ or maybe restricted to $Z=z$?
– Fabian WernerSep 21 '17 at 07:41