For those seeking an answer still in 2019:
Tempered EM has aims orthogonal to those of annealing methods. In annealing methods, the temperature scaling helps the optimization find an optimum faster and more reliably; in tempered EM, the temperature scaling provides a trade-off between overfitting and underfitting the likelihood terms.
We can see the trade-off between under- and overfitting as follows:
The Helmholtz energy consists of two terms:
$$J(q) \;=\; \beta \cdot \underbrace{\mathbb{E}_q\!\left[-\log p(x)\right]}_{\text{expected negative log posterior}} \;-\; \underbrace{H[q]}_{\text{entropy}}$$
1) The expected negative log posterior pushes the approximation $q$ to fit the data as closely as possible.
2) The entropy term favors an approximation $q$ with high entropy, i.e. a broad, highly uncertain distribution with a large typical set.
Tempered EM trades these two terms off against each other. For $\beta$ close to 1, the approximation can follow the likelihood terms very closely, but it may overfit. For $\beta$ close to 0, the approximation will have high entropy and largely ignore the likelihood terms, so it may underfit.
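
To make the trade-off concrete, here is a minimal sketch of a tempered E-step, assuming a Gaussian mixture model in which the latent variable $z$ indexes the mixture component. Minimizing $J(q)$ over $q(z)$ (with the joint $p(x,z)$ in place of $p(x)$) gives $q(z) \propto p(x,z)^\beta$, so the only change to the standard E-step is raising the unnormalized responsibilities to the power $\beta$ before normalizing. The function name `tempered_e_step` and its arguments are hypothetical, not taken from any particular library.

```python
import numpy as np
from scipy.stats import multivariate_normal

def tempered_e_step(X, weights, means, covs, beta):
    """Tempered responsibilities q(z) ∝ p(x, z)^beta for a Gaussian mixture."""
    n, k = X.shape[0], len(weights)
    log_joint = np.empty((n, k))
    for j in range(k):
        # log p(x, z=j) = log pi_j + log N(x | mu_j, Sigma_j)
        log_joint[:, j] = np.log(weights[j]) + multivariate_normal.logpdf(X, means[j], covs[j])
    # Temper in log space: beta * log p(x, z=j), then normalize per data point.
    tempered = beta * log_joint
    tempered -= tempered.max(axis=1, keepdims=True)   # for numerical stability
    resp = np.exp(tempered)
    resp /= resp.sum(axis=1, keepdims=True)
    return resp
```

With $\beta = 1$ this is the usual E-step (the responsibilities follow the likelihood terms closely and can overfit); as $\beta \to 0$ the responsibilities tend toward the uniform, maximum-entropy distribution and the model underfits. Consistent with the point above, $\beta$ here acts as a trade-off parameter rather than an optimization schedule.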