I am looking at the derivation of variational inference, specifically the approach taken by Bishop in Pattern Recognition and Machine Learning (page 465), as illustrated in the Figure below. The key step is the statement below Equation (10.8), in which he says "... Thus maximizing (10.6) is equivalent to minimising the KL Divergence, ..."
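In case the figure does not render, here is my transcription of the passage in question; I am copying from memory of the book, so please check the original if anything looks off. With the factorized approximation $q(\mathbf{Z}) = \prod_i q_i(\mathbf{Z}_i)$, Bishop decomposes the lower bound $\mathcal{L}(q)$, isolating the single factor $q_j$, as (Equation (10.6)):

$$
\begin{aligned}
\mathcal{L}(q) &= \int \prod_i q_i \Big\{ \ln p(\mathbf{X},\mathbf{Z}) - \sum_i \ln q_i \Big\}\,\mathrm{d}\mathbf{Z} \\
&= \int q_j \Big\{ \int \ln p(\mathbf{X},\mathbf{Z}) \prod_{i \neq j} q_i \,\mathrm{d}\mathbf{Z}_i \Big\}\,\mathrm{d}\mathbf{Z}_j - \int q_j \ln q_j \,\mathrm{d}\mathbf{Z}_j + \text{const} \\
&= \int q_j \ln \tilde{p}(\mathbf{X},\mathbf{Z}_j)\,\mathrm{d}\mathbf{Z}_j - \int q_j \ln q_j \,\mathrm{d}\mathbf{Z}_j + \text{const},
\end{aligned}
$$

where $\tilde{p}(\mathbf{X},\mathbf{Z}_j)$ is defined through Equations (10.7) and (10.8) by $\ln \tilde{p}(\mathbf{X},\mathbf{Z}_j) = \mathbb{E}_{i \neq j}[\ln p(\mathbf{X},\mathbf{Z})] + \text{const}$, the expectation being taken with respect to the factors $q_i$ for $i \neq j$. It is this last line of (10.6) that Bishop identifies as a negative Kullback-Leibler divergence between $q_j(\mathbf{Z}_j)$ and $\tilde{p}(\mathbf{X},\mathbf{Z}_j)$.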
However, in the errata document for the book compiled by Yousuke Takada, the relevant excerpt is shown in the Figure below:
We see the statement "However, there is no point in taking the Kullback-Leibler divergence between two probability distributions over different sets of random variables; such a quantity is undefined."
So my questions are: (1) Is Takada's statement correct, and if so, what is the correct derivation of the variational inference algorithm? (2) If the statement is incorrect, what is the expression for the KL divergence of the form presented in Equation (10.6), and how does that form relate to the derivation of the variational inference algorithm?
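For concreteness, the identification I believe Bishop intends, and which I would like confirmed or corrected, is that for fixed observed data $\mathbf{X}$ one treats $\tilde{p}(\mathbf{X},\mathbf{Z}_j)$ as a density over $\mathbf{Z}_j$ alone, so that the $q_j$-dependent part of (10.6) is a negative KL divergence:

$$
\int q_j \ln \tilde{p}(\mathbf{X},\mathbf{Z}_j)\,\mathrm{d}\mathbf{Z}_j - \int q_j \ln q_j \,\mathrm{d}\mathbf{Z}_j
= -\int q_j(\mathbf{Z}_j) \ln \frac{q_j(\mathbf{Z}_j)}{\tilde{p}(\mathbf{X},\mathbf{Z}_j)}\,\mathrm{d}\mathbf{Z}_j
= -\,\mathrm{KL}\!\left(q_j \,\big\|\, \tilde{p}\right),
$$

which is maximized (the KL term minimized) by $q_j^*(\mathbf{Z}_j) = \tilde{p}(\mathbf{X},\mathbf{Z}_j)$, recovering the update $\ln q_j^*(\mathbf{Z}_j) = \mathbb{E}_{i \neq j}[\ln p(\mathbf{X},\mathbf{Z})] + \text{const}$ of Equation (10.9). Whether this quantity can legitimately be called a KL divergence, in light of Takada's remark about distributions over different sets of random variables, is exactly what I am unsure about.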