Suppose I have a probabilistic graphical model shown in the picture, in which all variables are binary, $c_1$ and $c_2$ are observed, and I want to use mean-field variational inference to estimate beliefs about the remaining variables. Suppose further - and this is crucial to the problem - that the two $b$'s are constrained to be identical to $a$: if $a$ is true, both $b$'s are true as well, and if $a$ is false, both $b$'s are also false. That is, $p(b_i|a)=[a=b_i]$.
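To make this concrete, here is a small numeric sketch of the model (the prior $p(a)$, the likelihoods $p(c_i|b_i)$, and the observed values are placeholder numbers I made up; only the deterministic link $p(b_i|a)=[a=b_i]$ is actually part of the problem). It computes the exact posterior by enumerating the eight joint states:

```python
import itertools

# Placeholder CPTs -- these numbers are made up for illustration only.
p_a = {0: 0.5, 1: 0.5}                       # prior p(a)
p_c_given_b = {(0, 0): 0.6, (0, 1): 0.3,     # p(c | b), keyed as (c, b)
               (1, 0): 0.4, (1, 1): 0.7}

c1, c2 = 1, 1                                # assumed observations

def joint(a, b1, b2):
    """Unnormalized p(a, b1, b2, c1, c2)."""
    det = 1.0 if (b1 == a and b2 == a) else 0.0   # p(b1, b2 | a) = [a = b1][a = b2]
    return p_a[a] * det * p_c_given_b[(c1, b1)] * p_c_given_b[(c2, b2)]

# Exact posterior p(a, b1, b2 | c1, c2) by brute-force enumeration.
states = list(itertools.product([0, 1], repeat=3))
unnorm = [joint(a, b1, b2) for a, b1, b2 in states]
Z = sum(unnorm)                               # = p(c1, c2)
posterior = {s: u / Z for s, u in zip(states, unnorm)}

for (a, b1, b2), p in posterior.items():
    print(f"p(a={a}, b1={b1}, b2={b2} | c1, c2) = {p:.3f}")
# Only the two all-equal states get nonzero mass, but neither gets all of it.
```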
The full posterior of the unobserved given the observed variables is given by $$ p(a,b_1,b_2|c_1,c_2) = \frac{p(c_1|b_1)\,p(c_2|b_2)\,p(b_1,b_2|a)\,p(a)}{p(c_1,c_2)} $$ We want to approximate this joint posterior with a factorized distribution $q(a,b_1,b_2)=q(a)q(b_1)q(b_2)$, by minimizing the KL-divergence: $$ D_{KL}(q||p)=E_q[\log q(a,b_1,b_2)-\log p(a,b_1,b_2|c_1,c_2)] $$ $$ =\sum_a q(a)\log q(a)+\sum_{b_1}q(b_1)\log q(b_1)+\sum_{b_2}q(b_2)\log q(b_2) -\sum_{b_1}q(b_1)\log p(c_1|b_1)-\sum_{b_2}q(b_2)\log p(c_2|b_2) - \sum_{a,b_1,b_2}q(a)q(b_1)q(b_2)\log p(b_1,b_2|a)- \sum_a q(a)\log p(a) + \log p(c_1,c_2) $$ This includes one term that is very problematic: $\log p(b_1,b_2|a)$. The probability inside the log equals either 1 or 0. It equals 1 when $a=b_1=b_2$ (i.e. when they are all true or all false), and 0 otherwise. The problem is that $\log 0 = -\infty$. Thus, if $q(a)q(b_1)q(b_2)$ assigns any probability mass to assignments for which $a$, $b_1$ and $b_2$ are not all identical, the KL-divergence goes to infinity. And because $q$ factorizes, the only way for it to place all of its mass on the two all-equal assignments is for each factor to be degenerate. Therefore, there are only two "legal" outcomes for variational inference here: we can either have $q(a=1)=q(b_1=1)=q(b_2=1)=1$, or $q(a=0)=q(b_1=0)=q(b_2=0)=1$.
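As a sanity check on this argument, here is a continuation of the sketch above (it reuses the `posterior` dict computed there, with the same placeholder numbers) that evaluates $D_{KL}(q||p)$ for a few factorized $q$'s; only the two fully deterministic ones come out finite:

```python
import math

def kl_mean_field(qa, qb1, qb2):
    """D_KL(q(a)q(b1)q(b2) || p(a,b1,b2|c1,c2)); arguments are P(variable = 1)."""
    kl = 0.0
    for (a, b1, b2), p in posterior.items():
        q = ((qa if a else 1 - qa) *
             (qb1 if b1 else 1 - qb1) *
             (qb2 if b2 else 1 - qb2))
        if q == 0.0:
            continue                    # 0 * log(0 / p) contributes nothing
        if p == 0.0:
            return math.inf             # q puts mass where the posterior has none
        kl += q * math.log(q / p)
    return kl

print(kl_mean_field(1.0, 1.0, 1.0))     # finite: all mass on a = b1 = b2 = 1
print(kl_mean_field(0.0, 0.0, 0.0))     # finite: all mass on a = b1 = b2 = 0
print(kl_mean_field(0.7, 0.7, 0.7))     # inf: a non-degenerate q leaks mass onto zero-probability states
```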
This isn't very attractive, since the evidence provided by $c_1$ and $c_2$ might not be very informative about $b_1$ and $b_2$ (and, consequently, about $a$), so we would really like our inferences to reflect that uncertainty and assign some probability to both options (true or false) for $a$, $b_1$ and $b_2$. Variational inference will instead collapse onto the mode of the posterior and discard all of this uncertainty, which is no better than doing MAP inference.
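For contrast, the exact posterior does preserve this uncertainty: because of the constraint it only puts mass on the two all-equal assignments, and $$ p(a=1,b_1=1,b_2=1|c_1,c_2) = \frac{p(a=1)\,p(c_1|b_1=1)\,p(c_2|b_2=1)}{\sum_{a'\in\{0,1\}} p(a=a')\,p(c_1|b_1=a')\,p(c_2|b_2=a')}, $$ which is in general strictly between 0 and 1 whenever the evidence is not fully decisive.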
My understanding is that this is quite a well-known issue with variational inference, but my question is: is there any solution or workaround? Is there a different way of stating or approaching the problem that allows us to make progress and preserve the uncertainty that we're interested in? Or is the only way around it to use a different approximate inference algorithm (e.g. belief propagation)?