I am trying to understand the following claim, made in the Deep Learning book by Goodfellow et al. about a toy energy-based model (apparently to motivate Markov chain Monte Carlo methods):
To understand why drawing samples from an energy-based model (EBM) is difficult, consider the EBM over just two variables, defining a distribution $p(a,b)$. In order to sample $a$, we must draw from $p(a|b)$, and in order to sample $b$, we must draw it from $p(b|a)$. It seems to be an intractable chicken-and-egg problem.
This sounds strange to me: I don't understand what prevents us from computing one of the marginals, say $p(a)$, by marginalising over $b$, then sampling $a$ from $p(a)$ and finally sampling $b$ from $p(b|a)$. I don't see why we have to deal with the chicken-and-egg problem that the authors mention. Why is my reasoning wrong?
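
To make the procedure I have in mind concrete, here is a minimal sketch. It assumes a small discrete toy model in which $a$ takes three values and $b$ takes two; the energy table `E` and the helper `sample_pair` are made up for illustration and are not from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical energy table E[a, b] for a in {0, 1, 2} and b in {0, 1}.
# These numbers are invented purely for illustration.
E = np.array([[0.5, 1.0],
              [2.0, 0.1],
              [1.5, 0.7]])

# Unnormalised probabilities exp(-E), normalised into the joint p(a, b).
unnorm = np.exp(-E)
joint = unnorm / unnorm.sum()

# Marginal p(a), obtained by summing the joint over b.
p_a = joint.sum(axis=1)

def sample_pair():
    """Draw a from p(a), then b from p(b | a)."""
    a = rng.choice(len(p_a), p=p_a)
    p_b_given_a = joint[a] / p_a[a]
    b = rng.choice(len(p_b_given_a), p=p_b_given_a)
    return int(a), int(b)

samples = [sample_pair() for _ in range(5)]
print(samples)
```

In this sketch the sum over $b$ has only two terms and the normalising sum has six, so every step is cheap. This is the scheme I would expect to work for the two-variable example in the quote.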