This question is in some sense the intersection of this question and this question. I have read up on the Gibbs sampler, and am now asking for an introduction to the Gibbs sampler for mathematicians. I like plausibility arguments and simulations as much as the next guy, but would like to know more about the math behind it.
I have read Casella/George and some other review articles, which all give nice heuristics and plausibility arguments for why the Gibbs sampler should work, but I lack a mathematical argument. I tried to patch one together yesterday, but it appears to demand some knowledge I don't yet possess. In short, I have the intuition, but I lack the mathematics behind the Gibbs sampler.
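For concreteness, the kind of sampler I have in mind is the textbook toy example: a bivariate normal target with correlation rho, updated one coordinate at a time from its full conditional. (This example and its parameter choices are my own illustration, not taken from any of the references above.)

```python
import random

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for a bivariate normal with zero means, unit
    variances, and correlation rho, alternating full-conditional draws."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    # Each full conditional is N(rho * other, 1 - rho^2).
    cond_sd = (1.0 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, cond_sd)  # draw x | y
        y = rng.gauss(rho * x, cond_sd)  # draw y | x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.9, n_iter=10000)
# After discarding a burn-in, the empirical correlation of the
# samples should settle near 0.9.
```

Simulations like this make the convergence look obvious, which is exactly why I want the rigorous counterpart.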
What is a good reference for a proof of the convergence of the Gibbs sampler for a nice class of distributions? I don't care much about the rate of convergence (yet), although if that comes essentially for free, I'll be interested in that as well. :) Casella/George refer to Gelfand/Smith, who in turn refer to Geman/Geman. Geman/Geman write for an image-processing audience and seem to defer some of the proofs to other sources, so it is not an ideal reference. Hopefully someone has since written a review article, book, or lecture notes containing a nice proof.
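To make the request precise, the statement I would like to see proved is along these lines (notation mine): if $K$ denotes the transition kernel obtained by composing the full-conditional updates, then $\pi$ is invariant for $K$, and under suitable irreducibility and aperiodicity conditions,

```latex
\pi K = \pi,
\qquad
\bigl\| K^n\bigl((x, y), \cdot\bigr) - \pi \bigr\|_{\mathrm{TV}}
  \longrightarrow 0
\quad \text{as } n \to \infty,
```

i.e. convergence of the chain's $n$-step distribution to the target in total variation, for ($\pi$-almost) every starting point.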
I know a "fair amount" of mathematics, so Lebesgue integration, topology, and the like don't scare me per se, although of course, all else being equal, I prefer simple proofs.