First, the method and theory, in brief: the goal is to approximate the target distribution $p(\theta|D)$, where $\theta$ is a parameter vector and $D$ is the observed data, given some prior distribution $p(\theta)$. At each step of the MCMC chain, the sampling algorithm proposes a new parameter vector $\theta_{proposed}$. (Exactly how varies with the flavor of algorithm and the proposal distribution.) Given a proposed $\theta$, it computes the product $p(D|\theta_{proposed})\,p(\theta_{proposed})$, which by Bayes' rule is proportional to the posterior $p(\theta_{proposed}|D)$. It then accepts the proposal with probability $\min\left(\frac{p(D|\theta_{proposed})\,p(\theta_{proposed})}{p(D|\theta_{current})\,p(\theta_{current})},\ 1\right)$ (for a symmetric proposal distribution). If a number of requirements are met, this chain will produce a representative sample from the posterior distribution. (In brief, it requires a proposal process that adequately covers the posterior, proper burn-in, and convergence.)
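To make that concrete, here is a minimal random-walk Metropolis sketch in R. It is not MCMCpack's internal sampler, just an illustration of the propose/accept loop described above; the data-generating setup and names like logPost and proposalSD are my own illustrative choices.

```r
set.seed(1)
D <- rnorm(50, mean = 2, sd = 1)                 # observed data (illustrative)

# Unnormalized log posterior: log p(D | theta) + log p(theta)
logPost <- function(theta) {
  sum(dnorm(D, mean = theta, sd = 1, log = TRUE)) +   # log likelihood
    dnorm(theta, mean = 0, sd = 10, log = TRUE)       # log prior
}

nIter      <- 10000
proposalSD <- 0.5
chain      <- numeric(nIter)
chain[1]   <- 0                                   # arbitrary starting value

for (i in 2:nIter) {
  current  <- chain[i - 1]
  proposed <- rnorm(1, mean = current, sd = proposalSD)  # propose new theta
  # Accept with probability min(posterior ratio, 1), computed on the log scale
  logAcceptProb <- min(logPost(proposed) - logPost(current), 0)
  chain[i] <- if (log(runif(1)) < logAcceptProb) proposed else current
}

posteriorSample <- chain[-(1:1000)]               # discard burn-in
```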
If those requirements are met, you can view the MCMC sample as an approximation to the posterior. Each individual sample is one sampled vector of values for $\theta$; likewise, differencing two sampled parameters across the entire sample produces an approximate distribution of the difference between those two parameters. (I'm not familiar with MCMCpack, but I gather from your code and comment that postDist[,"y1"] and postDist[,"y2"] are vectors of samples from the posterior, so this should work.) This is one benefit of MCMC methods: if the parameters covary, solving for the distribution of their sum or difference analytically requires knowing their joint distribution, whereas with posterior samples you just compute the sum or difference draw by draw.
By the by, I began learning Bayesian methods with Kruschke's Doing Bayesian Data Analysis, and I highly recommend his chapters explaining MCMC algorithms. It's a very approachable, intuitive treatment.