I believe Ariel Caticha has given some interesting insights on the interpretation of Maximum Entropy and its relation to Bayesian Inference.
As he himself says, a good pedagogical review is his (unfinished) book, but one can also check his papers on arXiv.
I'll summarize some of the main ideas here in the hope that it helps answer the question (I'm not sure it does, though; if the moderators think it misses the point, I can delete it).
Cox, Jaynes, and many others have proved that probability theory is the fundamental framework for dealing with situations of incomplete information: if one accepts the proposed desiderata, there is no choice but to use (conditional) probabilities.
But even Jaynes used to say, as you yourself have noted, that updating probabilities through Bayes' rule and assigning probabilities using MaxEnt are entirely different things.
What Ariel did, building on the work of several other people (notably Skilling, Shore & Johnson; I'm probably missing others), was to prove that:
Maximum Entropy is a tool for updating probability distributions whenever new information/data arrives that constrains the inference we've been doing (a rough sketch follows after this list);
Maximum Entropy, like probability theory itself, also follows from a set of desiderata, so one cannot use any other tool to update probabilities if one accepts the impositions made at the beginning.
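To give a rough idea of what "updating" means here (this is my own sketch, in my own notation, not a quotation of Caticha): given a prior $p(x)$ and new information in the form of an expectation constraint $\langle f(x)\rangle = F$, one selects the posterior $q(x)$ that maximizes the relative entropy subject to that constraint and to normalization,
$$ S[q\,|\,p] = -\int dx\; q(x)\,\log\frac{q(x)}{p(x)}, \qquad \text{subject to} \quad \int dx\; q(x)\,f(x) = F, $$
which yields the familiar exponential-family update
$$ q(x) = \frac{p(x)\,e^{\lambda f(x)}}{Z(\lambda)}, \qquad Z(\lambda) = \int dx\; p(x)\,e^{\lambda f(x)}, $$
with the multiplier $\lambda$ fixed by the constraint.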
From this we can draw two corollaries, which he also proves:
The process of assigning probabilities that Jaynes mentioned corresponds simply to the choice of a uniform prior;
Maximum Entropy reproduces Bayes' rule (and therefore Bayesian inference, one could say) in the particular case where the new information comes in the form of data (a sketch follows below).
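For the second corollary, the argument (as I understand it from Caticha and Giffin, again in my own notation) goes roughly like this: work in the joint space of data $x$ and parameters $\theta$ with joint prior $p(x,\theta) = p(\theta)\,p(x|\theta)$, and let the new information be the constraint that the observed data are $x'$, i.e. $q(x) = \delta(x - x')$. Maximizing the relative entropy $S[q\,|\,p]$ under that constraint gives
$$ q(x,\theta) = \delta(x - x')\,p(\theta|x'), \qquad \text{so} \qquad q(\theta) = \int dx\; q(x,\theta) = p(\theta|x'), $$
which is exactly Bayes' rule.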
I guess this covers the MaxEnt $\leftrightarrow$ Bayesian link.
I can't say as much about the other one, MaxEnt $\leftrightarrow$ Maximum Likelihood, but I believe you have a point that they connect somehow through Bayes' rule:
$$ p(x|\mathrm{data}) \propto p(\mathrm{data}|x) p(x) $$
If one makes a MAP (maximum a posteriori, usually considered a Bayesian method) estimate and takes a uniform prior $p(x)$, what one is actually doing is maximizing the likelihood $p(\mathrm{data}|x)$; but I really don't have the experience to say much more than that.
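To spell that out explicitly (a trivial manipulation, assuming the uniform prior is well enough behaved that the maximization makes sense):
$$ \hat{x}_{\mathrm{MAP}} = \arg\max_x\; p(x|\mathrm{data}) = \arg\max_x\; p(\mathrm{data}|x)\,p(x) = \arg\max_x\; p(\mathrm{data}|x) = \hat{x}_{\mathrm{ML}}, $$
where the third equality uses $p(x) = \text{const}$.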