While estimation per se is aimed at coming up with values of the unknown parameters (e.g., the coefficients in logistic regression, or in the separating hyperplane in support vector machines), statistical inference attempts to attach a measure of uncertainty and/or a probability statement to the values of the parameters (standard errors and confidence intervals). If the model that the statistician assumes is approximately correct, then, provided that the new incoming data continue to conform to that model, the uncertainty statements may have some truth in them and provide a measure of how often you will be making mistakes when using the model to make your decisions.
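To make the distinction concrete, here is a minimal sketch (Python, assuming `statsmodels` is available; the data are made up for illustration): the fitted coefficients are the estimation part, while the standard errors and confidence intervals attached to them are the inference part.

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: two predictors and a binary outcome from a known logistic model
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_beta = np.array([0.5, -1.0, 2.0])               # intercept, x1, x2
p = 1 / (1 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = rng.binomial(1, p)

# Estimation: point values for the unknown coefficients
res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(res.params)        # estimated coefficients

# Inference: uncertainty attached to those values
print(res.bse)           # standard errors
print(res.conf_int())    # 95% confidence intervals
```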
The sources of the probability statements are twofold. Sometimes, one can assume an underlying probability distribution of whatever you are measuring, and with some mathematical witchcraft (multivariate integration of a Gaussian distribution, etc.) obtain the probability distribution of the result (the sample mean of Gaussian data is itself Gaussian). Conjugate priors in Bayesian statistics fall into that witchcraft category. Other times, one has to rely on asymptotic (large sample) results, which state that in a large enough sample things are bound to behave in a certain way (the Central Limit Theorem: the sample mean of data that are i.i.d. with mean $\mu$ and variance $\sigma^2$ is approximately Gaussian with mean $\mu$ and variance $\sigma^2/n$, regardless of the shape of the distribution of the original data).
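The asymptotic route can be checked empirically. A small simulation sketch (plain NumPy; the exponential distribution and the sample sizes are arbitrary choices of mine) shows the sample means of decidedly non-Gaussian data lining up with the $N(\mu, \sigma^2/n)$ approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0            # mean and sd of an Exponential(1) distribution
n, reps = 200, 100_000          # sample size and number of replications

# Draw many samples from a skewed distribution and record their means
means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

# CLT: the means should be roughly Gaussian with mean mu and sd sigma/sqrt(n)
print(means.mean(), mu)                        # close to 1.0
print(means.std(ddof=1), sigma / np.sqrt(n))   # close to 1/sqrt(200) ~ 0.0707
```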
The closest that machine learning gets to that is cross-validation, when the sample is split into a training part and a validation part, with the latter effectively saying, "if the new data look like the old data, but are entirely unrelated to the data that were used in setting up my model, then a realistic measure of the error rate is such and such". It is derived fully empirically by running the same model on the data, rather than by trying to infer the properties of the model through statistical assumptions and mathematical results like the CLT above. Arguably, this is more honest, but it uses less information and hence requires larger sample sizes. It also implicitly assumes that the process does not change, and that there is no structure in the data (like cluster or time-series correlations) that could creep in and break the very important assumption of independence between the training and the validation data.
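A bare-bones holdout sketch of that logic (scikit-learn, synthetic data, arbitrary split fraction): the validation error rate is a purely empirical estimate, trustworthy only insofar as the held-out rows really are independent of the training rows and come from the same process.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for "the old data"
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Split into training and validation parts; independence between them is assumed, not checked
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Empirical error rate on the validation part: no distributional assumptions, just held-out data
error_rate = np.mean(model.predict(X_val) != y_val)
print(error_rate)
```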
While the phrase "inferring the posterior" may make sense (I am not a Bayesian, so I can't really tell what the accepted terminology is), I don't think there is much in the way of assumptions involved in that inferential step. All of the Bayesian assumptions are (1) in the prior and (2) in the assumed model, and once they are set up, the posterior follows automatically (at least in theory, via Bayes' theorem; the practical steps may be a helluva lot more complicated, and Sipps Gambling... excuse me, Gibbs sampling may be a relatively easy component of getting to that posterior). If "inferring the posterior" refers to (1) + (2), then it is a flavor of statistical inference to me. If (1) and (2) are stated separately, and "inferring the posterior" is something else, then I don't quite see what that something else might be on top of Bayes' theorem.
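For a toy illustration of "the posterior follows automatically": with a conjugate Beta prior (assumption (1)) and a Binomial likelihood (assumption (2)), Bayes' theorem gives the posterior in closed form, and no further modelling choices enter at that step. (A minimal sketch with made-up numbers; with a non-conjugate model that same step would instead need MCMC machinery such as Gibbs sampling.)

```python
from scipy.stats import beta

# (1) Prior: Beta(a, b) on the unknown success probability theta
a, b = 2, 2

# (2) Model: k successes observed in n Bernoulli trials (made-up data)
n, k = 50, 34

# Bayes' theorem with a conjugate prior yields the posterior in closed form: Beta(a + k, b + n - k)
posterior = beta(a + k, b + n - k)
print(posterior.mean())            # posterior mean of theta
print(posterior.interval(0.95))    # central 95% credible interval
```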