I would like to wax philosophical on @kjetil's answer, and specifically on this statement:
That approximation will forget about the uncertainty in the estimation of θ, and might be a good approximation in some cases and bad in others. That must be evaluated on a case-by-case basis.
We can use the MLE to good effect for two reasons:
- The real world is sane.
- We know that "extraordinary claims require extraordinary evidence" (Carl Sagan).
The world is sane
What I mean by the first point is that if you make up an arbitrary problem, drawn from "all possible problems" in some sense, then the MLE is likely to be a terrible estimate. However, if you choose a real problem out of the set of problems that one might legitimately encounter in the real world, then the MLE works reasonably well because the prior $P(\theta)$ is not unreasonable.
To illustrate, consider that we would like to estimate $\theta$, the probability of heads of some coin-of-unknown-fairness. Now, in order to even compute the Bayesian version, before we can start computing probabilities with respect to a dataset $X$ we first need to contemplate the world of possible coins. This world of coins in which we found our coin is essentially $P(\theta)$, our prior probability.
Ordinarily, this world is easy to contemplate, because we would have a real world coin that we need to estimate, and we live in the real world. However, in a non-real world, who knows what manner of strange and magical coins there be? In a particular weird and magical world, we might have the following prior:
$$P(\theta) = \begin{cases}
0 & \theta \in A \\
1/m(I - A) & \theta \in I - A \\
0 & \text{otherwise}
\end{cases}$$
where $m$ is the Lebesgue measure, $I$ is the unit interval, and $A$ is a set constructed with a clever method due to Rudin.
We get some very strange behavior from this situation. Notably, there is an $m(A)$ chance that our MLE of $\theta$ is impossible, in the sense that it lands in $A$, where the prior density is zero. If we construct $A$ so that $m(A)$ is very close to 1, then the MLE of $\theta$ is almost certainly going to be bad in exactly this sense.
However, we don't live in this weird world. We live in the real world. Generally, when we pick up a coin, a prior for heads that is heavily weighted near $50\%$ is not unreasonable. At the very least, a continuous prior is almost certainly a good assumption. There is no mathematical necessity that our prior be continuous everywhere or anywhere, but we live in the real world, and the real world is a very special world out of the set of all mathematically feasible worlds. If $\theta_1$ is close to $\theta_2$ in the real world, then we anticipate that $\theta_1$ is nearly as likely as $\theta_2$ to be the correct proportion of heads. The fact that our world is a sane world is very convenient for scientists, who depend on it in order to estimate, e.g., the probability that some coin will turn up heads. In short, priors in our world tend to be well-behaved, and this constraint, along with the constraint discussed in the next section, means that the MLE is generally a likely value under our posterior distribution.
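To make the coin case concrete, here is a worked sketch under one assumed family of well-behaved priors. The Beta prior and its parameters $a, b > 1$ are my assumption for illustration, as are the symbols $n$ (number of flips) and $k$ (number of heads); nothing above commits to this particular prior.

$$P(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}, \qquad P(X \mid \theta) \propto \theta^{k}(1-\theta)^{n-k}$$

$$\hat\theta_{MLE} = \frac{k}{n}, \qquad \hat\theta_{MAP} = \frac{k + a - 1}{n + a + b - 2}$$

$$\hat\theta_{MAP} - \hat\theta_{MLE} = \frac{a + b - 2}{n + a + b - 2}\left(\frac{a - 1}{a + b - 2} - \frac{k}{n}\right)$$

The term in parentheses is the distance from the MLE to the mode of the prior, and the factor in front shrinks toward zero as $n$ grows. So under a smooth prior of this kind, the two estimates disagree only when the data are few and the MLE sits far from where the prior puts its mass.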
Extraordinary claims require extraordinary evidence
To illustrate this, consider Fisher's tea tasting lady. The tea tasting lady claims that she can tell whether the tea or the milk was poured into the cup first. To test this, we design an experiment in which we randomize the order in which tea and milk are added to some cups of tea. As the MLE of her relative skill at tea tasting, we take the percent difference between the fraction of cups she identifies correctly and 0.5 (random guessing). If we pour 5 cups of tea for her to taste, then we are guaranteed to measure at least a 20% tea tasting skill, and it is not unlikely that we measure a 60% or 100% tea tasting skill.
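The 5-cup arithmetic can be checked with a minimal sketch. It assumes that each cup is an independent trial and that a lady with no skill guesses each cup correctly with probability 0.5; the function names `relative_skill` and `skill_distribution` are hypothetical, introduced only for this illustration.

```python
from math import comb


def relative_skill(correct: int, cups: int) -> float:
    """Percent difference between the fraction correct and 0.5 (pure guessing)."""
    return abs(correct / cups - 0.5) / 0.5


def skill_distribution(cups: int) -> dict[float, float]:
    """Probability of each measured skill when every call is a fair coin flip."""
    dist: dict[float, float] = {}
    for correct in range(cups + 1):
        # Round so that e.g. 1 correct and 4 correct map to the same key.
        skill = round(relative_skill(correct, cups), 10)
        dist[skill] = dist.get(skill, 0.0) + comb(cups, correct) / 2 ** cups
    return dist


dist = skill_distribution(5)
for skill, prob in sorted(dist.items()):
    print(f"measured skill {skill:.0%}: probability {prob:.4f}")
```

With 5 cups the only possible measurements are 20%, 60%, and 100%, and under pure guessing the last two together occur with probability 12/32 = 0.375.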
However, we reflect briefly upon this experiment that we have designed, and it is clear that this is a terrible experiment. This is because we a priori judge this lady's claim to be nuts... there's just no way she can tell whether we poured the tea or the milk into the cup first. In other words, our prior is extremely skewed in this situation, so that our MLE is not very good in the sense that it is improbable given our prior.
As good scientists, however, we were not fooled by this, because we know that extraordinary claims require extraordinary evidence. If this lady really, really, for realsies can taste whether or not the tea was poured first, we need her to taste not only 5 cups, but 5000 cups! Of course, as the amount of evidence grows, it overwhelms our skewed prior, and the MLE approaches the Bayesian estimate.
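A small sketch of how the evidence overwhelms a skeptical prior. Everything specific here is an assumption for illustration: the estimate is for the probability $\theta$ that she calls a cup correctly (not the relative skill above), the prior is a hypothetical Beta(50, 50) concentrated near 0.5, and she is supposed to be right on about 90% of cups.

```python
def mle(correct: int, cups: int) -> float:
    """Maximum likelihood estimate of the probability of a correct call."""
    return correct / cups


def map_estimate(correct: int, cups: int, a: float, b: float) -> float:
    """Posterior mode under a Beta(a, b) prior (valid for a, b > 1)."""
    return (correct + a - 1) / (cups + a + b - 2)


# Hypothetical skeptical prior: most of its mass sits close to 0.5.
A = B = 50.0

gaps = []
for cups in (5, 50, 500, 5000):
    correct = (9 * cups) // 10  # hypothetical: right on roughly 90% of cups
    m = mle(correct, cups)
    p = map_estimate(correct, cups, A, B)
    gaps.append(abs(m - p))
    print(f"cups={cups:5d}  MLE={m:.3f}  MAP={p:.3f}  gap={gaps[-1]:.3f}")
```

At 5 cups the MAP estimate barely moves from 0.5 while the MLE jumps to 0.8; by 5000 cups the two agree to within about 0.01.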
To sum up
In conclusion, because our world is sane, and because good scientists realize that extraordinary claims require extraordinary evidence, when we compute a maximum likelihood estimate (and are inclined to take it seriously), it is generally not far from the maximum a posteriori (MAP) estimate. This is because priors for problems that we test are generally very boring: they are not extremely skewed, they are continuous and mostly differentiable, and they don't tend to conflict with reality to any large degree. Thus, our MLE is usually quite likely under our prior. If the value is likely in the prior, and the evidence also supports the value, then it will be very likely in the posterior. Thus, the MLE and the MAP estimates tend not to be so different in real world problems. Of course, there is no guarantee that this is the case, but it is a convenient property of the sane world in which we live.