@gunes answered your question (+1), but it might be worth adding why you so often see people maximizing the likelihood $P(D|\theta)$ rather than the posterior $P(\theta|D)$. The likelihood is a probability distribution that describes your data, parametrized by some parameter $\theta$. You can try different values of the parameter and find the distribution that "fits best" to the data
$$
\hat\theta_\text{MLE} = \underset{\theta}{\operatorname{arg\,max}} \; P(D|\theta)
$$
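As a minimal sketch of this idea, here is a grid search for the MLE in a coin-flip (Bernoulli) model; the data `flips` and the grid of candidate $\theta$ values are made up purely for illustration.

```python
# Maximum likelihood by brute force for a Bernoulli model (illustrative only).
import numpy as np
from scipy import stats

flips = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])   # observed data D (1 = heads)

thetas = np.linspace(0.01, 0.99, 99)                # candidate values of theta
log_lik = [stats.bernoulli.logpmf(flips, p=t).sum() for t in thetas]

theta_mle = thetas[np.argmax(log_lik)]              # arg max of P(D|theta)
print(theta_mle)  # close to the sample mean, 7/10 = 0.7
```

(The log-likelihood is used only for numerical convenience; its arg max is the same as that of the likelihood.)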
You cannot do the same for $P(\theta|D)$, because you didn't observe any $\theta$, so you cannot really tell that some value of $\theta$ has greater probability than another. The data $D$ is fixed, so you cannot check "what would happen if the data were different", as you do when maximizing the likelihood. Moreover, what would the distribution $P$ be here? How would you choose the distribution of your parameter? How would you know that it fits $\theta$, given that you never observed any $\theta$? There isn't much that can be done here to estimate this distribution directly.
However, Thomas Bayes found one simple trick, Bayes' theorem, which shows how, given the likelihood and a prior $P(\theta)$, we can "flip" the sides of the conditional probability and obtain the posterior
$$
P(\theta|D) = \frac{P(D|\theta)\,P(\theta)}{P(D)} \propto P(D|\theta)\,P(\theta)
$$
which you can then maximize
$$
\hat\theta_\text{MAP} = \underset{\theta}{\operatorname{arg\,max}} \; P(D|\theta)\,P(\theta)
$$
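Continuing the coin-flip sketch above, MAP only changes the objective by adding the log of the prior; the Beta(2, 2) prior below is an assumption made here purely for illustration.

```python
# MAP estimate for the same coin-flip data, with an assumed Beta(2, 2) prior.
import numpy as np
from scipy import stats

flips = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
thetas = np.linspace(0.01, 0.99, 99)

log_lik   = np.array([stats.bernoulli.logpmf(flips, p=t).sum() for t in thetas])
log_prior = stats.beta.logpdf(thetas, a=2, b=2)     # log P(theta)

theta_map = thetas[np.argmax(log_lik + log_prior)]  # arg max of P(D|theta) P(theta)
print(theta_map)  # pulled slightly toward 0.5 compared with the MLE of 0.7
```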
There is only one catch: you don't know the prior $P(\theta)$ either! The solution is to assume some prior distribution, the one that is most reasonable given our best knowledge (or just a guess), and hope that the information in the data will overwhelm the prior. On the other hand, in cases where we do have reasonable prior information, the prior can compensate for not having enough data. For more details check other questions tagged as bayesian.
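To see the "data overwhelms the prior" claim numerically, you can compare the MLE and MAP estimates for small and large samples; the numbers below are simulated and purely illustrative.

```python
# The same assumed Beta(2, 2) prior matters less and less as the sample grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
thetas = np.linspace(0.01, 0.99, 99)
log_prior = stats.beta.logpdf(thetas, a=2, b=2)

for n in (10, 10_000):
    flips = rng.binomial(1, 0.7, size=n)            # simulated coin, true theta = 0.7
    log_lik = np.array([stats.bernoulli.logpmf(flips, p=t).sum() for t in thetas])
    print(n, thetas[np.argmax(log_lik)], thetas[np.argmax(log_lik + log_prior)])
# With n = 10 the MLE and MAP can differ visibly; with n = 10_000 they agree
# (both near 0.7), because the likelihood dominates the fixed prior.
```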