It all comes back to Bayes' Theorem: $P(A \mid B)=\frac{P(B \mid A) \cdot P(A)}{P(B)}$
The first thing we need to make sure we are on the same page about is this: in the Bayesian philosophy, parameters have a distribution. That mental shift alone might answer your question, since the distribution of the parameter is the main pursuit in Bayesian statistics. What is this equation saying? Here $A$ is the parameter and $B$ is the data. Given a supposed distribution for the parameter (the prior $P(A)$) and some data we observe, how does the data change our knowledge of the parameter? If the data are unlikely under the prior, i.e. the likelihood $P(B \mid A)$ is small for the parameter values the prior favors, then the posterior on the left-hand side will have a mean pulled toward the mean of the data rather than the mean of the prior. If the data are perfectly plausible under the prior, the posterior will agree with the prior's mean and $P(A \mid B) \approx P(A)$. This is why people talk about Bayesian statistics as "updating" beliefs. PS: We usually ignore the denominator, treating it as a normalizing constant.
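To make the updating concrete, here is a minimal numeric sketch of Bayes' rule in Python. All the numbers are made up for illustration: a coin whose heads-probability we believe is either 0.5 or 0.8, updated after observing 8 heads in 10 flips.

```python
# Hypothetical example: Bayes' rule over two candidate parameter values.
from math import comb

priors = {0.5: 0.5, 0.8: 0.5}           # P(A): prior over the parameter
heads, n = 8, 10                        # B: the observed data

def likelihood(p):                      # P(B | A): binomial likelihood
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

unnormalized = {p: likelihood(p) * prior for p, prior in priors.items()}
evidence = sum(unnormalized.values())   # P(B), the normalizing constant
posterior = {p: u / evidence for p, u in unnormalized.items()}
print(posterior)  # mass shifts toward p = 0.8, the value the data favor
```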
The magic moment for me on this matter came when I first derived the Bayesian estimate for a mean a few years ago. Suppose your data are normal with known variance $\sigma^2$ and you have a normal prior. Let the subscript $0$ denote a parameter of the prior, so the prior is $N(\mu_0, \tau_0^2)$. I'll skip the derivation, but the mean of the posterior turns out to be:
$\bar{y} \cdot \frac{\frac{n}{\sigma^{2}}}{\frac{n}{\sigma^{2}}+\frac{1}{\tau_{0}^{2}}}+\mu_{0} \cdot \frac{\frac{1}{\tau_{0}^{2}}}{\frac{n}{\sigma^{2}}+\frac{1}{\tau_{0}^{2}}}$
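If it helps to see this formula run, here is a quick Python sketch with made-up numbers; `mu0`, `tau0`, and `sigma` are just illustrative choices, not anything canonical.

```python
# Sketch of the normal-normal posterior mean as a precision-weighted average.
import numpy as np

rng = np.random.default_rng(0)
mu0, tau0 = 0.0, 1.0                   # prior: N(mu0, tau0^2)
sigma = 2.0                            # known data standard deviation
y = rng.normal(3.0, sigma, size=20)    # data actually centered at 3
n, ybar = len(y), y.mean()

w_data = (n / sigma**2) / (n / sigma**2 + 1 / tau0**2)   # weight on ybar
w_prior = 1 - w_data                                     # weight on mu0
posterior_mean = ybar * w_data + mu0 * w_prior
print(f"ybar={ybar:.2f}, prior mean={mu0}, posterior mean={posterior_mean:.2f}")
# The posterior mean lands between mu0 and ybar, set by the two precisions.
```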
What does this look like?...A weighted average of the prior's mean and the estimated mean of the data! If your prior is really strong, your posterior mean will end up looking like your prior's mean. If your data carry a lot of information (a large $n$ or a small $\sigma^2$), the weight shifts to the $\bar{y}$ term and your posterior's mean will look like your data's mean. What would drive the weight one way or the other? If $n$ is huge, then the prior mean's term will shrink away to nothing. And isn't that just what we want? If we have a lot of new evidence, we want it to overshadow our prior.
If the prior's variance $\tau_0^2$ is small, that means we feel we know a lot about the parameter already. Notice that a small $\tau_0^2$ makes $1/\tau_0^2$ large, so the weight on the $\mu_0$ term grows as compared to the data's influence.
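Here is a small sketch of that tug of war, again with illustrative numbers only, showing how the data's weight responds to $n$ and $\tau_0$:

```python
# How the weight on the data term moves as n grows and as the prior tightens.
sigma = 2.0  # known data standard deviation (illustrative)

def data_weight(n, tau0):
    return (n / sigma**2) / (n / sigma**2 + 1 / tau0**2)

for n in (5, 50, 5000):
    print(f"n={n:>5}, tau0=1.0:  data weight = {data_weight(n, 1.0):.3f}")
for tau0 in (2.0, 0.5, 0.05):
    print(f"n=   20, tau0={tau0}: data weight = {data_weight(20, tau0):.3f}")
# Large n -> weight near 1 (data win); tiny tau0 -> weight near 0 (prior wins).
```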
All in all, Bayesian statistics is a tug of war between the data and the assumed prior. How much data we have, and how strong our prior is, determine where the posterior lands. And remember, in regular old Frequentist statistics we spend a lot of time working out the sampling distributions of estimators, so hopefully the emphasis on finding the parameter's posterior distribution does not feel out of place to you.