Let $\theta$ be a parameter and $\hat{\theta}$ be an estimator for $\theta$.
I understand that the MSE of $\hat{\theta}$ can be decomposed into its variance and its squared bias; that part makes sense. What I don't see is where the trade-off comes in. Calling it a trade-off would suggest that all estimators have the same MSE, so that you can only get less variance by incurring more bias, and vice versa.
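For reference, the decomposition I have in mind is the standard one:
$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big] = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^2, \qquad \operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta.$$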
Can you please explain the mathematics behind it?
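To make my confusion concrete, here is a small Monte Carlo sketch I put together (the normal model, the sample size, and the shrinkage factor $c = 0.8$ are arbitrary choices of mine, purely for illustration). It compares the sample mean $\bar{X}$ with the shrunk estimator $c\bar{X}$: the latter is biased but has lower variance, and here it even ends up with a *smaller* MSE, which is exactly why I don't see how a trade-off pins anything down.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 2.0      # true mean (my arbitrary choice)
sigma = 3.0      # standard deviation of the data
n = 10           # sample size
reps = 100_000   # Monte Carlo replications
c = 0.8          # shrinkage factor (arbitrary, for illustration)

# Draw `reps` independent samples of size n from N(theta, sigma^2)
samples = rng.normal(theta, sigma, size=(reps, n))
xbar = samples.mean(axis=1)   # unbiased estimator of theta
shrunk = c * xbar             # biased estimator with smaller variance

for name, est in [("xbar", xbar), ("c*xbar", shrunk)]:
    bias = est.mean() - theta             # Monte Carlo estimate of the bias
    var = est.var()                       # Monte Carlo estimate of the variance
    mse = ((est - theta) ** 2).mean()     # Monte Carlo estimate of the MSE
    print(f"{name:7s} bias={bias:+.4f} var={var:.4f} mse={mse:.4f}")
```

With these numbers, theory gives $\operatorname{Var}(\bar{X}) = \sigma^2/n = 0.9$ and zero bias, so $\mathrm{MSE}(\bar{X}) = 0.9$, while $c\bar{X}$ has $\operatorname{Var} = c^2 \cdot 0.9 = 0.576$ and $\operatorname{Bias}^2 = ((c-1)\theta)^2 = 0.16$, giving $\mathrm{MSE} \approx 0.736 < 0.9$. So the two estimators clearly do not have the same MSE.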