Consider the problem of the choice of estimator of $\sigma^2$ based on a random sample of size $n$ from a $N(\mu,\sigma^2)$ distribution.
As undergraduates, we were always taught to use the sample variance
$$\hat{s}^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}$$
instead of the maximum likelihood estimator
$$\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}.$$
This is because we learned that $\hat{s}^2$ is an unbiased estimator and that $\hat{\sigma}^2$ is biased.
However, now that I'm studying for a PhD, I've read that estimators are often chosen to minimize mean squared error ($\mathrm{MSE} = \text{bias}^2 + \text{variance}$).
It can be shown that $$\mathrm{MSE}(\hat{\sigma}^2) < \mathrm{MSE}(\hat{s}^2).$$
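To make this concrete, here is a quick Monte Carlo sketch (my own addition, assuming $n = 10$ and $\sigma^2 = 1$) comparing the simulated MSEs against the known closed forms $\mathrm{MSE}(\hat{s}^2) = 2\sigma^4/(n-1)$ and $\mathrm{MSE}(\hat{\sigma}^2) = (2n-1)\sigma^4/n^2$:

```python
import numpy as np

# Illustrative settings (not from the question itself): n = 10,
# true sigma^2 = 1, and 200,000 simulated samples.
rng = np.random.default_rng(0)
n, sigma2, reps = 10, 1.0, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

s2_hat = ss / (n - 1)   # unbiased sample variance
mle_hat = ss / n        # maximum likelihood estimator

mse_s2 = np.mean((s2_hat - sigma2) ** 2)
mse_mle = np.mean((mle_hat - sigma2) ** 2)

# Closed-form values for comparison:
#   MSE(s^2)       = 2 sigma^4 / (n - 1)        = 0.222... for n = 10
#   MSE(sigma^2)   = (2n - 1) sigma^4 / n^2     = 0.19     for n = 10
print(mse_s2, 2 * sigma2**2 / (n - 1))
print(mse_mle, (2 * n - 1) * sigma2**2 / n**2)
assert mse_mle < mse_s2  # the MLE wins on MSE
```

The inequality holds for every $n \geq 2$, since $(2n-1)(n-1) = 2n^2 - 3n + 1 < 2n^2$.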
So, why do most people use $\hat{s}^2$?