Given a linear model $\mathbf{y} = \mathbf{X}\beta + \epsilon$, it is well known that the estimate of $\beta$ that minimizes the residual sum of squares (RSS) is $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$.
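(For reference, here is a minimal NumPy sketch of that estimator on simulated data; the variable names and dimensions are my own choices.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix with an intercept column, and a known coefficient vector.
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# beta_hat = (X^T X)^{-1} X^T y, computed via a linear solve for stability.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```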
Of course, since $\hat{\beta}$ is just an estimate, we want to know how far it deviates from the true value $\beta$.
In the derivation I am reading (How are the standard errors of coefficients calculated in a regression?), the variance of the estimate is given by:
$$ V(\hat{\beta}) = V((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\sigma^2 \mathbf{I}\,\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$$
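The final formula does seem to check out numerically. Here is a small simulation sketch I used to convince myself (again with my own variable names), where the empirical covariance of $\hat{\beta}$ over repeated noise draws appears to match $\sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design, many replications of the noise.
n, p, sigma = 50, 2, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

reps = 20000
betas = np.empty((reps, p))
for i in range(reps):
    y = X @ beta_true + rng.normal(scale=sigma, size=n)
    betas[i] = np.linalg.solve(X.T @ X, X.T @ y)

# Empirical covariance of beta_hat vs. the claimed sigma^2 (X^T X)^{-1}.
print(np.cov(betas, rowvar=False))
print(sigma**2 * np.linalg.inv(X.T @ X))
```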
So I trust the result itself; what I would like help with is understanding what happened in this derivation (or seeing a simpler derivation of the same result).
Thanks