I'm seeking a little help understanding the derivation of the standard errors of the coefficients in linear regression. Most of the derivation is already documented in ocram's answer to "How are the standard errors of coefficients calculated in a regression?", but there is one step I can't follow.
As background, we have the model: $$ \left| \begin{array}{l} \mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon} \\ \mathbf{\epsilon} \sim N(0, \sigma^2 \mathbf{I}), \end{array} \right.$$
And we calculate (a):
$$ \widehat{\textrm{Var}}(\hat{\mathbf{\beta}}) = \hat{\sigma}^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}, $$
using (b):
$$\hat{\mathbf{\beta}} = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime} \mathbf{y},$$
and (c):
$$ {\textrm{Var}}({\mathbf{Ay}}) = \mathbf{A}{\textrm{Var}}({\mathbf{y}}) \mathbf{A}^{\prime}, $$
where (d):
$${\mathbf{A}} = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime}.$$
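To make the setup concrete, here's a minimal numeric sketch of formulas (b) and (a) on made-up data (the design matrix, true coefficients, and noise level are all invented for illustration, and numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up design matrix with an intercept column: n = 50 observations, p = 2 coefficients.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])        # hypothetical true coefficients
sigma = 0.5                              # hypothetical noise standard deviation
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# (b): beta_hat = (X'X)^{-1} X' y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Unbiased estimate of sigma^2 from the residuals (n - p degrees of freedom).
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])

# (a): Var-hat(beta_hat) = sigma2_hat * (X'X)^{-1}, a p x p matrix;
# the square roots of its diagonal are the reported standard errors.
var_beta_hat = sigma2_hat * XtX_inv
se = np.sqrt(np.diag(var_beta_hat))
```

Note that $\widehat{\textrm{Var}}(\hat{\mathbf{\beta}})$ comes out as a $p \times p$ matrix, not a scalar, which is what prompts my question below.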
From (c) & (d), I can tell that ${\textrm{Var}}({\mathbf{y}})$ must be an $n \times n$ matrix.
But **why** is the variance of a vector with dimension $n$ defined as an $n \times n$ matrix?
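For what it's worth, identity (c) can be checked numerically on simulated data (sizes and seed below are arbitrary, numpy assumed); the sample covariance obeys the same identity, which also confirms the shapes have to conform as I described:

```python
import numpy as np

rng = np.random.default_rng(1)

# If A is p x n, then A Var(y) A' is p x p, which only conforms
# when Var(y) is an n x n matrix.
n, p = 5, 2
A = rng.normal(size=(p, n))

# Simulate many independent draws of y ~ N(0, sigma^2 I); each row of Y is one y.
sigma2 = 0.3
draws = 2000
Y = rng.normal(scale=np.sqrt(sigma2), size=(draws, n))
AY = Y @ A.T                               # each row is one A y

cov_Ay = np.cov(AY, rowvar=False)          # sample Var(Ay), p x p
cov_y = np.cov(Y, rowvar=False)            # sample Var(y),  n x n
predicted = A @ cov_y @ A.T                # right-hand side of (c)
# cov_Ay and predicted agree exactly, since sample covariance is
# bilinear in the same way as the population covariance.
```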
I'm getting stuck on the following issues:
(1) What is the definition of ${\textrm{Var}}({\mathbf{y}})$?
(2) Why does it need to be defined that way (and why is it $n \times n$)?
(3) How does one calculate it?
(4) What is the general result of that calculation in this case?
I hope that these are easy questions to answer for someone versed in multivariate statistical theory. Thanks in advance for your help!