The other answer here, and the answers on a later version of this question ([Covariance matrix of least squares estimator $\hat{\beta}$]), are not correct.
In the book you are referencing, the data $x_1,\dots,x_N$ (where $x_i^{\top}$ is the $i$th row of $\mathbf{X}$) are not random. The authors say only that the $y_i$ are uncorrelated with constant variance $\sigma^2$. And we have the formula
$$
\hat{\beta} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}.
$$
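As a quick numerical sketch of this formula (the data here are made up, and the dimensions are arbitrary), the closed-form estimator agrees with a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary made-up data; X is fixed, y is the observed response.
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)

# The book's closed-form estimator: (X'X)^{-1} X' y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check against NumPy's generic least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```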
That's really all they say. There is no assumption that the true distribution of $Y$ is a linear function of $X$ plus noise, and there is no explicit assumption that the noise has mean zero, $\mathbb{E}(\varepsilon \mid X) = 0$. So, if you work only with the information you are actually given in the book, you'll do something like this:
First we compute the expectation:
$$
\mathbb{E}(\hat{\beta}) = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbb{E}(\mathbf{y}).
$$
So
\begin{align}
\mathbb{E}(\hat{\beta})\mathbb{E}(\hat{\beta})^{\top} &= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbb{E}(\mathbf{y}) \Bigl((\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbb{E}(\mathbf{y})\Bigr)^{\top} \\
&= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbb{E}(\mathbf{y}) \mathbb{E}(\mathbf{y})^{\top} \mathbf{X}\bigl((\mathbf{X}^{\top}\mathbf{X})^{-1}\bigr)^{\top} \\
&= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbb{E}(\mathbf{y}) \mathbb{E}(\mathbf{y})^{\top} \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1},
\end{align}
where the last step uses the fact that $\mathbf{X}^{\top}\mathbf{X}$ is symmetric, so its inverse is symmetric as well.
And
\begin{align}
\mathbb{E}(\hat{\beta}\hat{\beta}^{\top}) &= \mathbb{E}\biggl((\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}\Bigl( (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}\Bigr)^{\top} \biggr)\\
&= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbb{E}(\mathbf{y} \mathbf{y}^{\top}) \mathbf{X} (\mathbf{X}^{\top}\mathbf{X})^{-1},
\end{align}
where the expectation passes through because $\mathbf{X}$ is non-random.
The variance-covariance matrix is, as usual, the difference $\mathbb{E}(\hat{\beta}\hat{\beta}^{\top}) - \mathbb{E}(\hat{\beta})\mathbb{E}(\hat{\beta})^{\top}$, which comes out as
\begin{align}
\operatorname{Var}(\hat{\beta}) &= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\bigl(\mathbb{E}(\mathbf{y} \mathbf{y}^{\top}) - \mathbb{E}(\mathbf{y}) \mathbb{E}(\mathbf{y})^{\top} \bigr) \mathbf{X} (\mathbf{X}^{\top}\mathbf{X})^{-1} \\
&= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\bigl(\sigma^2 I_{N\times N} \bigr) \mathbf{X} (\mathbf{X}^{\top}\mathbf{X})^{-1} \\
&= \sigma^2 (\mathbf{X}^{\top}\mathbf{X})^{-1}.
\end{align}
So the only assumption we were given is used explicitly at the very end: the variance-covariance matrix of $\mathbf{y}$ is just $\sigma^2$ multiplied by the identity matrix, i.e. the $y_i$ are uncorrelated with constant variance $\sigma^2$.
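To see that no linearity of $\mathbb{E}(\mathbf{y})$ in $\mathbf{X}$ is needed, here is a small Monte Carlo sketch (the design matrix, the non-linear mean, and $\sigma$ are all made-up numbers of my own): the sample covariance of many draws of $\hat{\beta}$ matches $\sigma^2 (\mathbf{X}^{\top}\mathbf{X})^{-1}$ even though the mean of $\mathbf{y}$ is deliberately non-linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (non-random) design matrix, as in the book's setup.
N, p = 50, 3
X = rng.standard_normal((N, p))
sigma = 2.0

# E(y) need not be linear in X; only uncorrelated y_i with
# constant variance sigma^2 matter for the covariance formula.
mu = np.sin(X @ np.array([1.0, -2.0, 0.5]))

XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T          # beta_hat = H @ y

# Monte Carlo: many draws of y, each with Cov(y) = sigma^2 * I.
reps = 100_000
Y = mu + sigma * rng.standard_normal((reps, N))
betas = Y @ H.T            # each row is one draw of beta_hat

empirical = np.cov(betas, rowvar=False)
theoretical = sigma**2 * XtX_inv
print(np.max(np.abs(empirical - theoretical)))  # close to zero
```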