In matrix notation we have data $\left (\mathbf y, \mathbf X\right)$ and we consider the model
$$\mathbf y = \mathbf X\beta + \mathbf u$$
where for the moment we only assume that the regressor matrix contains a column of ones, so that we can safely assume that the "error term" $\mathbf u$ has zero mean. We do not as yet make any statistical/probabilistic assumptions.
Calculating the unknown betas by Ordinary Least Squares is a mathematical approximation method that needs no statistical assumptions. We obtain
$$\hat \beta = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf y$$
This is the (orthogonal) Linear Projection coefficient vector, and, as a mathematical approximation story, it stops here.
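To make the formula concrete, here is a minimal numerical sketch in Python/NumPy on made-up data (all names and numbers below are hypothetical, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations, a column of ones plus two regressors
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=n)

# Linear projection coefficients: (X'X)^{-1} X'y
# (solve the normal equations rather than forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```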
Now we want to talk about the "standard error" of the estimates. But that is a statistical concept, so we must assume something random and probabilistic. Assume that the regressors are all deterministic, while $\mathbf u$ is a random vector. Because the regressor matrix contains a column of ones, we then have $E(\mathbf u) = \mathbf 0$, where $E$ denotes the expected value.
We have
$$\hat \beta = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf y = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\left(\mathbf X\beta + \mathbf u\right) = \beta +\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u$$
$$\implies \hat \beta -\beta = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u$$
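Continuing the hypothetical sketch above, where we generate $\mathbf u$ ourselves so that the true $\beta$ is known, this identity can be checked numerically:

```python
# True coefficients and a fresh error draw for the same X as above
beta = np.array([1.0, 2.0, -0.5])
u = rng.normal(size=n)
y = X @ beta + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# beta_hat - beta should equal (X'X)^{-1} X'u exactly (up to rounding)
sampling_error = beta_hat - beta
identity_rhs = np.linalg.solve(X.T @ X, X.T @ u)
print(np.allclose(sampling_error, identity_rhs))  # True
```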
Since $\beta$ is a constant vector, we have $\text{Var}(\hat \beta -\beta) = \text{Var}(\hat \beta)$, the variance being the square of the standard deviation. The multivariate version of the variance, the variance-covariance matrix, is
$$\text{Var}(\hat \beta -\beta) = E\Big[(\hat \beta -\beta)(\hat \beta -\beta)'\Big] - E\Big[(\hat \beta -\beta)\Big]E\Big[(\hat \beta -\beta)\Big]'$$
where the prime denotes the transpose. Substituting,
$$\text{Var}(\hat \beta) = E\Big[\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u\mathbf u'\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} \Big] - E\Big[\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u\Big]E\Big[\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u\Big]'$$
Since we have assumed that $\mathbf X$ is deterministic, the expectation applies only to $\mathbf u$, and the $\mathbf X$ terms can be pulled outside it, so we have
$$\text{Var}(\hat \beta) = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'E(\mathbf u\mathbf u')\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} - \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'E(\mathbf u)E(\mathbf u)'\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} $$
Since $E(\mathbf u) =\mathbf 0$ we are left with
$$\text{Var}(\hat \beta) = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'E(\mathbf u\mathbf u')\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} $$
Now comes another benchmark statistical assumption: that $\mathbf u$ is "homoskedastic" (and its elements are uncorrelated with one another), which means
$$\text{Var}(\mathbf u) = E(\mathbf u\mathbf u') = \sigma^2I$$
where $\sigma^2 >0$ is the common variance of each element of the error vector, and $I$ is the identity matrix. Substituting, we get
$$\text{Var}(\hat \beta) = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\sigma^2I\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} =\sigma^2\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1}$$
$$\text{Var}(\hat \beta) = \sigma^2\left(\mathbf X' \mathbf X\right) ^{-1}$$
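As a sanity check, continuing the same hypothetical sketch, a small Monte Carlo with homoskedastic errors of known $\sigma^2$ should reproduce this matrix approximately:

```python
# Monte Carlo check: the empirical covariance of beta_hat across many error
# draws should approach sigma^2 (X'X)^{-1} when the errors are homoskedastic.
sigma = 1.5
draws = []
for _ in range(20_000):
    u = sigma * rng.normal(size=n)
    y = X @ beta + u
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))
draws = np.array(draws)

empirical_cov = np.cov(draws, rowvar=False)
theoretical_cov = sigma**2 * np.linalg.inv(X.T @ X)
print(np.max(np.abs(empirical_cov - theoretical_cov)))  # small
```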
The $\sigma^2$ is estimated by $s^2$ as in the OP question (typically $s^2 = \hat{\mathbf u}'\hat{\mathbf u}/(n-k)$, where $\hat{\mathbf u} = \mathbf y - \mathbf X\hat\beta$ is the residual vector, $n$ the number of observations and $k$ the number of regressors). The diagonal elements of $\left(\mathbf X' \mathbf X\right) ^{-1}$, each multiplied by $s^2$, give the estimated variances of the corresponding elements of the estimated beta vector; taking the square root of each gives the standard error of that element. The off-diagonal elements of the matrix, multiplied by $s^2$, give the estimated covariances between the elements of the beta vector.
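Putting the pieces together (still continuing the hypothetical sketch, and using the $n-k$ degrees-of-freedom correction for $s^2$), the standard-error computation on a single sample looks like this:

```python
# Single simulated sample: estimate beta, then sigma^2, then standard errors
u = sigma * rng.normal(size=n)
y = X @ beta + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat

n_obs, k = X.shape
s2 = residuals @ residuals / (n_obs - k)       # estimate of sigma^2

cov_beta_hat = s2 * np.linalg.inv(X.T @ X)     # estimated Var(beta_hat)
std_errors = np.sqrt(np.diag(cov_beta_hat))    # standard error of each element
print(std_errors)
```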