I read that you are confused about how to arrive at $(3)$ from $(2)$. It is convenient to think in terms of vectors: the Kriging weights $w_i$ can be collected into a vector $\mathbf{w}$, and the Kriging estimate can be written as
$$
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\E}{\operatorname{E}}
\hat{Z}(x_0)=\mathbf{w}^T\mathbf{Z}
$$
where $\mathbf{Z}$ is the vector of observed values. The Kriging weights are defined as $\mathbf{w}^T=\mathbf{\Sigma_0}^T\mathbf{\Sigma}^{-1}$, with $\mathbf{\Sigma}=\Cov[\mathbf{Z},\mathbf{Z}]$ and
$\mathbf{\Sigma_0}=\Cov[\mathbf{Z},Z(x_0)]$. Then, assuming a covariance model $$\Cov[Z(x_1),Z(x_2)]=C(\|x_1-x_2\|)=C(r),$$
we can derive the Kriging uncertainty as follows (the second equality uses that the estimation error $Z(x_0)-\hat{Z}(x_0)$ has zero mean, so its mean square equals its variance):
\begin{align*}
\E[(Z(x_0)-\hat{Z}(x_0))^2]&=\E[(Z(x_0)-\mathbf{w}^T\mathbf{Z})^2]\\
&=\Cov[Z(x_0)-\mathbf{w}^T\mathbf{Z},Z(x_0)-\mathbf{w}^T\mathbf{Z}]\\
&=\Cov[Z(x_0),Z(x_0)]-2\Cov[\mathbf{w}^T\mathbf{Z},Z(x_0)]+\Cov[\mathbf{w}^T\mathbf{Z},\mathbf{w}^T\mathbf{Z}]\\
&=C(0)-2\mathbf{w}^T\Cov[\mathbf{Z},Z(x_0)]+\mathbf{w}^T\Cov[\mathbf{Z},\mathbf{Z}]\mathbf{w}\\
&=C(0)-2\mathbf{w}^T\mathbf{\Sigma_0}+\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}
\end{align*}
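As a sanity check on the last line, here is a minimal numerical sketch. Everything specific in it is an assumption made for illustration: the exponential covariance model, the 1-D locations, and the names `cov_model`, `K`, `Sigma0`, and so on. It compares the final expression with the variance of the error computed directly from the joint covariance matrix; no random sampling is involved.

```python
import numpy as np

def cov_model(r, sill=1.0, length=2.0):
    """Hypothetical exponential covariance C(r); any valid model would do."""
    return sill * np.exp(-r / length)

# Hypothetical 1-D locations: x[0] plays the role of x_0, the rest are observations.
x = np.array([0.5, 0.0, 1.0, 2.5, 4.0])
r = np.abs(x[:, None] - x[None, :])  # pairwise distances
K = cov_model(r)                     # joint covariance of (Z(x_0), Z)

C0 = K[0, 0]        # C(0) = Cov[Z(x_0), Z(x_0)]
Sigma0 = K[1:, 0]   # Cov[Z, Z(x_0)]
Sigma = K[1:, 1:]   # Cov[Z, Z]

# Kriging weights w = Sigma^{-1} Sigma_0 (solve rather than form the inverse).
w = np.linalg.solve(Sigma, Sigma0)

# Last line of the derivation.
mse_formula = C0 - 2 * w @ Sigma0 + w @ Sigma @ w

# Direct route: the error is a^T (Z(x_0), Z) with a = (1, -w),
# so its variance is a^T K a.
a = np.concatenate(([1.0], -w))
mse_direct = a @ K @ a

# With these particular weights the expression collapses to C(0) - Sigma_0^T w.
mse_simplified = C0 - Sigma0 @ w

print(mse_formula, mse_direct, mse_simplified)
```

The three printed numbers should agree up to floating-point error. The first two agree for any weight vector, because that equality is only bilinearity of the covariance; the third relies on the specific choice $\mathbf{w}=\mathbf{\Sigma}^{-1}\mathbf{\Sigma_0}$.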
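The trick is pulling the Kriging weights out of the covariances, which uses the bilinearity of covariance: $\operatorname{Cov}[\mathbf{a}^T\mathbf{X},\mathbf{b}^T\mathbf{Y}]=\mathbf{a}^T\operatorname{Cov}[\mathbf{X},\mathbf{Y}]\mathbf{b}$ for constant vectors $\mathbf{a}$ and $\mathbf{b}$. What you are seeing on Wikipedia is the sum notation for the matrix operations presented above.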
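
For concreteness, if the observation locations are written as $x_1,\dots,x_n$ (the index names are my assumption and may differ from Wikipedia's), the two matrix products in the last line expand to
$$
\mathbf{w}^T\mathbf{\Sigma_0}=\sum_{i=1}^{n}w_i\,C(\|x_i-x_0\|),\qquad
\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}=\sum_{i=1}^{n}\sum_{j=1}^{n}w_i w_j\,C(\|x_i-x_j\|),
$$
so the Kriging uncertainty in sum notation reads
$$
C(0)-2\sum_{i=1}^{n}w_i\,C(\|x_i-x_0\|)+\sum_{i=1}^{n}\sum_{j=1}^{n}w_i w_j\,C(\|x_i-x_j\|).
$$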