Given a multivariate probability density, for example a 2-dimensional random vector with probability density function $\rho\left(x,y\right)$, the conditional probability density function of $x$ given $y=y^{*}$ is obtained by evaluating the joint density at $y=y^{*}$. Fixing one random variable to a given value changes the probability distribution of the remaining random variables. A re-normalization constant is also needed, since fixing $y$ to an observed value is not the same as marginalizing the distribution, which would account for every possible $y$.
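Concretely, for the 2-dimensional example this reads
$$
\rho\left(x \mid y=y^{*}\right) = \frac{\rho\left(x,y^{*}\right)}{\int \rho\left(x',y^{*}\right)\,\mathrm{d}x'},
$$
where the denominator is precisely the re-normalization constant.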
Gaussian processes follow the same procedure. A Gaussian process is a multivariate Gaussian probability distribution; it represents a prior once a kernel is provided but no restrictions from observations have been imposed yet. "Prediction" then amounts to conditioning on previous observations, either fixed to exact values or known only through noisy measurements.
I will provide an approximate derivation to help understand the result in your citation. With the change of notation $A=K\left(A,A\right)$, $C=K\left(\hat{A},A\right)$, $S=K\left(\hat{A},\hat{A}\right)$, your Gaussian process is equivalent to
$$
\mathcal{N}\left(\begin{bmatrix}
\mu_{x}\\
\mu_{y}
\end{bmatrix}, \begin{bmatrix}
A & C\\
C^{T} & S
\end{bmatrix}\right) =
\mathcal{N}\left(\begin{bmatrix}
\mu_{x}\\
\mu_{y}
\end{bmatrix}, \begin{bmatrix}
\tilde{A} & \tilde{C}\\
\tilde{C}^{T} & \tilde{S}
\end{bmatrix}^{-1}\right)
$$
Such a representation of the covariance matrix (as the inverse of a precision matrix with blocks $\tilde{A}$, $\tilde{C}$, $\tilde{S}$) is possible because, for the Gaussian density to be well defined, the covariance matrix must be positive definite and hence invertible. I will assume that you are familiar with the exponential functional dependence of this particular distribution; let us do algebra with its exponent (omitting the $-1/2$ factor), which is where the main point of that result is found, leaving the normalization constant aside for the time being.
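For reference, the exponential functional dependence mentioned above is
$$
p\left(x,y\right) \propto \exp\left(-\frac{1}{2}\begin{bmatrix}
x-\mu_{x}\\
y-\mu_{y}
\end{bmatrix}^{T}\begin{bmatrix}
\tilde{A} & \tilde{C}\\
\tilde{C}^{T} & \tilde{S}
\end{bmatrix}\begin{bmatrix}
x-\mu_{x}\\
y-\mu_{y}
\end{bmatrix}\right)
$$
(writing $\mu - x$ instead of $x - \mu$ inside the quadratic form below makes no difference, since the sign change cancels).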
$$
\left(\mu_{x}^{T}-x^{T},\ \mu_{y}^{T}-y^{T}\right)\begin{pmatrix}
\tilde{A} & \tilde{C}\\
\tilde{C}^{T} & \tilde{S}
\end{pmatrix}\begin{pmatrix}
\mu_{x}-x\\
\mu_{y}-y
\end{pmatrix}
$$
At this point, $y$ is no longer a random variable; it is fixed to its observed value. Let's expand the exponent to see how the distribution is modified,
$$
\left(\mu_{x}^{T}-x^{T}\right) \tilde{A} \left(\mu_{x}-x\right) +
\left(\mu_{x}^{T}-x^{T}\right) \tilde{C} \left(\mu_{y}-y\right) +
\left(\mu_{y}^{T}-y^{T}\right) \tilde{C}^{T} \left(\mu_{x}-x\right) +
\left(\mu_{y}^{T}-y^{T}\right) \tilde{S} \left(\mu_{y}-y\right)
$$
From this expansion, the term involving only $y$-quantities, $\left(\mu_{y}^{T}-y^{T}\right) \tilde{S} \left(\mu_{y}-y\right)$, factors out of the exponential and remains a constant (keeping in mind that $y$ is not a random variable anymore). The remaining terms can be rearranged to fit the form of a new Gaussian exponent. You already know that the conditional distribution is a Gaussian distribution with a different covariance matrix $\Sigma'$:
$$
\left(\mu_{x}^{T}-x^{T}\right) \tilde{A} \left(\mu_{x}-x\right) +
\left(\mu_{x}^{T}-x^{T}\right) \tilde{C} \left(\mu_{y}-y\right) +
\left(\mu_{y}^{T}-y^{T}\right) \tilde{C}^{T} \left(\mu_{x}-x\right) = \\
\left(\mu_{x}^{T}-x^{T}\right) \tilde{A} \left(\mu_{x}-x\right)
-\left(\mu_{x}^{T}-x^{T}\right) \tilde{A}\tilde{A}^{-1}\tilde{C} \left(y-\mu_{y}\right) -\left(y^{T}-\mu_{y}^{T}\right) \tilde{C}^{T}\tilde{A}^{-1}\tilde{A} \left(\mu_{x}-x\right)\\
$$
The next step is re-grouping common factors, i.e. completing the square in $x$. Doing so adds an extra term $\left(y^{T}-\mu_{y}^{T}\right)\tilde{C}^{T}\tilde{A}^{-1}\tilde{C}\left(y-\mu_{y}\right)$ that does not depend on $x$, which is why the relations below hold only up to additive constants (hence the "$\approx$"); that constant is absorbed into the normalization as well. The result is the exponent of a new multivariate Gaussian distribution,
$$
\left(\mu_{x}^{T}-x^{T}\right) \tilde{A} \left(\mu_{x}-x\right) +
\left(\mu_{x}^{T}-x^{T}\right) \tilde{C} \left(\mu_{y}-y\right) +
\left(\mu_{y}^{T}-y^{T}\right) \tilde{C}^{T} \left(\mu_{x}-x\right) \approx \\
\left(\mu_{x}^{T} -\left(y^{T}-\mu_{y}^{T}\right) \tilde{C}^{T}\tilde{A}^{-1}-x^{T}\right) \tilde{A} \left(\mu_{x}-\tilde{A}^{-1}\tilde{C} \left(y-\mu_{y}\right)-x\right) \\
\approx -2\log\left( \mathcal{N}\left(x;\ \mu_{x}-\tilde{A}^{-1}\tilde{C} \left(y-\mu_{y} \right),\ \tilde{A}^{-1}\right) \right)
$$
Again, I want to stress that the normalization factor has to be recalculated. In fact, by definition of conditional probability, $P\left(A|B\right) = \frac{P\left(A,B\right)}{P\left(B\right)}$, where $P\left(B\right)$ is a fixed value once $B$ is observed.
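In our case the denominator is the marginal density of $y$ evaluated at the observed value, which for a joint Gaussian is itself Gaussian,
$$
p\left(x \mid y\right) = \frac{p\left(x,y\right)}{p\left(y\right)}, \qquad p\left(y\right)=\mathcal{N}\left(y;\ \mu_{y},\ S\right),
$$
so all the constant factors dropped along the derivation are exactly what assembles into $p\left(y\right)$.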
The last step is connecting the matrices $A$, $C$, $S$ with $\tilde{A}$ and $\tilde{C}$. For that I will redirect you to equations A.11 - A.13 of the book "Gaussian Processes for Machine Learning" (http://www.gaussianprocess.org/gpml/), which give the inverse of a partitioned matrix as I used at the beginning of my answer; another reference for the result I summarized is given in its Appendix A.2. If you are truly interested in Gaussian processes, that book is highly recommended.
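Spelled out, for completeness (this is the standard partitioned-inverse result, not taken from your citation), those identities give
$$
\tilde{A} = \left(A - C S^{-1} C^{T}\right)^{-1}, \qquad \tilde{C} = -\tilde{A}\, C S^{-1},
$$
so the conditional obtained above becomes the familiar form
$$
\mathcal{N}\left(\mu_{x} + C S^{-1}\left(y-\mu_{y}\right),\ A - C S^{-1} C^{T}\right).
$$
Finally, here is a minimal numerical sanity check of the whole argument, a sketch that assumes nothing beyond numpy and an arbitrary random positive-definite joint covariance (it is not tied to any particular kernel): it conditions a joint Gaussian once through the precision blocks $\tilde{A}$, $\tilde{C}$ derived above and once through the covariance blocks $A$, $C$, $S$, and checks that both routes agree.
```python
import numpy as np

rng = np.random.default_rng(0)

# Small arbitrary example: x has dimension 2, y has dimension 3.
nx, ny = 2, 3
n = nx + ny
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)      # random symmetric positive-definite joint covariance
mu = rng.standard_normal(n)
mu_x, mu_y = mu[:nx], mu[nx:]

# Covariance blocks [[A, C], [C^T, S]] and precision blocks [[A~, C~], [C~^T, S~]]
A, C, S = Sigma[:nx, :nx], Sigma[:nx, nx:], Sigma[nx:, nx:]
Prec = np.linalg.inv(Sigma)
A_t, C_t = Prec[:nx, :nx], Prec[:nx, nx:]

y = rng.standard_normal(ny)          # observed value we condition on

# Conditional mean/covariance via the precision-block form derived in this answer
mean_prec = mu_x - np.linalg.inv(A_t) @ C_t @ (y - mu_y)
cov_prec = np.linalg.inv(A_t)

# Conditional mean/covariance via the covariance-block form (GPML, A.11 - A.13)
mean_cov = mu_x + C @ np.linalg.solve(S, y - mu_y)
cov_cov = A - C @ np.linalg.solve(S, C.T)

assert np.allclose(mean_prec, mean_cov)
assert np.allclose(cov_prec, cov_cov)
```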