Elsewhere on this site, explicit solutions to the ordinary least squares regression
$$\mathbb{E}(z_i) = A x_i + B y_i + C$$
are available in matrix form as
$$(C,A,B)^\prime = (X^\prime X)^{-1} X^\prime z\tag{1}$$
where $X$ is the "model matrix"
$$X = \pmatrix{1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n}$$
and $z$ is the response vector
$$z = (z_1, z_2, \ldots, z_n)^\prime.$$
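Formula $(1)$ is easy to evaluate directly. Here is a minimal R sketch, assuming numeric vectors `x`, `y`, and `z` of a common length are already available (hypothetical data; any trivariate sample will do):
#
# Evaluate (1) directly: build the model matrix and solve the normal equations.
# Assumes numeric vectors x, y, z of equal length.
#
X <- cbind(1, x, y)                                  # the model matrix
coef.direct <- solve(crossprod(X), crossprod(X, z))  # (X'X)^{-1} X'z = (C, A, B)'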
That's a perfectly fine, explicit, computable answer. But maybe there is some additional understanding that can be wrung out of it by inspecting the coefficients. This can be achieved by choosing appropriate units in which to express the variables.
The best units for this purpose center each variable at its mean and use its standard deviation as the unit of measurement. Explicitly, let the three means be $m_x, m_y,$ and $m_z$ and the three standard deviations be $s_x, s_y,$ and $s_z$. (It turns out not to matter whether you divide by $n$ or $n-1$ in computing the standard deviations. Just make sure you use a consistent convention when you compute any second moment of the data.) The values of the variables in these new units of measurement are
$$\xi_i = \frac{x_i - m_x}{s_x},\ \eta_i = \frac{y_i - m_y}{s_y},\ \zeta_i = \frac{z_i - m_z}{s_z}.$$
This process is known as standardizing the data. The variables $\xi$, $\eta$, and $\zeta$ are the standardized versions of the original variables $x$, $y$, and $z$.
These relationships are invertible:
$$x_i = s_x \xi_i + m_x,\ y_i = s_y \eta_i + m_y,\ z_i = s_z \zeta_i + m_z.$$
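In R, standardizing can be carried out with the `scale` function, which also records the centers and scales needed to invert the transformation. A minimal sketch for a single variable, assuming a numeric vector `x`:
#
# Standardize x with `scale`, then invert the transformation using the
# attributes it stores.  (`scale` uses the n-1 convention for the standard
# deviation, which is fine as long as it is applied consistently.)
#
xi <- scale(x)                        # (x - m_x) / s_x
x.back <- xi * attr(xi, "scaled:scale") + attr(xi, "scaled:center")
all.equal(as.vector(x.back), x)       # TRUE: the relationship is invertible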
Plugging the expressions for $x_i$, $y_i$, and $z_i$ into the defining relationship
$$\mathbb{E}(z_i) = C + Ax_i + By_i$$
and simplifying yields
$$\mathbb{E}(s_z \zeta_i + m_z) = C + A(s_x \xi_i + m_x) + B(s_y \eta_i + m_y).$$
Solving for the expectation of the dependent variable $\zeta_i$ yields
$$\mathbb{E}(\zeta_i) = \left(\frac{C + Am_x + Bm_y - m_z}{s_z}\right) + \left(\frac{A s_x}{s_z}\right) \xi_i + \left(\frac{B s_y}{s_z}\right) \eta_i.$$
If we write these coefficients as $\beta_0, \beta_1, \beta_2$ respectively, then we can recover $A, B, C$ by comparing and solving. For the record this gives
$$A = \frac{s_z \beta_1}{s_x},\ B = \frac{s_z \beta_2}{s_y},\text{ and }C = s_z \beta_0 + m_z - A m_x - B m_y.\tag{2}$$
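Equation $(2)$ is simple enough to package as a small helper. The following function is purely illustrative (it is not part of any library); `beta` holds $(\beta_0, \beta_1, \beta_2)$ while `m` and `s` hold the means and standard deviations of $(x, y, z)$ in that order:
#
# Convert standardized coefficients (beta0, beta1, beta2) back to (C, A, B)
# via equation (2).  A hypothetical helper for illustration only.
#
unstandardize <- function(beta, m, s) {
  A <- s[3] * beta[2] / s[1]
  B <- s[3] * beta[3] / s[2]
  C <- s[3] * beta[1] + m[3] - A * m[1] - B * m[2]
  c(C = C, A = A, B = B)
}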
The point of standardizing the data becomes apparent when we consider the new model matrix
$$\Xi = \pmatrix{1 & \xi_1 & \eta_1 \\ 1 & \xi_2 & \eta_2 \\ \vdots & \vdots & \vdots \\ 1 & \xi_n & \eta_n}$$
and the new response vector $\zeta = (\zeta_1, \zeta_2, \ldots, \zeta_n)^\prime$, because now
$$\Xi^\prime \Xi = \pmatrix{n & 0 & 0 \\ 0 & n & n\rho \\ 0 & n\rho & n}$$
and
$$\Xi^\prime \zeta = (0, n\tau, n\upsilon)^\prime$$
where $\rho = \frac{1}{n}\sum_{i=1}^n \xi_i \eta_i$ is the correlation coefficient of $x$ and $y$, $\tau = \frac{1}{n}\sum_{i=1}^n \xi_i \zeta_i$ is the correlation coefficient of $x$ and $z$, and $\upsilon = \frac{1}{n}\sum_{i=1}^n \eta_i \zeta_i$ is the correlation coefficient of $y$ and $z$.
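These claims are easy to verify numerically. Here is a sketch, assuming a data frame `xyz` with columns `x`, `y`, `z` (such as the one simulated in the code at the end) and using the $1/n$ convention consistently:
#
# Check the structure of Xi'Xi and Xi'zeta numerically, using the 1/n
# convention for the standard deviations.  Assumes a data frame `xyz`
# with columns x, y, z.
#
n <- nrow(xyz)
s <- sqrt(diag(cov(xyz)) * (n - 1) / n)          # 1/n standard deviations
std <- scale(xyz, center = colMeans(xyz), scale = s)
Xi <- cbind(1, std[, c("x", "y")])
crossprod(Xi) / n                # 1's on the diagonal, rho off-diagonal, zeros elsewhere
crossprod(Xi, std[, "z"]) / n    # (0, tau, upsilon)'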
To solve the normal equations $\Xi^\prime \Xi \beta = \Xi^\prime \zeta$, the standardized counterpart of the system inverted in $(1)$, we may divide both sides by $n$, giving
$$\pmatrix{1 & 0 & 0 \\ 0 & 1 & \rho \\ 0 & \rho & 1}\pmatrix{\beta_0 \\ \beta_1 \\ \beta_2} = \pmatrix{0 \\ \tau \\ \upsilon} .$$
What originally looked like a formidable matrix formula has been reduced to a truly elementary set of three simultaneous equations. Provided $|\rho| \lt 1$, its solution is easily found to be
$$\pmatrix{\hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2} = \frac{1}{1-\rho^2}\pmatrix{0 \\ \tau-\rho\upsilon \\ \upsilon-\rho\tau}.$$
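As a quick sanity check, the reduced system can also be handed to a generic linear solver and compared with the closed form. A sketch with made-up values of $\rho, \tau, \upsilon$:
#
# Solve the reduced normal equations for illustrative (made-up) values of
# rho, tau, upsilon and compare with the closed-form solution.
#
rho <- 0.8; tau <- 0.5; upsilon <- 0.6
lhs <- matrix(c(1, 0, 0,  0, 1, rho,  0, rho, 1), 3, 3)
beta <- solve(lhs, c(0, tau, upsilon))
beta.closed <- c(0, tau - rho * upsilon, upsilon - rho * tau) / (1 - rho^2)
all.equal(beta, beta.closed)    # TRUE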
Plugging these estimates $\hat\beta_0, \hat\beta_1, \hat\beta_2$ into $(2)$ produces the estimates $\hat A, \hat B,$ and $\hat C$.
In fact, even more has been achieved:
It is now evident why the cases $|\rho|=1$ are problematic: the regressors are then perfectly collinear, and the factor $1-\rho^2$ in the solution vanishes.
It is equally evident how to determine whether a solution exists when $|\rho|=1$ and how to obtain it: a solution exists when the second and third of the standardized normal equations are redundant, and it can then be found simply by dropping one of the variables $x$ and $y$ from the regression.
We can derive some insight into the solution generally. For instance, from $\hat\beta_0=0$ in all cases, we may conclude that the fitted plane must pass through the point of averages $(m_x, m_y, m_z)$.
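To see this concretely, note that $(2)$ with $\hat\beta_0 = 0$ gives $\hat C = m_z - \hat A m_x - \hat B m_y$, so the fitted value at $(m_x, m_y)$ is
$$\hat C + \hat A m_x + \hat B m_y = (m_z - \hat A m_x - \hat B m_y) + \hat A m_x + \hat B m_y = m_z.$$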
It is now evident that the solution can be found in terms of the first two moments of the trivariate dataset $(x, y, z)$. This sheds further light on the fact that coefficient estimates can be found from means and covariance matrices alone.
Furthermore, equation $(2)$ shows that the means are needed only to estimate the intercept term $C$. Estimates of the two slopes $A$ and $B$ require only the second moments.
When the regressors are uncorrelated, $\rho=0$ and the standardized solution is simply $\hat\beta_0 = 0$, $\hat\beta_1 = \tau$, and $\hat\beta_2 = \upsilon$: the intercept is zero and the slopes are the correlation coefficients between the response $z$ and the regressors $x$ and $y$. This is both easy to remember and provides insight into how regression coefficients are related to correlation coefficients.
Putting this all together, we find that (except in the degenerate cases $|\rho|=1$) the estimates can be written
$$\eqalign{
\hat A &= \frac{\tau - \rho\upsilon}{1-\rho^2} \frac{s_z}{s_x} \\
\hat B &= \frac{\upsilon - \rho\tau}{1-\rho^2} \frac{s_z}{s_y} \\
\hat C &= m_z -m_x \hat A - m_y \hat B.
}$$
In these formulae, the $m_{*}$ are the sample means, the $s_{*}$ are the sample standard deviations, and the Greek letters $\rho, \tau,$ and $\upsilon$ represent the three correlation coefficients (between $x$ and $y$, $x$ and $z$, and $y$ and $z$, respectively).
Please note that these formulas are not the best way to carry out the calculations. They all involve subtracting quantities that may be of comparable size, such as $\tau-\rho\upsilon$, $\upsilon-\rho\tau$, and $m_z - (m_x \hat A + m_y \hat B)$, and such subtraction can lose precision. The matrix formulation allows numerical analysts to obtain more stable solutions that preserve as much precision as possible. That is one reason term-by-term formulas attract little interest; the other is that as the number of regressors increases, the formulas rapidly grow too complicated to be useful.
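One concrete illustration of the point: rather than forming $X^\prime X$ and inverting it, a least-squares problem can be solved through a QR decomposition of the model matrix, which is what R's `lm` does internally. A minimal sketch, assuming the same hypothetical vectors `x`, `y`, and `z` as above:
#
# A more numerically stable route: solve the least-squares problem via a QR
# decomposition of the model matrix instead of forming X'X explicitly.
# (This is essentially what `lm` does.)  Assumes numeric vectors x, y, z.
#
X <- cbind(1, x, y)
qr.coef(qr(X), z)    # the coefficients (C, A, B), computed without forming X'X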
As further evidence of the correctness of these formulas, we may compare their answers to those of a standard least-squares solver, the `lm` function in R.
#
# Generate trivariate data.
#
library(MASS)
set.seed(17)
n <- 20
mu <- 1:3
# Build a symmetric covariance matrix with unit variances.
Sigma <- matrix(1, 3, 3)
Sigma[lower.tri(Sigma)] <- Sigma[upper.tri(Sigma)] <- c(.8, .5, .6)
xyz <- data.frame(mvrnorm(n, mu, Sigma))
names(xyz) <- c("x", "y", "z")
#
# Obtain the least squares coefficients.
#
beta.hat <- coef(lm(z ~ x + y, xyz))
#
# Compute the first two moments via `colMeans` and `cov`.
#
m <- colMeans(xyz)
sigma <- cov(xyz)
s <- sqrt(diag(sigma))
rho <- t(t(sigma/s)/s)                    # the correlation matrix
rho <- as.vector(rho[lower.tri(rho)])     # correlations (x,y), (x,z), (y,z)
#
# Here are the least squares coefficient estimates in terms of the moments.
#
A.hat <- (rho[2] - rho[1]*rho[3]) / (1 - rho[1]^2) * s[3] / s[1]
B.hat <- (rho[3] - rho[1]*rho[2]) / (1 - rho[1]^2) * s[3] / s[2]
C.hat <- m[3] - m[1]*A.hat - m[2]*B.hat
#
# Compare the two solutions.
#
rbind(beta.hat, formulae=c(C.hat, A.hat, B.hat))
The output exhibits two identical rows of estimates, as expected:
         (Intercept)         x        y
beta.hat    1.522571 0.3013662 0.403636
formulae    1.522571 0.3013662 0.403636