
For simple linear regression, the regression coefficient can be computed directly from the variance-covariance matrix $C$ as $$\frac{C_{d,e}}{C_{e,e}},$$ where $d$ is the index of the dependent variable and $e$ is the index of the explanatory variable.
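
For example, a quick check of this in R with made-up data (the variable names here are purely illustrative):

set.seed(1)
x <- rnorm(20)
y <- 2 + 3 * x + rnorm(20)
C <- cov(cbind(y, x))        # 2 x 2 variance-covariance matrix
C["y", "x"] / C["x", "x"]    # C[d, e] / C[e, e]
coef(lm(y ~ x))[["x"]]       # agrees with the OLS slope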

If one only has the covariance matrix, is it possible to calculate the coefficients for a model with multiple explanatory variables?

ETA: For two explanatory variables, it appears that $$\beta_1 = \frac{Cov(y,x_1)var(x_2) - Cov(y,x_2)Cov(x_1,x_2)}{var(x_1)var(x_2) - Cov(x_1,x_2)^2} $$ and analogously for $\beta_2$. I'm not immediately seeing how to extend this to three or more variables.
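
For concreteness, here is a quick numerical check of that two-variable formula in R (illustrative data only):

set.seed(2)
x1 <- rnorm(30); x2 <- rnorm(30)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(30)
d  <- var(x1) * var(x2) - cov(x1, x2)^2
b1 <- (cov(y, x1) * var(x2) - cov(y, x2) * cov(x1, x2)) / d
b2 <- (cov(y, x2) * var(x1) - cov(y, x1) * cov(x1, x2)) / d
c(b1, b2)
coef(lm(y ~ x1 + x2))[-1]    # matches b1 and b2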

David
  • The coefficient vector $\hat{\beta}$ is the solution to the normal equations $X'X\hat{\beta}=X'Y$, i.e. $\hat{\beta}=(X'X)^{-1}X'Y$. Some algebraic manipulation reveals that this is in fact the same as the formula you give in the 2-coefficient case. Laid out nicely here: http://www.stat.purdue.edu/~jennings/stat514/stat512notes/topic3.pdf. Not sure if that helps at all. But I'd venture to guess that this is impossible in general based on that formula. – shadowtalker Jul 22 '14 at 13:13
  • @David Did you figure out how to extend this to an arbitrary number of explanatory variables (beyond 2)? I need the expression. – Jane Wayne May 16 '16 at 07:47
  • @JaneWayne I'm not sure I understand your question: whuber gave the solution below in matrix form, $C^{-1}(\text{Cov}(X_i, y))^\prime$ – David Jun 15 '16 at 14:19
  • yup I studied it and he's right. – Jane Wayne Jun 15 '16 at 14:52

1 Answer


Yes, the covariance matrix of all the variables--explanatory and response--contains the information needed to find all the coefficients, provided an intercept (constant) term is included in the model. (Although the covariances provide no information about the constant term, it can be found from the means of the data.)


Analysis

Let the data for the explanatory variables be arranged as $n$-dimensional column vectors $X_1, X_2, \ldots, X_p$, with covariance matrix $C_X$, and let the response variable be the column vector $y$, considered to be a realization of a random variable $Y$. The ordinary least squares estimates $\hat\beta$ of the coefficients in the model

$$\mathbb{E}(Y) = \alpha + X\beta$$

are obtained by assembling the $p+1$ column vectors $X_0 = (1, 1, \ldots, 1)^\prime, X_1, \ldots, X_p$ into an $n \times (p+1)$ array $X$ and solving the system of linear equations

$$X^\prime X \hat\beta = X^\prime y.$$

It is equivalent to the system

$$\frac{1}{n}X^\prime X \hat\beta = \frac{1}{n}X^\prime y.$$
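
Writing $\overline{X_i} = \frac{1}{n}X_0^\prime X_i$ and $\overline{y} = \frac{1}{n}X_0^\prime y$ for the sample means, the two sides of this scaled system have the block structure

$$\frac{1}{n}X^\prime X = \begin{pmatrix} 1 & \overline{X_1} & \cdots & \overline{X_p} \\ \overline{X_1} & \frac{1}{n}X_1^\prime X_1 & \cdots & \frac{1}{n}X_1^\prime X_p \\ \vdots & \vdots & \ddots & \vdots \\ \overline{X_p} & \frac{1}{n}X_p^\prime X_1 & \cdots & \frac{1}{n}X_p^\prime X_p \end{pmatrix}, \qquad \frac{1}{n}X^\prime y = \begin{pmatrix} \overline{y} \\ \frac{1}{n}X_1^\prime y \\ \vdots \\ \frac{1}{n}X_p^\prime y \end{pmatrix}.$$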

Gaussian elimination will solve this system. It proceeds by adjoining the $(p+1)\times (p+1)$ matrix $\frac{1}{n}X^\prime X$ and the $(p+1)$-vector $\frac{1}{n}X^\prime y$ into a $(p+1) \times (p+2)$ array $A$ and row-reducing it.

The first step will inspect $\frac{1}{n}(X^\prime X)_{11} = \frac{1}{n}X_0^\prime X_0 = 1$. Finding this to be nonzero, it proceeds to subtract appropriate multiples of the first row of $A$ from the remaining rows in order to zero out the remaining entries in its first column. These multiples will be $\frac{1}{n}X_0^\prime X_i = \overline{X_i}$, and the number subtracted from the entry $A_{i+1,j+1} = \frac{1}{n}X_i^\prime X_j$ will equal $\overline{X_i}\,\overline{X_j}$, leaving $\frac{1}{n}X_i^\prime X_j - \overline{X_i}\,\overline{X_j}$: this is just the formula for the covariance of $X_i$ and $X_j$. Moreover, the number left in the $(i+1, p+2)$ position equals $\frac{1}{n}X_i^\prime y - \overline{X_i}\,\overline{y}$, the covariance of $X_i$ with $y$.
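
(The identity at work is just the shortcut formula for a covariance: for any two $n$-vectors $u$ and $v$ with means $\overline{u}$ and $\overline{v}$,

$$\frac{1}{n}\sum_{k=1}^n (u_k - \overline{u})(v_k - \overline{v}) = \frac{1}{n}u^\prime v - \overline{u}\,\overline{v}.$$

Thus the elimination step turns each entry $\frac{1}{n}X_i^\prime X_j$ into a divide-by-$n$ covariance. Whether the covariances are computed with a divisor of $n$ or $n-1$ does not affect the solution, because the same factor multiplies both sides of the reduced system and cancels.)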

Thus, after the first step of Gaussian elimination the system is reduced to solving

$$C_X\hat{\beta} = (\text{Cov}(X_i, y))^\prime$$

and obviously--since all the coefficients are covariances--that solution can be found from the covariance matrix of all the variables.

(When $C_X$ is invertible the solution can be written $C_X^{-1}(\text{Cov}(X_i, y))^\prime$. The formulas given in the question are special cases of this when $p=1$ and $p=2$. Writing out such formulas explicitly will become more and more complex as $p$ grows. Moreover, they are inferior for numerical computation, which is best carried out by solving the system of equations rather than by inverting the matrix $C_X$.)
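
In R, for instance, that is the difference between the following two calls (a small sketch, using the objects `a` and `b` defined in the example below):

solve(a, b)       # solve the system C_X beta-hat = b directly (preferred)
solve(a) %*% b    # form the inverse explicitly: same answer, numerically less stable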

The constant term is the difference between the mean of $y$ and the mean of the values predicted from the estimates, $X\hat{\beta}$; that is, $\hat{\alpha} = \overline{y} - \overline{X}\hat{\beta}$, where $\overline{X}$ is the row vector of means of the explanatory variables.


Example

To illustrate, the following R code creates some data, computes their covariances, and obtains the least squares coefficient estimates solely from that information. It compares them to the estimates obtained from the least-squares fitting function `lm`.

#
# 1. Generate some data.
#
n <- 10        # Data set size
p <- 2         # Number of regressors
set.seed(17)
z <- matrix(rnorm(n*(p+1)), nrow=n, dimnames=list(NULL, paste0("x", 1:(p+1))))
y <- z[, p+1]
x <- z[, -(p+1), drop=FALSE]
#
# 2. Find the OLS coefficients from the covariances only.
#
a <- cov(x)
b <- cov(x,y)
beta.hat <- solve(a, b)[, 1]  # Coefficients from the covariance matrix
#
# 2a. Find the intercept from the means and coefficients.
#
y.bar <- mean(y)
x.bar <- colMeans(x)
intercept <- y.bar - x.bar %*% beta.hat  

The output shows agreement between the two methods:

(rbind(`From covariances` = c(`(Intercept)`=intercept, beta.hat),
       `From data via OLS` = coef(lm(y ~ x))))
                  (Intercept)        x1        x2
From covariances     0.946155 -0.424551 -1.006675
From data via OLS    0.946155 -0.424551 -1.006675
whuber
  • Thanks, @whuber! This is exactly what I was looking for, and my atrophied brain was unable to get to. As an aside, the motivation for the question is that for various reasons we essentially do not have the full $X$ available, but have `cov(z)` from previous calculations. – David Jul 22 '14 at 15:36
  • Answers like this raise the bar of Cross Validated – jpmuc Jul 26 '14 at 14:12
  • @whuber In your example, you computed the intercept from `y` and `x` and `beta.hat`. The `y` and `x` are part of the original data. Is it possible to derive the intercept from the covariance matrix and means alone? Could you please provide the notation? – Jane Wayne May 16 '16 at 07:34
  • @Jane Given only the means $\bar X$, apply $\hat \beta$ to them: $$\overline X \hat\beta = \overline{X \hat\beta}.$$ I have changed the code to reflect this. – whuber May 16 '16 at 13:50
  • very helpful +1 for the code – Michael Sep 19 '18 at 19:24
  • I think some confusion is caused (it was certainly the case for me) by the use of the full covariance matrix $C$, which (as far as I can tell) the OP is using to indicate the covariance matrix $Cov([\textbf{X,y}],[\textbf{X,y}])$, while here $C$ seems to represent the covariance matrix for the Xs only, i.e. $Cov(\textbf{X},\textbf{X})$. – Confounded Feb 28 '20 at 11:12
  • @Confounded Could you suggest any way to improve the first sentence of my answer, then, which was intended to resolve that potential for confusion? – whuber Feb 28 '20 at 14:10
  • The first sentence is fine as it is; perhaps you might want to change the notation in the main part from $C$ to $C_{e,e}$ as used in the OP to indicate that it is a covariance matrix of the "explanatory" variables. – Confounded Feb 29 '20 at 15:56
  • @Confounded That's an excellent idea, thank you. I see I didn't even define my "$C.$" I have elected to call it $C_X,$ though, because that seems a better mnemonic for connecting it to the variables $X.$ – whuber Feb 29 '20 at 17:46
  • I wonder if the paragraph about subtracting the mean terms $\bar{X}\bar{X}$ could be written in such a way that it explains *mathematically* rather than *computationally* how the first expression is equivalent to the covariance expression. Gaussian elimination isn't a good argument because that's just one method of solving linear equations. – Migwell Nov 14 '21 at 00:05
  • @Migwell On the contrary, Gaussian elimination is a perfectly legitimate mathematical argument, albeit a little unusual in this context. Plenty of proofs--especially in discrete mathematics, combinatorics, optimization, and computational geometry (to name a few areas of mathematics important in statistical applications)--amount to applying "just one method" of solving numerical problems. There is no inherent distinction between "mathematical" and "computational" forms of reasoning. – whuber Nov 14 '21 at 15:51