
Suppose I have the following regression model:

$Y = b_1 \times T + b_2 \times Z + b_3 \times T\times Z + \epsilon$

where T is a randomly assigned treatment condition and Z is some covariate. I want to test the hypothesis that $b_1=0$, using

$s^2_{b_1} = \left[\sigma^2 (X'X)^{-1}\right]_{11},$

i.e., the diagonal element of $\sigma^2 (X'X)^{-1}$ corresponding to $T$.
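(For concreteness, here is a minimal NumPy sketch of that test on a made-up, fully observed design matrix; the data-generating values are arbitrary:)

```python
import numpy as np

# Sketch of the test above on a complete, made-up design matrix.
rng = np.random.default_rng(0)
n = 200
T = rng.integers(0, 2, size=n).astype(float)    # randomized treatment
Z = rng.normal(size=n)                          # covariate
X = np.column_stack([T, Z, T * Z])              # design matrix, as in the model
Y = 0.0 * T + 0.5 * Z + 0.2 * T * Z + rng.normal(size=n)

b, ss_res, *_ = np.linalg.lstsq(X, Y, rcond=None)
sigma2 = ss_res[0] / (n - X.shape[1])           # residual variance estimate
var_b = sigma2 * np.linalg.inv(X.T @ X)         # sigma^2 (X'X)^{-1}
se_b1 = np.sqrt(var_b[0, 0])                    # diagonal entry for the T column
print(b[0] / se_b1)                             # t statistic for H0: b_1 = 0
```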

However, the design matrix $X$ has missing data, and the missingness is a function of $Z$. I do have a corrected variance/covariance matrix $\Sigma$ (obtained with the Pearson-Lawley correction formula), and I know there's a relationship between $X'X$ and $\Sigma$. If I remember right, it's

$\Sigma = \Big[X-\frac{1}{n}ee'X\Big]'\Big[X-\frac{1}{n}ee'X\Big]\frac{1}{n}$

where $e$ is a vector of ones. But, alas, I have a corrected version of $\Sigma$, not $X$. Is there any way to go from $\Sigma$ to $X$ (or $X'X$)?
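(To check that I have the identity right, here is a quick NumPy verification on a made-up, fully observed $X$; the last check records what going back from $\Sigma$ to $X'X$ would seem to require beyond $\Sigma$ itself:)

```python
import numpy as np

# Numerical check of the identity above on a complete, made-up X.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))

e = np.ones((n, 1))
Xc = X - (e @ e.T @ X) / n                 # [X - (1/n) e e' X], column-centered X
Sigma = (Xc.T @ Xc) / n                    # the identity in question

# matches the 1/n-normalized sample covariance
assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))

# Going the other way needs more than Sigma alone:
# X'X = n * (Sigma + xbar' xbar), so the column means and n are also required.
xbar = X.mean(axis=0, keepdims=True)
assert np.allclose(X.T @ X, n * (Sigma + xbar.T @ xbar))
```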

dfife
  • I think I can guess most of it, but could you please describe X, e, $\Sigma$, and $\sigma$? – eric_kernfeld Oct 04 '16 at 01:11
  • X is the design matrix, e is a vector of ones, Sigma is the variance/covariance matrix, and sigma is the residual variance. – dfife Oct 11 '16 at 16:19
  • The thread at https://stats.stackexchange.com/questions/107597 appears to provide a full answer. – whuber May 14 '19 at 18:16

1 Answer


It is impossible to estimate regression coefficients from the covariance matrix of the covariates alone: you also need something resembling $X'Y$ or $\operatorname{Cov}(X, Y)$. To see why, suppose you have three i.i.d. standard normal vectors $x_1, x_2, x_3$ and you want to run the regressions $x_1 \approx \beta_{1\sim 2} x_2 + \beta_{1\sim 3} x_3$ and $x_2 \approx \beta_{2\sim 2} x_2 + \beta_{2\sim 3} x_3$. The optimal coefficients are quite different (all zero in the first case versus $(1, 0)$ in the second), but the covariance matrix of the predictors $(x_2, x_3)$ is identical in both cases.
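A minimal simulation of that counterexample (NumPy; the sample size and seed are arbitrary):

```python
import numpy as np

# Same predictor covariance matrix, very different regression coefficients.
rng = np.random.default_rng(1)
n = 10_000
x1, x2, x3 = rng.normal(size=(3, n))       # three i.i.d. standard normal vectors

X = np.column_stack([x2, x3])              # identical predictors in both fits

b1, *_ = np.linalg.lstsq(X, x1, rcond=None)   # x1 ~ x2 + x3
b2, *_ = np.linalg.lstsq(X, x2, rcond=None)   # x2 ~ x2 + x3

print(np.round(b1, 3))                     # approximately (0, 0)
print(np.round(b2, 3))                     # exactly (1, 0)
print(np.cov(X, rowvar=False))             # the same Sigma in both regressions
```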

You may be intrigued by this related question:

Using Covariance Estimator to Perform Linear Regression?

eric_kernfeld