I was wondering how it is mathematically possible to run a full regression analysis with 3 predictors (x1, x2, x3) and a dependent variable (y) knowing only the means, Ns, SDs, and the correlations among these 4 variables (without the original data).

I would highly appreciate an R demonstration.

ns    <- c(273, 273, 273, 273)
means <- c(15.4, 7.1, 3.5, 6.2)
sds   <- c(3.4, 0.9, 1.5, 1.4)

r <- matrix( c(
1.0,  .57,  -.4,  .48,  
.57,  1.0, -.61,  .66,   
-.4, -.61,  1.0, -.68,  
.48,  .66, -.68,   1.0), 4)

rownames(r) <- colnames(r) <- c('y', paste0('x', 1:3))
rnorouzian
  • You cannot run a truly "full" regression analysis with just these statistics, because you will not be able to construct residuals and perform regression diagnostics that depend on them. You are limited to making and testing parameter estimates. – whuber Dec 05 '18 at 14:35
  • Structural equation modeling is the most obvious answer in my mind, which is more or less the math provided in the answer below. Most SEM programs will be able to accept your inputs (all need to be able to estimate a covariance matrix in the end). – Matt Barstead Dec 10 '18 at 02:29
  • @whuber, do I need to first convert my correlation matrix into a var-covariance matrix using `r_x_iy_i * sd_x_i*sd_y_i` and then go from there? – rnorouzian Dec 10 '18 at 04:52
  • That's one approach, because it reduces your problem to one with a known, explicit solution. – whuber Dec 10 '18 at 15:13
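
The conversion discussed in these comments can be sketched in R as follows. This is a minimal illustration using the summary statistics from the question; the names `S` and `b_cov` are introduced here for illustration and are not from the original posts.

```r
# Summary statistics from the question (order: y, x1, x2, x3)
vars <- c("y", "x1", "x2", "x3")
sds  <- c(3.4, 0.9, 1.5, 1.4)
names(sds) <- vars
r <- matrix(c(
  1.00,  0.57, -0.40,  0.48,
  0.57,  1.00, -0.61,  0.66,
 -0.40, -0.61,  1.00, -0.68,
  0.48,  0.66, -0.68,  1.00), 4, byrow = TRUE,
  dimnames = list(vars, vars))

# Covariance matrix: S[i, j] = r[i, j] * sd_i * sd_j
S <- r * outer(sds, sds)

# Unstandardized slopes: solve S_xx b = s_xy
b_cov <- solve(S[-1, -1], S[-1, 1])
print(b_cov)
```

Converting `S` back with `cov2cor(S)` should recover `r`, which is a quick sanity check on the conversion.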

1 Answer

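  1. From the correlation matrix (taking the first row and column to be y), you can estimate the standardized regression coefficients (that is, everything except the intercept) by multiplying the inverse of the predictors' correlation matrix by the vector of correlations between each predictor and y: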

$$\left(\begin{matrix} 1.00 & -0.61 & 0.66\\ -0.61 & 1.00 & -0.68\\ 0.66 & -0.68 & 1.00\end{matrix}\right)^{-1}\left(\begin{matrix} 0.57\\ -0.40\\ 0.48\end{matrix}\right)$$

  2. Using the standard deviations, you can convert the standardized regression coefficients into unstandardized (raw-scale) coefficients: $b_j = \beta_j \, s_y / s_{x_j}$.

  3. Using the means, you can estimate the intercept: $b_0 = \bar{y} - \sum_j b_j \bar{x}_j$.
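
The three steps above can be sketched in R as follows, using the numbers from the question. The standard errors and t-tests at the end are an addition not spelled out in the answer; they assume the usual OLS setting with residual degrees of freedom `n - p - 1`, and all variable names here are illustrative.

```r
# Summary statistics from the question (order: y, x1, x2, x3)
n     <- 273
vars  <- c("y", "x1", "x2", "x3")
means <- c(15.4, 7.1, 3.5, 6.2); names(means) <- vars
sds   <- c(3.4, 0.9, 1.5, 1.4);  names(sds)   <- vars
r <- matrix(c(
  1.00,  0.57, -0.40,  0.48,
  0.57,  1.00, -0.61,  0.66,
 -0.40, -0.61,  1.00, -0.68,
  0.48,  0.66, -0.68,  1.00), 4, byrow = TRUE,
  dimnames = list(vars, vars))

Rxx <- r[-1, -1]   # correlations among the predictors
rxy <- r[-1, 1]    # correlations of each predictor with y
p   <- length(rxy)

# Step 1: standardized slopes, beta = Rxx^{-1} rxy
beta <- solve(Rxx, rxy)
names(beta) <- names(rxy)

# Step 2: rescale to the original units, b_j = beta_j * sd_y / sd_xj
b <- beta * sds[["y"]] / sds[-1]

# Step 3: intercept from the means
b0 <- means[["y"]] - sum(b * means[-1])

# Assumed extension: R^2, residual variance, and standard errors
R2     <- sum(beta * rxy)
df     <- n - p - 1
sigma2 <- (n - 1) * sds[["y"]]^2 * (1 - R2) / df
Sxx    <- diag(sds[-1]) %*% Rxx %*% diag(sds[-1])   # predictor covariance
V      <- sigma2 * solve((n - 1) * Sxx)             # Var(b)
se     <- sqrt(diag(V))
se0    <- sqrt(sigma2 / n + drop(t(means[-1]) %*% V %*% means[-1]))

est  <- c("(Intercept)" = b0, b)
ses  <- c(se0, se)
tval <- est / ses
coef_table <- cbind(Estimate = est, `Std. Error` = ses,
                    `t value` = tval,
                    `Pr(>|t|)` = 2 * pt(-abs(tval), df))
print(round(coef_table, 4))
print(c(R2 = R2, df = df))
```

As noted in the comments on the question, this yields estimates and tests only; residual-based diagnostics are not recoverable from summary statistics.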

user158565
  • I do not know how to use R. – user158565 Dec 05 '18 at 05:00
  • How does collinearity influence this? – PascalVKooten Dec 05 '18 at 09:25
  • Perfect collinearity ==> the inverse of the matrix in the answer does not exist; a generalized inverse has to be used instead, and there are infinitely many solutions. Partial collinearity ==> the inverse of that matrix is unstable, i.e., a small change in X can produce a very large change in the estimates. – user158565 Dec 05 '18 at 14:20
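
One rough way to gauge how close the predictor correlation matrix is to singular is to look at its eigenvalues and at the diagonal of its inverse, which gives the variance inflation factors. The sketch below uses the predictor correlations from the question; the names `ev`, `cond`, and `vif` are illustrative, and any cutoff for "too collinear" is a rule of thumb rather than something stated in this thread.

```r
# Predictor correlation matrix from the question (x1, x2, x3)
xs  <- paste0("x", 1:3)
Rxx <- matrix(c(
  1.00, -0.61,  0.66,
 -0.61,  1.00, -0.68,
  0.66, -0.68,  1.00), 3, byrow = TRUE,
  dimnames = list(xs, xs))

# Eigenvalues: one near zero signals near-perfect collinearity
ev <- eigen(Rxx, symmetric = TRUE)$values

# Condition number: ratio of largest to smallest eigenvalue
cond <- max(ev) / min(ev)

# Variance inflation factors: diagonal of the inverse correlation matrix
vif <- diag(solve(Rxx))

print(ev)
print(cond)
print(vif)
```

With perfect collinearity, `min(ev)` would be zero (up to rounding) and `solve(Rxx)` would fail, which is the case where a generalized inverse is needed.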