
I have two variables from a multivariate standard normal distribution that are highly correlated (say, r = 0.8). If I observe variable 1 to be 1.5, how can I predict the value (or the plausible range of values) of variable 2?

I could do this with random draws from the distribution, but I think there should be an analytic way as well, probably using projections.

asked by hanshansen (edited by Ferdi)
  • Please note that I don't have a sequence of observations; I just happen to know the correlation and have one observation of variable 1. Is there a way to formally derive a regression model from the correlation matrix? – hanshansen Jun 24 '18 at 19:21
  • With only the correlation, the answer is surely no, because correlation tells you nothing about the *level* of one variable relative to another. But with a tiny bit more information you can carry out multiple regression: see https://stats.stackexchange.com/questions/107597. Is that perhaps the question you are trying to ask? – whuber Jun 24 '18 at 19:35
  • Do you know the standard deviations? – Robert Long Jun 24 '18 at 19:53
  • Thanks, that is helpful! I was assuming that all my variables are standard normal, so the correlation equals the covariance and each sd = 1. – hanshansen Jun 24 '18 at 20:03
  • @whuber it is not actually a duplicate of the question I linked to above, but since we are given the variances, the answer is easy. – Robert Long Jun 25 '18 at 10:01
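On the question raised in the comments (deriving a regression model from the correlation matrix alone): for jointly normal variables, the conditional mean of the unobserved block is the linear projection μ2 + Σ21 Σ11⁻¹ (x1 − μ1), and its conditional covariance is the Schur complement Σ22 − Σ21 Σ11⁻¹ Σ12. With standardized variables, Σ is simply the correlation matrix. Below is a minimal base-R sketch of that standard formula. The function name cond_mvn and its arguments are hypothetical, chosen for illustration rather than taken from any package.

```r
# Conditional distribution of the unobserved coordinates given x[obs] = value,
# for a multivariate normal with mean vector mu and covariance matrix Sigma.
# cond_mvn is a hypothetical helper written for this sketch.
cond_mvn <- function(mu, Sigma, obs, value) {
  dep <- setdiff(seq_along(mu), obs)
  S11 <- Sigma[obs, obs, drop = FALSE]   # covariance of the observed block
  S21 <- Sigma[dep, obs, drop = FALSE]   # cross-covariance (unobserved x observed)
  S22 <- Sigma[dep, dep, drop = FALSE]   # covariance of the unobserved block
  B <- S21 %*% solve(S11)                # regression (projection) coefficients
  list(
    coef = B,
    mean = drop(mu[dep] + B %*% (value - mu[obs])),
    cov  = S22 - B %*% t(S21)            # Schur complement: residual covariance
  )
}

# The case in the question: two standard normals with correlation 0.8
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
res <- cond_mvn(mu = c(0, 0), Sigma = Sigma, obs = 1, value = 1.5)
res$mean  # conditional mean: 1.2
res$cov   # conditional variance: 0.36 (as a 1 x 1 matrix)
```

The formula needs only mu and Sigma, not a sample of observations, which matches the situation described in the question. It also extends unchanged to conditioning on several observed variables at once.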

1 Answer


We know that x and y are bivariate standard normal, so both have mean 0 and standard deviation 1.

The question is how to predict, analytically, the value of one variable from an observed value of the other (1.5), in the absence of any other data.

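We are given that the correlation coefficient is ρ = 0.8. For a bivariate normal, the conditional mean of y given x is E[y | x] = μ_y + ρ (σ_y / σ_x)(x − μ_x). With both means 0 and both standard deviations 1, this reduces to ρx, so the prediction is 1.5 × 0.8 = 1.2. The spread around this prediction is given by the conditional standard deviation, sqrt(1 − ρ²) = sqrt(1 − 0.64) = 0.6.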
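The question also asks for a range of plausible values. Below is a minimal base-R sketch that assumes the population values are known exactly (means 0, standard deviations 1, ρ = 0.8). Under that assumption, variable 2 given variable 1 = 1.5 is normal with mean ρ · 1.5 and standard deviation sqrt(1 − ρ²), and a central 95% range follows from normal quantiles. The variable names are illustrative.

```r
# Bivariate standard normal: means 0, standard deviations 1, correlation rho
rho <- 0.8
x1  <- 1.5

cond_mean <- rho * x1          # E[X2 | X1 = x1]
cond_sd   <- sqrt(1 - rho^2)   # SD of X2 given X1 = x1

# Central 95% range for X2 given X1 = x1 (population values, no estimation error)
z <- qnorm(0.975)
interval <- c(lower = cond_mean - z * cond_sd, upper = cond_mean + z * cond_sd)

cond_mean  # 1.2
cond_sd    # 0.6
interval   # roughly 0.024 to 2.376
```

This range treats ρ and the unit variances as known population values. An interval computed from a fitted lm() also reflects the uncertainty of estimating the line from a finite sample.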

We can easily demonstrate this with a simulation. For example:

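library(MASS)  # provides mvrnorm()
set.seed(123)
# Covariance matrix: unit variances, correlation 0.8
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
# empirical = TRUE makes the sample means and covariance match mu and Sigma exactly
df <- data.frame(mvrnorm(n = 100, mu = rep(0, 2), Sigma = Sigma, empirical = TRUE))
# Regress variable 2 on variable 1
m0 <- lm(X2 ~ X1, data = df)
summary(m0)
# Point prediction and prediction interval at X1 = 1.5
new <- data.frame(X1 = 1.5)
predict(m0, new, interval = "prediction")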

  fit       lwr      upr
1 1.2 0.3892231 2.010777
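The question also mentions random draws. Below is a rough Monte Carlo check in base R. It assumes that a narrow window around 1.5 is an acceptable stand-in for conditioning on exactly 1.5: simulate many pairs, keep those whose first coordinate lies near 1.5, and summarize the second coordinate. The window half-width (0.05), the seed, and the sample size are arbitrary choices for this sketch.

```r
set.seed(1)
rho   <- 0.8
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)

# Draw n pairs from N(0, Sigma): rows of Z %*% chol(Sigma) have covariance Sigma
n <- 1e6
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- Z %*% chol(Sigma)

# Keep draws whose first coordinate falls close to the observed value 1.5
near     <- abs(X[, 1] - 1.5) < 0.05
sim_mean <- mean(X[near, 2])
sim_sd   <- sd(X[near, 2])

c(n_kept = sum(near), mean = sim_mean, sd = sim_sd)
# mean should be close to 0.8 * 1.5 = 1.2 and sd close to sqrt(1 - 0.8^2) = 0.6
```

Shrinking the window reduces the small bias from conditioning on an interval instead of a point, at the cost of keeping fewer draws.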
answered by Robert Long (edited by Antoni Parellada)