0

I have a hard time solving the following issue, so hopefully someone is willing to help. I believe I am almost there but just missing a single step.

There are three random variables $X$, $Y$, and $Z$ that all follow $N(\mu,\sigma^2)$. Variables $Y$ and $Z$ are correlated with $\rho$. Let's say that a single value is sampled from the normal distributions to get sampled values $x$, $y$, and $z$. Then I compute the following difference scores $x - y$ and $x - z$. I would like to know whether there is an analytical expression to compute the variance of the two difference scores (so a variance based on these two difference scores). $\sigma^2$ and $\rho$ are assumed to be known, but $\mu$ is unknown.

I figured out what $Cov(x - y, x - z)$ is using the answer on this question:

$Cov(x-y, x-z) = Cov(x,x) - Cov(x,z) - Cov(y,x) + Cov(y,z)$

$Cov(x,z)$ and $Cov(y,x)$ can be dropped because these are independent, so

$Cov(x-y, x-z) = Var(x) + Cov(y,z) = \sigma^2 + \rho \times \sigma^2$

I expected the variance of $x-y$ and $x-z$ then to be something along the lines

$Var(x-y) + Var(x-z) - 2 \times Cov(x-y, x-z)$,

but simulations in R (see code below) show that this is structurally twice as large as the actual variance.

Hence, my question is: is it possible to compute this variance in this way, and if yes what am I doing wrong?

Thank you in advance!

R code:

rm(list = ls())

rho <- 0.5
sigma2 <- 1
iters <- 10000

cov <- rho * sqrt(sigma2)*sqrt(sigma2)
Sigma <- matrix(cov, nrow = 2, ncol = 2)
diag(Sigma) <- sigma2

xs <- matrix(NA, nrow = iters, ncol = 2)

for (i in 1:iters)
{
  xs[i, ] <- MASS::mvrnorm(1, mu = rep(0, 2), Sigma = Sigma) 
}

xc <- rnorm(iters, mean = 0, sd = sqrt(sigma2))

cov_comp <- rho*sigma2+sigma2

2*sigma2+2*sigma2-2*cov_comp

mean(apply(xs-xc, 1, var))
User33
  • 307
  • 1
  • 7
  • I don't understand this sentence: "I expected the variance of − and − then to be something along the lines" – gunes Jun 26 '19 at 17:41
  • I expected the variance to be equal to $Var(x-y) + Var(x-z) - 2 \times Cov(x-y, x-z)$, but this is apparently not the case. As you can see in the simulations, the variance is twice as large as $Var(x-y) + Var(x-z) - 2 \times Cov(x-y, x-z)$. – User33 Jun 26 '19 at 18:42

1 Answers1

1

The theoretical calculation is the variance of $(X-Y)-(X-Z)$. In the last line of your code, apply(xs-xc, 1, var) statement produces iters number of variances and then you average them. This doesn't make sense. Because each of the variances are obtained from $2$ samples, that are not drawn from the same population. You should be doing the following:

xx = xc-xs 
var(xx[,1]-xx[,2])
gunes
  • 49,700
  • 3
  • 39
  • 75