I think this has a simple answer but I can't quite figure it out. I'm trying to simulate a causal relationship (or lack thereof!) and its confounders from a directed acyclic graph (DAG). I can't simulate everything at once from a correlation matrix: if, for instance, C entirely confounds the relationship between A and B, then A and B are marginally correlated but not correlated after adjusting for C.

So if I generate C first as a random vector, how do I then generate, say, B so that the correlation between C and B is exactly R? I know that if I add two uncorrelated normal(0,1) vectors together (C and, say, U), I get B with standard deviation sqrt(2) ≈ 1.41 and R = sqrt(0.5) ≈ 0.71. But what if I want R = 0.3, or 0.8? Is there a simple way to specify the variance of U such that R = 0.3?

JoeL

1 Answer

In your ABC example, you could do the following:

  1. generate C
  2. generate A correlated with C
  3. generate B correlated with C

You could use my answer, or any other answer to the same question, for steps 2 and 3.
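For a single target correlation, steps 2 and 3 reduce to a closed form: scale C by rho and add independent noise scaled by sqrt(1 - rho^2). The following is a minimal sketch, assuming C is standard normal and the noise is independent standard normal. The helper name `correlated_with` is hypothetical, not from any library, and the sample correlation will be close to rho rather than exactly equal to it.

```python
import numpy as np

def correlated_with(c, rho, rng):
    """Return a vector whose population correlation with standard normal c is rho."""
    u = rng.standard_normal(c.shape[0])  # independent N(0, 1) noise
    # Unit variance overall: rho^2 * Var(c) + (1 - rho^2) * Var(u) = 1
    return rho * c + np.sqrt(1.0 - rho**2) * u

rng = np.random.default_rng(0)
C = rng.standard_normal(100_000)
B = correlated_with(C, 0.3, rng)
r_hat = np.corrcoef(C, B)[0, 1]  # close to 0.3, up to sampling error
```

In terms of the question's B = C + U with Var(C) = 1, the correlation is R = 1/sqrt(1 + Var(U)), so Var(U) = 1/R^2 - 1 for positive R. That gives Var(U) = 1 for R = sqrt(0.5) and roughly 10.1 for R = 0.3.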

Python code:

import numpy as np
from scipy.linalg import cholesky

np.random.seed(0)

# create C variable
T = 10000  # number of samples
C = np.random.randn(T)

# create A
rhoA = 0.5
R = np.ones((2,2))
R[0,1] = rhoA
R[1,0] = rhoA

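X = np.random.randn(T,2)
X[:,0] = C           # first column is C, second is independent noise
L = cholesky(R)      # upper-triangular factor, so R = L.T @ L
Y = np.matmul(X,L)   # Y[:,0] equals C; Y[:,1] has correlation rhoA with C
A = Y[:,1]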

# create B
rhoB = -0.5
R[0,1] = rhoB
R[1,0] = rhoB

X = np.random.randn(T,2)
X[:,0] = C
L = cholesky(R)
Y = np.matmul(X,L)
B = Y[:,1]

# check corr
X = np.ones((T,3))
X[:,0] = A
X[:,1] = B
X[:,2] = C

np.corrcoef(np.transpose(X))

Output:

array([[ 1.        , -0.25129687,  0.49479904],
       [-0.25129687,  1.        , -0.50069887],
       [ 0.49479904, -0.50069887,  1.        ]])
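The A-B entry of about -0.25 is what a pure confounder should produce: it matches rhoA * rhoB, and it should vanish once C is adjusted for. A rough way to check this, sketched here under the assumption that "adjusting for C" means regressing C out of both A and B (the `residual` helper is illustrative, and the variables are generated with the closed form rather than the Cholesky step):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
rho_a, rho_b = 0.5, -0.5

# C confounds A and B; A and B share no other source of variation
C = rng.standard_normal(n)
A = rho_a * C + np.sqrt(1 - rho_a**2) * rng.standard_normal(n)
B = rho_b * C + np.sqrt(1 - rho_b**2) * rng.standard_normal(n)

def residual(y, x):
    """Residual of y after a least-squares line fit on x (with intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

marginal = np.corrcoef(A, B)[0, 1]  # near rho_a * rho_b = -0.25
partial = np.corrcoef(residual(A, C), residual(B, C))[0, 1]  # near 0
```

If the partial correlation is not near zero, A and B are related through something other than C.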
Aksakal