I want to generate signals whose pairwise correlations are distributed around a specific, predefined value, i.e., the off-diagonal values of their correlation matrix should be distributed around an expected value (rho).
For example, for rho = 0.5, if I build signals X_synthetic whose target correlation matrix is cor_matrix, then the off-diagonal values in np.corrcoef(X_synthetic) should all be around 0.5, so their histogram will be centered around 0.5.
Example to achieve this for a positive rho value:
import numpy as np
#desired expected rho (of the distribution of the corr matrix)
rho = 0.5
# desired correlation matrix
cor_matrix = np.ones((5,5))* rho
np.fill_diagonal(cor_matrix,1) # 1s in diagonal
print(cor_matrix)
# this is an artificial case, but it will result in the desired distribution.
array([[1. , 0.5, 0.5, 0.5, 0.5],
[0.5, 1. , 0.5, 0.5, 0.5],
[0.5, 0.5, 1. , 0.5, 0.5],
[0.5, 0.5, 0.5, 1. , 0.5],
[0.5, 0.5, 0.5, 0.5, 1. ]])
L = np.linalg.cholesky(cor_matrix)
# build some signals that will result in the desired correlation matrix
X_synthetic = L.dot(np.random.normal(0,1, (5,2000)))
# estimate their correlation matrix
np.corrcoef(X_synthetic)
array([[1. , 0.50576661, 0.51472813, 0.47208374, 0.49260528],
[0.50576661, 1. , 0.4798111 , 0.48540114, 0.47225243],
[0.51472813, 0.4798111 , 1. , 0.4649033 , 0.4745259 ],
[0.47208374, 0.48540114, 0.4649033 , 1. , 0.50059795],
[0.49260528, 0.47225243, 0.4745259 , 0.50059795, 1. ]])
#* Very good approximation. All values are fluctuating around 0.5.
#* So the distribution of the correlation values of `X_synthetic` is around the expected value `0.5`.
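To look at the distribution itself rather than the full matrix, one option is to pull out the off-diagonal entries of np.corrcoef(X_synthetic) and summarize them. The following is a minimal sketch of that idea. The seeded generator and the name off_diag are assumptions added here for reproducibility and illustration; they are not part of the code above.

```python
import numpy as np

# Assumption: a fixed seed, used only to make the sketch reproducible.
rng = np.random.default_rng(0)

rho = 0.5
cor_matrix = np.full((5, 5), rho)
np.fill_diagonal(cor_matrix, 1.0)

# Same construction as above: colour white noise with the Cholesky factor.
L = np.linalg.cholesky(cor_matrix)
X_synthetic = L @ rng.standard_normal((5, 2000))

# Keep each pair once: the strictly upper-triangular entries of the estimate.
est = np.corrcoef(X_synthetic)
off_diag = est[np.triu_indices_from(est, k=1)]

# Summary of the distribution of pairwise correlations.
print(off_diag.mean(), off_diag.std())
counts, edges = np.histogram(off_diag, bins=5)
```

The mean of off_diag is the quantity expected to sit near rho; counts and edges can be passed to any plotting routine to draw the histogram.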
Now, I want to do the same, but the off-diagonal values of np.corrcoef(X_synthetic) should all be around -0.3, so that the histogram would be centered around -0.3.
#desired expected rho (of the distribution of the corr matrix)
rho = -0.3
# desired correlation matrix
cor_matrix = np.ones((5,5))* rho
np.fill_diagonal(cor_matrix,1) # 1s in diagonal
print(cor_matrix)
array([[ 1. , -0.3, -0.3, -0.3, -0.3],
[-0.3, 1. , -0.3, -0.3, -0.3],
[-0.3, -0.3, 1. , -0.3, -0.3],
[-0.3, -0.3, -0.3, 1. , -0.3],
[-0.3, -0.3, -0.3, -0.3, 1. ]])
L = np.linalg.cholesky(cor_matrix) # fails
X_synthetic = L.dot(np.random.normal(0,1, (5,2000)))
The Cholesky decomposition fails and raises LinAlgError: Matrix is not positive definite.
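The failure can be checked directly. A matrix with 1 on the diagonal and the same rho everywhere else has eigenvalues 1 + (n-1)*rho (once) and 1 - rho (n-1 times), so it is positive definite only when rho > -1/(n-1). For n = 5 that bound is -0.25, and -0.3 falls below it. A small sketch that confirms this numerically; equicorrelation is a hypothetical helper name introduced here, not something from the code above.

```python
import numpy as np

def equicorrelation(n, rho):
    """Hypothetical helper: n x n matrix with 1 on the diagonal, rho elsewhere."""
    m = np.full((n, n), float(rho))
    np.fill_diagonal(m, 1.0)
    return m

n, rho = 5, -0.3
eigvals = np.linalg.eigvalsh(equicorrelation(n, rho))

# Smallest eigenvalue is 1 + (n - 1) * rho, which is negative here,
# so the matrix is not positive definite and Cholesky cannot succeed.
print("smallest eigenvalue:", eigvals.min())

# Lower bound on rho for an n x n equicorrelation matrix.
lower_bound = -1.0 / (n - 1)
print("rho must exceed:", lower_bound)
```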
I understand that this is not a realistic case, but in practice I want to build cor_matrix in such a way that the X_synthetic signals have a correlation matrix whose values vary around -0.3, similarly to the 0.5 case shown above.
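For comparison, the same recipe does go through whenever rho stays above -1/(n-1). The sketch below only illustrates that condition and is not a solution for five signals. It assumes n = 4 (a value chosen here, not taken from the question), for which the bound is about -0.333, so rho = -0.3 is admissible. Note that an estimated correlation matrix is itself positive semidefinite, so the average off-diagonal value for any n signals cannot drop below -1/(n-1).

```python
import numpy as np

# Assumption: a fixed seed, used only to make the sketch reproducible.
rng = np.random.default_rng(0)

# Assumption: 4 signals instead of 5, so that rho = -0.3 is above -1/(n-1).
n, rho = 4, -0.3
assert rho > -1.0 / (n - 1)

cor_matrix = np.full((n, n), rho)
np.fill_diagonal(cor_matrix, 1.0)

# Positive definite in this case, so the Cholesky factor exists.
L = np.linalg.cholesky(cor_matrix)
X_synthetic = L @ rng.standard_normal((n, 2000))

# Off-diagonal entries of the estimated correlation matrix.
est = np.corrcoef(X_synthetic)
off_diag = est[np.triu_indices_from(est, k=1)]
print(off_diag.mean())
```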