
I have multiple (> 2) correlated data series and want to sample from them. Each series has several thousand historical values, but unfortunately I do not know which data points were recorded together. The only thing I know is the correlation coefficient between each pair of series. I want to sample from those historical values while respecting the correlations between the series. Is there any way to do this?

Update: I should also add, the data series do not necessarily have the same amount of data.

  • Welcome to CV. Note that your username, identicon, & a link to your user page are automatically added to every post you make, so there is no need to sign your posts. In fact, we prefer you don't. – Silverfish Nov 05 '15 at 22:30
  • Just to clarify "I do not know which datapoints were recorded together": you have sequences of (let's say) $x_i$, $y_i$, and $z_i$ which have been separately permuted, so you do not know which corresponds to which - for instance it is possible that $x_{13}$, $y_{224}$, and $z_{129}$ were all from the same data point originally? – Silverfish Nov 05 '15 at 22:33
  • Yes, I only know the correlation coefficients between every pair of series, and the sampled data should have (approximately) the same correlations. – Jonathan Blot Nov 06 '15 at 01:09
  • The description of your data is not clear and it's also not clear how you would even obtain any measure of association with data pairs "not recorded together." Would it be possible to add some representative records (from the "several thousand") to show how you are building up to the *correlations*? – Mike Hunter Nov 06 '15 at 13:09
  • Since your update, "I should also add, the data series do not necessarily have the same amount of data", the situation is less clear than before. Your earlier comment suggested that the data had simply been permuted, but it now appears that some data has also been lost: for instance, an original data point $(x_1,y_1,z_1)$ has become (in your records) $x_{13}$ and $y_{224}$, and the $z$ has been lost entirely. Could you add any details about how data is being lost? Is it entirely at random? – Silverfish Nov 06 '15 at 13:35
  • That was a misunderstanding on my side, sorry. To clarify: (1) All data series were recorded at the same time, but the link between them has been lost (so assume a permutation). (2) Every series lost a random amount of data (so the series have different amounts of data). (3) I know the correlation coefficients between all pairs of series (prior knowledge). (4) The only thing I am interested in is generating new samples that stem from the historical data and match the old correlation coefficients as closely as possible. (5) I do not need to reconstruct which historical data points were recorded together. – Jonathan Blot Nov 06 '15 at 14:58
  • Are the vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ permuted, or are samples within each vector permuted? An example of the latter: $[x_1,y_1,z_1], [y_2,z_2,x_2],[z_3,x_3,y_3], ...$? If the vectors are permuted, then: 1.) determine the empirical distributions of each vector 2.) create a 3x3 correlation matrix 3.) Use the Gaussian copula to generate correlated uniform random variables according to your correlation matrix 4.) invert with the inverse of the empirical distributions. See http://stats.stackexchange.com/questions/7515 – Kiran K. Nov 11 '15 at 13:45
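
The four steps in the last comment can be sketched roughly as follows. This is only a minimal illustration, not a tested solution: the function name `sample_with_gaussian_copula` and the synthetic series in the usage lines are made up for the example, and it assumes the known correlation matrix is positive definite. It also uses that matrix directly as the Gaussian copula parameter, so the Pearson correlations of the output will match the targets only approximately when the marginals are far from normal. Each series is resampled through its own empirical quantile function, so the series may have different lengths and every returned value is one of the historical values.

```python
import math
import numpy as np

def sample_with_gaussian_copula(series, corr, n_samples, seed=None):
    """Draw n_samples rows whose columns are historical values of each series,
    coupled through a Gaussian copula with parameter matrix `corr`."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    d = len(series)

    # Step 3a: correlated standard normals via the Cholesky factor
    # (assumes corr is positive definite).
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_samples, d)) @ chol.T

    # Step 3b: map to correlated uniforms with the standard normal CDF.
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

    # Steps 1 and 4: invert each empirical CDF by indexing the sorted
    # historical values, so only observed values are returned.
    out = np.empty((n_samples, d))
    for j, values in enumerate(series):
        sorted_vals = np.sort(np.asarray(values, dtype=float))
        n_j = sorted_vals.size
        idx = np.minimum((u[:, j] * n_j).astype(int), n_j - 1)
        out[:, j] = sorted_vals[idx]
    return out

# Usage with synthetic placeholder data (not the asker's data):
demo_rng = np.random.default_rng(0)
x = demo_rng.normal(size=3000)
y = demo_rng.exponential(size=2500)
z_series = demo_rng.uniform(size=4000)
target = np.array([[1.0, 0.6, 0.3],
                   [0.6, 1.0, 0.5],
                   [0.3, 0.5, 1.0]])
samples = sample_with_gaussian_copula([x, y, z_series], target, 10000, seed=1)
print(np.corrcoef(samples, rowvar=False).round(2))
```

If the achieved correlations drift too far from the targets for skewed marginals, one option is to adjust the copula matrix iteratively until the sample correlations are close enough; that refinement is not shown here.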
