
Is it possible to have two different methods of sampling from a bivariate normal distribution with a non-identity correlation matrix such that one method would "consistently" produce samples whose sample correlation matrix is closer to the distribution's correlation matrix than the samples produced by the other method? I do not have a precise definition of what "consistently" means.

I heard from a professional statistician that a method which gives a sample whose correlation is always very close to that of the underlying distribution may not be desirable, because there should be some randomness in the sample correlation matrices computed from different samples using the same method.

I found this statement surprising, as I would have expected it to be better for the sample correlation to be as close as possible to the correlation of the original normal distribution. But I am not knowledgeable enough in this area to argue against that statement.
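To make the statistician's point concrete, here is a minimal Python sketch (assuming NumPy, with arbitrary illustrative values rho = 0.6 and n = 50) that draws many independent i.i.d. samples from a bivariate normal and looks at how their sample correlations r spread around rho. The large-sample approximation sd(r) ≈ (1 - rho^2)/sqrt(n) is printed only as a rough reference point.

```python
import numpy as np

def sample_correlations(rho, n, reps, seed=0):
    """Sample correlation r of `reps` independent i.i.d. samples of size n
    from a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))  # shape (reps, n, 2)
    # Centre each sample, then compute Pearson's r sample by sample.
    x = xy[..., 0] - xy[..., 0].mean(axis=1, keepdims=True)
    y = xy[..., 1] - xy[..., 1].mean(axis=1, keepdims=True)
    return (x * y).sum(axis=1) / np.sqrt((x**2).sum(axis=1) * (y**2).sum(axis=1))

rho, n = 0.6, 50
r = sample_correlations(rho, n, reps=5000)
print("mean of r:", r.mean())
print("sd of r:  ", r.std())
print("rough sd (1 - rho^2)/sqrt(n):", (1 - rho**2) / np.sqrt(n))
```

For genuine i.i.d. draws this spread is fixed by rho and n, so any method that really samples i.i.d. from the distribution has to reproduce it; a method with a visibly smaller spread is doing something other than i.i.d. sampling.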

EDIT: The "sampling" in my discussion involved using a Gaussian copula and a correlation matrix to generate the sample. In this case, would I be doing something "bad" if I generated many samples in a loop and returned, as the sample, the one whose sample correlation most closely matches the desired correlation?
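The best-of-many idea in the edit can be sketched as follows. This is a hypothetical illustration, not the asker's code: it assumes NumPy, measures closeness on the normal scores z (before the copula's Phi transform, because Pearson correlation on the uniform scale is not equal to the copula parameter), and uses arbitrary values rho = 0.6, n = 50 and k = 20 candidate draws. It compares the spread of the sample correlation for ordinary copula draws with the spread after selection.

```python
import math
import numpy as np

# Standard normal CDF Phi, vectorised from the standard library's erf.
_norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def normal_scores(R, n, rng):
    """n rows of correlated standard-normal scores with correlation matrix R."""
    return rng.multivariate_normal(np.zeros(len(R)), R, size=n)

def to_uniform(z):
    """Gaussian-copula step: map normal scores to U(0, 1) margins via Phi.
    Any inverse marginal CDFs would be applied to this result afterwards."""
    return _norm_cdf(z)

def r_of(z):
    return np.corrcoef(z, rowvar=False)[0, 1]

def best_of_k(R, n, k, rng):
    """Hypothetical scheme from the question: draw k candidate samples and keep
    the one whose normal-score correlation is closest to R[0, 1]."""
    candidates = [normal_scores(R, n, rng) for _ in range(k)]
    return min(candidates, key=lambda z: abs(r_of(z) - R[0, 1]))

rng = np.random.default_rng(1)
R = np.array([[1.0, 0.6], [0.6, 1.0]])
n, reps, k = 50, 300, 20

plain = np.array([r_of(normal_scores(R, n, rng)) for _ in range(reps)])
picked = np.array([r_of(best_of_k(R, n, k, rng)) for _ in range(reps)])
u = to_uniform(best_of_k(R, n, k, rng))  # one selected sample on the copula scale

print("sd of r, plain draws:     ", plain.std())
print(f"sd of r, best of {k} draws:", picked.std())
```

Selecting on the sample correlation makes its distribution much narrower than it is for genuine i.i.d. draws, so the returned sample is no longer a draw from the copula model. That may be harmless if you only need one "representative-looking" data set, but a simulation study built on such samples would understate the sampling variability of anything that depends on the correlation.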

Kavka
    I suppose it depends on what you mean by *to sample*. – cardinal Feb 14 '12 at 03:03
    I'm not sure if this is what you're thinking of, but it is always possible to draw a sample of size N-df, and then construct the last few points so as to exactly match the sample statistics to the population parameters. A simple example is 10 data w/ mean=0: draw 9 points at random, & the 10th point is 0 minus the sum of the 1st 9. Dealing w/ a bivariate dist is just a more complicated version of the same idea. If this is what you mean, I would agree with your friend--it's better for samples to have more randomness to mimic the real world better. – gung - Reinstate Monica Feb 14 '12 at 05:55
    If sampling means producing at random pairs $(X_i,Y_i)$ in an iid manner from the normal distribution, the distribution of the sample correlation matrix is set and therefore the answer to your question is no. If sampling means something else, anything is possible! – Xi'an Feb 14 '12 at 11:46
    Check this [question](http://stats.stackexchange.com/q/15011/6082) for two methods: one returns a correlation between two variables that is sampled from the true bivariate normal distribution (i.e., sample correlations will vary around the true value), the other returns a bivariate sample with an exact predefined correlation. – Felix S Feb 15 '12 at 08:10
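The "construct the last points" idea from gung's comment and the "exact predefined correlation" approach mentioned by Felix S can both be sketched in a few lines. The following is one common construction, not necessarily the one used in the linked question, assuming NumPy and an illustrative target correlation of 0.6; the helper names are hypothetical. It centres a random bivariate sample, whitens it with the Cholesky factor of its own sample covariance, and then re-colours it with the Cholesky factor of the target matrix, so the sample correlation equals the target (up to floating-point rounding) in every run.

```python
import numpy as np

def exact_mean_zero(n, rng):
    """gung's example: draw n - 1 points freely, then choose the last one
    so that the sample mean is 0 (up to floating-point rounding)."""
    x = rng.standard_normal(n - 1)
    return np.append(x, -x.sum())

def exact_correlation_sample(R, n, rng):
    """Bivariate sample (n x 2) whose sample covariance, and hence sample
    correlation, equals the correlation matrix R up to rounding."""
    x = rng.standard_normal((n, 2))
    x -= x.mean(axis=0)                       # centre the columns
    L_s = np.linalg.cholesky(np.cov(x, rowvar=False))
    white = np.linalg.solve(L_s, x.T).T       # sample covariance is now I
    return white @ np.linalg.cholesky(R).T    # sample covariance is now R

rng = np.random.default_rng(2)
R = np.array([[1.0, 0.6], [0.6, 1.0]])

m = exact_mean_zero(10, rng)
print("mean of the 10 points:", m.mean())

for _ in range(3):
    y = exact_correlation_sample(R, 50, rng)
    print("sample correlation:", np.corrcoef(y, rowvar=False)[0, 1])
```

Because every run returns the target correlation, the sample correlation has essentially zero variance across runs; that is exactly the property the statistician and gung caution against when the samples are meant to mimic data from the real world.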
