When you get confused about bootstrapping, go back to its basic principle:
Taking a bootstrap sample from your data set is like taking a new data set from the population.
This has implications for your study in three ways. (I assume that correlation coefficients between pairs of tests are an appropriate way to estimate the relationships you care about.)
How to bootstrap
Your proposal 1 is to bootstrap-sample among trials, keeping all 2000 individuals in each bootstrap sample. That would tell you something about the contribution of trial-to-trial variance to the results, but is that what you really care about? A similar argument applies to the proposal in a comment to sample from all 630,000 data points: each bootstrap sample would, with very high probability, contain at least some trials for each individual. That proposal would, however, run the risk of not having results for all 7 tests for each individual in each bootstrap sample. Neither seems to be what you want.
The variability you presumably care about is the variability among members of the population from which you took the original data set. So you should bootstrap-sample among the 2000 individuals. If your procedure was to average trials within each test for an individual, proceed the same way with your bootstrap samples. That is your proposal 2. Go with that.
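For concreteness, here is a minimal sketch of proposal 2 in R. It assumes a long-format data frame `dat` with columns `id`, `test`, and `score`, one row per trial; those names are hypothetical stand-ins for however your data are organized.

```r
set.seed(1)

## Average trials within each test for each individual once, up front:
## a 2000 x 7 matrix with one row per individual and one column per test.
test_means <- tapply(dat$score, list(dat$id, dat$test), mean)

B <- 1000                                    # number of bootstrap samples
boot_cors <- replicate(B, {
  ## Resample the 2000 individuals with replacement, keeping each
  ## individual's full set of 7 test means together.
  idx <- sample(nrow(test_means), replace = TRUE)
  cor(test_means[idx, ])                     # 7 x 7 correlation matrix
})
## boot_cors is 7 x 7 x B; e.g. boot_cors[1, 2, ] is the bootstrap
## distribution of the correlation between tests 1 and 2.
```

Averaging trials first and then resampling rows gives the same result as resampling individuals and then averaging, and is much faster on 630,000 rows.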
Parameter estimates and confidence limits from bootstrapping
If each bootstrap sample relates to your data set as your data set relates to the population, then an issue arises if data-based estimates of your parameter of interest are biased with respect to the population value. With small samples from a bivariate normal distribution, estimates of the correlation coefficient are biased. Bootstrapping is a useful way to estimate that bias, providing a correction so that the estimate from the data set more closely represents the population value. See my comment on this answer for how bootstrapping in one case matched the known bias in an estimator of Shannon entropy.
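The usual bootstrap bias estimate is the mean of the bootstrap estimates minus the estimate from the data; subtracting it from the data estimate gives a bias-corrected value. A minimal sketch for one pair of tests, reusing `test_means` and `boot_cors` from the sketch above:

```r
## Bootstrap bias estimate for the correlation between tests 1 and 2.
r_hat <- cor(test_means[, 1], test_means[, 2])   # estimate from the data

boot_r <- boot_cors[1, 2, ]                      # bootstrap estimates
bias_hat    <- mean(boot_r) - r_hat              # bootstrap bias estimate
r_corrected <- r_hat - bias_hat                  # i.e. 2 * r_hat - mean(boot_r)
```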
Getting reliable confidence intervals that generalize to the population is thus not always as simple as running a large number of bootstraps, calculating the estimate of the parameter of interest for each bootstrap sample, and (for a 95% CI) taking the 2.5th and 97.5th percentiles of those estimates. For example, with a sufficiently biased estimator those percentiles might not even include the point estimate observed in the data set. See this answer for some of the issues involved in estimating confidence intervals via bootstrapping. You might consider a more sophisticated estimate of the confidence intervals that takes bias and skew into account, like the BCa method implemented, for example, in the R boot package.
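A sketch of a BCa interval for one pair of tests with the boot package, again using the hypothetical `test_means` matrix from the first sketch:

```r
library(boot)

## Statistic for boot(): correlation between the two columns,
## evaluated on the resampled row indices.
r_stat <- function(d, idx) cor(d[idx, 1], d[idx, 2])

b <- boot(data = test_means[, c(1, 2)], statistic = r_stat, R = 5000)
boot.ci(b, conf = 0.95, type = "bca")   # bias-corrected and accelerated CI
```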
With 2000 individuals contributing to each correlation coefficient estimate, you will not face a bias problem if your data are bivariate normal for each of the 21 pairs of tests. My guess is that you won't face a bias problem even with other joint data distributions. You might, however, face a problem with skew in the estimates, as the magnitude of the correlation coefficient can't exceed 1. So you should be aware of and test for such possibilities.
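A crude check for such skew, using `boot_cors` from the sketch above, is to look at the bootstrap distribution directly:

```r
r12 <- boot_cors[1, 2, ]
hist(r12, breaks = 50, main = "Bootstrap correlations, tests 1 vs 2")
## Asymmetric distances of the 2.5% and 97.5% percentiles from the
## point estimate are a simple warning sign of skew.
quantile(r12, c(0.025, 0.5, 0.975)) - cor(test_means[, 1], test_means[, 2])
```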
The next step
Although you claim to be a "statistics noob," I suspect that you have already figured out that you want to cluster the 7 tests in a way that shows how closely their results are related to each other. Simply repeating the clustering on multiple bootstrap samples does not provide reliable p-values for the clusters. This answer introduces the issues and provides links to the literature. One approach is to take advantage of the information provided by resampling not just at the scale of the original data sample (2000 in your case) but also at scales somewhat above and somewhat below that. The pvclust package in R implements this multiscale bootstrap approach for hierarchical clustering.
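A minimal sketch with pvclust, clustering the 7 tests (the columns of the hypothetical `test_means` matrix from the first sketch):

```r
library(pvclust)

## pvclust clusters the columns, resampling the rows (individuals)
## at several scales around n = 2000 (multiscale bootstrap).
fit <- pvclust(test_means, method.dist = "correlation",
               method.hclust = "average", nboot = 1000)
plot(fit)                  # dendrogram annotated with AU/BP p-values
pvrect(fit, alpha = 0.95)  # box clusters with AU p-value >= 0.95
```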