When you get confused about bootstrapping, go back to its basic principle:
Taking a bootstrap sample from your data set is like taking a new data set from the population.
This has implications for your study in three ways. (I assume that correlation coefficients between pairs of tests are an appropriate way to estimate the relationships you care about.)
How to bootstrap
Your proposal 1 is to bootstrap-sample among trials, keeping all 2000 individuals in each bootstrap sample. That would tell you something about the contribution of trial-to-trial variance to the results, but is that what you really care about? A similar argument applies to the proposal in a comment to sample from all 630,000 data points: each bootstrap sample would, with very high probability, contain at least some trials for each individual. That proposal would, however, run the risk of not having results for all 7 tests for each individual in each bootstrap sample. Neither seems to be what you want.
The variability you presumably care about is the variability among members of the population from which you took the original data set. So you should bootstrap-sample among the 2000 individuals. If your procedure was to average trials within each test for an individual, proceed the same way with your bootstrap samples. That is your proposal 2. Go with that.
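For concreteness, here is a minimal sketch of proposal 2 in R. It assumes a long-format data frame `dat` with columns `id`, `test`, and `score`, one row per trial; those names are hypothetical stand-ins for however your data are organized.

```r
set.seed(1)

## Average trials within each test for each individual once, up front:
## a 2000 x 7 matrix with one row per individual and one column per test.
test_means <- tapply(dat$score, list(dat$id, dat$test), mean)

B <- 1000                                    # number of bootstrap samples
boot_cors <- replicate(B, {
  ## Resample the 2000 individuals with replacement, keeping each
  ## individual's full set of 7 test means together.
  idx <- sample(nrow(test_means), replace = TRUE)
  cor(test_means[idx, ])                     # 7 x 7 correlation matrix
})
## boot_cors is 7 x 7 x B; e.g. boot_cors[1, 2, ] is the bootstrap
## distribution of the correlation between tests 1 and 2.
```

Averaging trials first and then resampling rows gives the same result as resampling individuals and then averaging, and is much faster on 630,000 rows.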
Parameter estimates and confidence limits from bootstrapping
If each bootstrap sample relates to your data set as your data set relates to the population, then an issue arises if data-based estimates of your parameter of interest are biased with respect to the population value. With small samples from a bivariate normal distribution, estimates of the correlation coefficient are biased. Bootstrapping is a useful way to estimate that bias, providing a correction so that the estimate from the data set more closely represents the population value. See my comment on this answer for how bootstrapping in one case matched the known bias in an estimator of Shannon entropy.
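The usual bootstrap bias estimate is the mean of the bootstrap estimates minus the estimate from the data; subtracting it from the data estimate gives a bias-corrected value. A minimal sketch for one pair of tests, reusing `test_means` and `boot_cors` from the sketch above:

```r
## Bootstrap bias estimate for the correlation between tests 1 and 2.
r_hat <- cor(test_means[, 1], test_means[, 2])   # estimate from the data

boot_r <- boot_cors[1, 2, ]                      # bootstrap estimates
bias_hat    <- mean(boot_r) - r_hat              # bootstrap bias estimate
r_corrected <- r_hat - bias_hat                  # i.e. 2 * r_hat - mean(boot_r)
```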
Getting reliable confidence intervals that generalize to the population is thus not always as simple as running a large number of bootstraps, calculating the estimate of the parameter of interest for each bootstrap sample, and (for a 95% CI) taking the 2.5th and 97.5th percentiles of those estimates. For example, with a sufficiently biased estimator those percentiles might not even include the point estimate observed in the data set. See this answer for some of the issues involved in estimating confidence intervals via bootstrapping. You might consider a more sophisticated estimate of the confidence intervals that takes bias and skew into account, like the BCa method implemented, for example, in the R boot package.
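A sketch of a BCa interval for one pair of tests with the boot package, again using the hypothetical `test_means` matrix from the first sketch:

```r
library(boot)

## Statistic for boot(): correlation between the two columns,
## evaluated on the resampled row indices.
r_stat <- function(d, idx) cor(d[idx, 1], d[idx, 2])

b <- boot(data = test_means[, c(1, 2)], statistic = r_stat, R = 5000)
boot.ci(b, conf = 0.95, type = "bca")   # bias-corrected and accelerated CI
```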
With 2000 individuals contributing to each correlation coefficient estimate, you will not face a bias problem if your data are bivariate normal for each of the 21 pairs of tests. My guess is that you won't face a bias problem even with other joint data distributions. You might, however, face a problem with skew in the estimates, as the magnitude of the correlation coefficient can't exceed 1. So you should be aware of and test for such possibilities.
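A crude check for such skew, using `boot_cors` from the sketch above, is to look at the bootstrap distribution directly:

```r
r12 <- boot_cors[1, 2, ]
hist(r12, breaks = 50, main = "Bootstrap correlations, tests 1 vs 2")
## Asymmetric distances of the 2.5% and 97.5% percentiles from the
## point estimate are a simple warning sign of skew.
quantile(r12, c(0.025, 0.5, 0.975)) - cor(test_means[, 1], test_means[, 2])
```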
The next step
Although you claim to be a "statistics noob," I suspect that you have already figured out that you want to cluster the 7 tests in a way that shows how closely their results are related to each other. Simply repeating the clustering on multiple bootstrap samples does not provide reliable p-values for the clusters. This answer introduces the issues and provides links to the literature. One approach is to take advantage of the information provided by resampling not just at the scale of the original data sample (2000 in your case) but also at scales somewhat above and somewhat below that. The pvclust package in R implements this multiscale bootstrap approach for hierarchical clustering.
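A minimal sketch with pvclust, clustering the 7 tests (the columns of the hypothetical `test_means` matrix from the first sketch):

```r
library(pvclust)

## pvclust clusters the columns, resampling the rows (individuals)
## at several scales around n = 2000 (multiscale bootstrap).
fit <- pvclust(test_means, method.dist = "correlation",
               method.hclust = "average", nboot = 1000)
plot(fit)                  # dendrogram annotated with AU/BP p-values
pvrect(fit, alpha = 0.95)  # box clusters with AU p-value >= 0.95
```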