I'm trying to evaluate the performance of a metric learning model. The model takes labelled image inputs and maps them to vectors on an N-dimensional unit sphere. The goal is to map images to vectors such that same-labelled images map to vectors that are close together (high dot product) and different-labelled images map to vectors that are far apart (low dot product).
To evaluate the performance of this model, I'm running the model over a corpus of held-out data, and I'm interested in quantile statistics on the distribution of dot products between pairs of output vectors. Questions of interest are things like:
- What is the threshold $t$ such that 99% of equal-labelled pairs of vectors have a dot product greater than $t$?
- What is the threshold $t$ such that 99% of unequal-labelled pairs of vectors have a dot product less than $t$?
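
Stated a bit more formally (the notation here is my own shorthand, introduced just for this question): let $X_{=}$ be the dot product of a uniformly chosen equal-labelled pair and $X_{\neq}$ the dot product of a uniformly chosen unequal-labelled pair. The two thresholds above are then roughly

$$
t_{=} = \sup\{\, t : \Pr(X_{=} > t) \ge 0.99 \,\}, \qquad
t_{\neq} = \inf\{\, t : \Pr(X_{\neq} < t) \ge 0.99 \,\},
$$

which, if the distributions are continuous, are the 0.01-quantile of $X_{=}$ and the 0.99-quantile of $X_{\neq}$.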
My evaluation dataset is small enough that I can directly compute dot products for all matching pairs of inputs, but there are enough non-matching pairs of inputs that it's very expensive to compute dot products for all non-matching pairs. Instead, for non-matches I'm estimating by taking a random sample from the space of non-matching pairs. This obviously introduces some error, but that's acceptable as long as the error bounds are well-understood. The problem is that it's not clear how to choose an appropriate sample size, nor is it clear how to characterize the error bounds for a given sample size.
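
For concreteness, here is a minimal sketch of the kind of sampling I mean, assuming the output vectors are rows of a NumPy array and the labels are a parallel array. The function name `sample_nonmatching_dots`, its parameters, and the synthetic data are all made up for illustration; this is not my actual pipeline. It draws pairs uniformly with replacement and rejects pairs whose labels match.

```python
import numpy as np

def sample_nonmatching_dots(embeddings, labels, n_pairs, seed=0):
    """Dot products for n_pairs randomly sampled pairs with different labels.

    Hypothetical helper: pairs are drawn uniformly with replacement via
    rejection sampling, so the same pair may appear more than once.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    if np.unique(labels).size < 2:
        raise ValueError("need at least two distinct labels")
    m = len(labels)
    chunks = []
    remaining = n_pairs
    while remaining > 0:
        i = rng.integers(0, m, size=remaining)
        j = rng.integers(0, m, size=remaining)
        keep = labels[i] != labels[j]  # this also rejects i == j
        # Row-wise dot products of the kept pairs.
        chunks.append(np.einsum("ij,ij->i", embeddings[i[keep]], embeddings[j[keep]]))
        remaining -= int(keep.sum())
    return np.concatenate(chunks)

# Synthetic stand-in for the model outputs: random unit vectors, random labels.
rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 16))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y = rng.integers(0, 10, size=1000)

dots = sample_nonmatching_dots(x, y, n_pairs=10_000)
t_hat = np.quantile(dots, 0.99)  # sample estimate of the 99% threshold
```

Here `t_hat` is the sample estimate whose error I would like to bound. Note that the sampled pairs share underlying vectors, so I'm not sure they can be treated as fully independent draws.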
Thus, my main question is this: If I want to estimate a quantile $q$ of a population by taking a sample of size $N$, is there a simple way to compute $N$ such that my quantile estimate is within a desired error bound? Alternatively, is there a better way to estimate (error bounds of) quantile statistics via sampling?
For my specific application, it seems likely that there are ways to take advantage of the specific structure of the samples (e.g., the fact that they're dot products between pairs of vectors from a well-known set). I'm interested in answers that take advantage of that structure, but I'm also interested in the more general question of how to think about estimating quantile statistics from an unknown distribution.