Question
Suppose I have a sample of size $N$, with observations $y_i$, $i = 1, \dots, N$, and some statistic computed from them, say $S = f(y_1, \dots, y_N)$.
I would now like to estimate the variability of $S$ at a given sample size $n$, call it $V_n(S)$, where $n \leq N$.
For example, I would like to know which $N^* \ll N$ I can choose so as to still achieve some required level of precision.
Idea
If I wanted to estimate the variability at the full sample size $N$, i.e. $V_N(S)$, for a general statistic, I could use the bootstrap or the jackknife.
But how would I estimate it for smaller values of $n$?
One direction I had was to repeatedly draw $B$ (bootstrap) samples of size $n$ from the $N$ observations, compute $S$ on each, and take the empirical variance of those $B$ values as an estimate of $V_n(S)$.
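Concretely, something like this minimal sketch in Python/NumPy (the name `subsample_variance` and the defaults, e.g. $B = 2000$ and a normal toy sample, are just my own placeholders):

```python
import numpy as np

def subsample_variance(y, stat, n, B=2000, replace=True, seed=None):
    """Draw B subsamples of size n from y, apply stat to each,
    and return the empirical variance of the B statistic values."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    vals = [stat(rng.choice(y, size=n, replace=replace)) for _ in range(B)]
    return np.var(vals, ddof=1)

# Toy usage: variability of the median at n = 50, given N = 1000 observations
rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
print(subsample_variance(y, np.median, n=50, seed=1))
```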
But this raises several questions for me:
- What do I gain from using samples with replacement? Why not take $B$ samples, each of size $n$, without replacement? (For $n = N-1$ this would essentially be the jackknife.) What is a good way to decide whether there is a "substantial" benefit to sampling with replacement here? (See the first sketch after this list.)
- If the statistic were something like the mean, we would know the exact formula for the standard error: we would just estimate the variance once and divide it by $n$ for each sample size we care about. Wouldn't that also be better in the general case, if we knew how $V_n(S)$ depends on $n$? That is, why not estimate $V_N(S)$ on the full sample and then multiply it by $\frac{N}{n}$ to get the relationship we care about? Is there a known characterization of which statistics have variance that depends linearly on $1/n$ (e.g., the variance of the mean, the variance of the variance), versus ones that do not (such as the variance of the max)? (See the second sketch below.)
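On the first question, here is a small numeric check I can run for the mean, where the answer is known in closed form: sampling without replacement shrinks the variance by the finite-population correction $(N-n)/(N-1)$ relative to sampling with replacement. (The sketch is my own; for a general statistic the correction presumably need not be this simple.)

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
N, B = len(y), 5000

def subsample_var(stat, n, replace):
    vals = [stat(rng.choice(y, size=n, replace=replace)) for _ in range(B)]
    return np.var(vals, ddof=1)

for n in (20, 100, 500):
    with_r = subsample_var(np.mean, n, replace=True)
    without_r = subsample_var(np.mean, n, replace=False)
    # For the mean, the without-replacement variance should be roughly the
    # with-replacement variance times the correction (N - n) / (N - 1).
    print(n, round(with_r, 5), round(without_r, 5),
          round(with_r * (N - n) / (N - 1), 5))
```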
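And on the second question, a crude empirical diagnostic I considered: estimate $V_n(S)$ at several values of $n$ and fit the slope of $\log V_n$ against $\log n$; a slope near $-1$ is consistent with the $1/n$ scaling, while a statistic like the max should show a different slope. (Again my own sketch, not a formal test.)

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(2000)

def v_n(stat, n, B=3000):
    return np.var([stat(rng.choice(y, size=n)) for _ in range(B)], ddof=1)

ns = np.array([25, 50, 100, 200, 400])
for stat in (np.mean, np.max):
    v = [v_n(stat, n) for n in ns]
    slope = np.polyfit(np.log(ns), np.log(v), 1)[0]
    # A slope close to -1 is consistent with V_n(S) ~ c / n.
    print(stat.__name__, round(slope, 2))
```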
Any thoughts?