Let's begin with a simple case with a known answer:
Suppose we have samples $x_1,...,x_N$ drawn i.i.d. from a real-valued distribution $D$. A common first step in analyzing this data might be to plot the empirical CDF of the data.
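For concreteness, here is a minimal sketch of the empirical CDF (the choice of standard-normal $D$ and $N=101$ is just an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(101)   # hypothetical i.i.d. sample from D

# The empirical CDF puts mass 1/N on each sample: F_hat(t) = #{x_i <= t} / N.
xs = np.sort(x)
ecdf = np.arange(1, len(xs) + 1) / len(xs)

# Value of F_hat at the sample median (the 51st smallest of 101 points):
print(xs[50], ecdf[50])   # ecdf[50] = 51/101
```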
The empirical CDF is useful because it tells us a little about the true CDF of $D$. For example, if we draw $N=101$ samples, the 51st smallest sample (i.e., the median of the empirical distribution) is probably a reasonable proxy for the actual median of $D$. This intuition can be made precise: as long as $D$ is a continuous distribution, the true quantile (under $D$) of the $k$-th smallest of the $N$ samples is distributed $\mathrm{Beta}(k, N-k+1)$ (cf., e.g., any introduction to order statistics). For the remainder of this question, please take all real-valued distributions to be continuous.
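This order-statistic fact is easy to check by simulation. In the sketch below (parameters are hypothetical), I take $D = \mathrm{Uniform}(0,1)$ so that the CDF is the identity and the true quantile of a sample is the sample itself:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 101, 51            # hypothetical sample size and rank (k-th smallest)
trials = 20_000

# For D = Uniform(0, 1), F(x) = x, so the quantile of the k-th order
# statistic is the order statistic itself.
samples = rng.uniform(size=(trials, N))
kth_quantile = np.sort(samples, axis=1)[:, k - 1]   # F(X_(k)) across trials

# Beta(k, N - k + 1) moments for comparison.
a, b = k, N - k + 1
beta_mean = a / (a + b)                              # = 0.5 here
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))

print(kth_quantile.mean(), beta_mean)
print(kth_quantile.var(), beta_var)
```

With $k=51$ and $N=101$ both moments match those of $\mathrm{Beta}(51, 51)$, i.e., the empirical median concentrates around the true median.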
These beta distributions provide a strong characterization of $D$ under minimal assumptions: the result is useful both in theory and in practice. I'm looking for a similar result when the distribution is hierarchical.
As an example, suppose we choose $N$ people. Each person has a distribution for how many hours they sleep each night, say $D_k$ for person $k$. We collect $S_k$ samples $x_{k,1},x_{k,2},...,x_{k,S_k}$ from person $k$. For instance, we may collect a single data point from person 1, but 100 data points from person 2.
If we weight by samples (i.e., ignore the fact that repeated samples come from the same person), we would recover the situation above. If we weight by people, the empirical distribution is clear: we put probability $1/(NS_k)$ on sample $x_{k,i}$. But how do these values relate to the underlying distribution of quantiles? (I.e., what plays the role of the beta distributions?)
There seem to be (at least) three ways to make this question precise, and I'd very much appreciate any advice on thinking about them. I'll present them in (I believe) increasing order of complexity. Incidentally, to avoid some technicalities, it may be useful to assume that the distributions (in addition to being continuous) all share the same support.
In all cases below, when I ask to estimate an underlying distribution, I mean: for an observation $x_{k,i}$, we wish to compute the probability that $x_{k,i}$ corresponds to any given quantile.
Question #1: Assume that our entire population consists of these $N$ people. We are trying to estimate quantiles for: $$F=(1/N)\sum_{k=1}^N D_k$$
We can answer this question as follows. First, compute the quantile distribution (in $D_k$) for each sampled point using the machinery above. The distribution across quantiles is then the mixture of these beta distributions, with weight $1/(NS_k)$ on each point from person $k$ (matching the empirical weighting above). Note that this also works with a weighted distribution across people: we simply replace $1/N$ with the appropriate weight.
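A minimal sketch of this construction (the per-person sample counts are hypothetical; only the ranks and counts matter, not the observed values): draw a person uniformly, one of that person's samples uniformly, then that sample's quantile from the corresponding beta distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-person sample counts S_k (cf. one point from person 1,
# 100 from person 2); the observed values themselves don't matter here.
sample_counts = [1, 100, 5]
N = len(sample_counts)

def draw_quantiles(n_draws=200_000):
    """Draw from the mixture: person uniform over 1..N, sample uniform over
    that person's points (weight 1/(N*S_k) each), and the sample's quantile
    ~ Beta(r, S_k - r + 1), where r is its rank among the person's points."""
    persons = rng.integers(N, size=n_draws)
    out = np.empty(n_draws)
    for k in range(N):
        mask = persons == k
        S = sample_counts[k]
        ranks = rng.integers(1, S + 1, size=mask.sum())
        out[mask] = rng.beta(ranks, S - ranks + 1)
    return out

q = draw_quantiles()
print(q.mean())   # each person's rank-uniform beta mixture has mean 1/2
```

The mean is $1/2$ regardless of the $S_k$, since $\mathbb{E}[\mathrm{Beta}(r, S-r+1)] = r/(S+1)$ averages to $1/2$ over a uniform rank; the spread, however, depends on how many points each person contributes.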
Question #2: Assume that there is an underlying population of $M\gg N$ people, and we are trying to estimate: $$G=(1/M)\sum_{k=1}^M D_k$$
Question #3: We have a possibly infinite population of people, where person $i$ is associated with some real-valued distribution $D_i$. We sample $i$ (and thus $D_i$) from some distribution $P$ over distributions; we are trying to estimate the distribution $$H=E_{p\sim P}[D_p]$$
I would most like the answer to Question #3, but would settle for Question #2. Any help (or advice on reframing this question, if appropriate) would be greatly appreciated.