I feel I have a pretty good grasp of the mathematical basis of the CLT and sampling distributions.
HOWEVER:
While sources like OpenIntro Statistics (Diez et al., 2019) are fairly straightforward in their actual use of sampling distributions, I am unable to find a single instance of a statistical analysis where a sampling distribution is built and then used as the underlying data. There are plenty of computational proofs that the CLT and/or sampling distributions do what they are supposed to do, but nothing to the extent of "We have this data, we build a sampling distribution, we do these tests on it".
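To make concrete what I mean by "building a sampling distribution", here is a toy sketch (synthetic bimodal data and all parameter choices are mine, not from any textbook): repeatedly draw samples of size n from the data and record each sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal "population", purely for illustration
population = np.concatenate([rng.normal(2, 1, 5000),
                             rng.normal(10, 1, 5000)])

# Build an empirical sampling distribution of the mean:
# draw many samples of size n and record each sample mean
n = 50
sample_means = np.array([rng.choice(population, size=n).mean()
                         for _ in range(2000)])

# Per the CLT, the sampling distribution of the mean is roughly
# normal and centered on the population mean, even though the
# population itself is strongly bimodal
print(population.mean(), sample_means.mean(), sample_means.std())
```

This is the kind of object I see demonstrated everywhere, but never used as the input to an actual test.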
The questions are as follows:
- Are sampling distributions of the mean/proportion/etc. used in real-life analyses, or are they just a theoretical basis for the assumption of normality?
- For example, if you have data on the entire population and it is not normally distributed (say, very bimodal), do you build a sampling distribution for the statistic of interest and work with that, or do non-parametric statistics become the only choice?
- If sampling distributions ARE used in real-life analyses, how are the sample size and the number of iterations selected? What stops me from taking a million large samples and reducing the variance to the point where p-values become minuscule?
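On the last point, here is the simulation that motivates my confusion (a sketch with my own made-up skewed data): the spread of the empirical sampling distribution seems to depend only on the sample size n, while the number of iterations just makes the estimate of that spread more precise.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical skewed, non-normal population (exponential, sigma = 2)
population = rng.exponential(scale=2.0, size=100_000)

def sampling_dist_sd(n, iterations):
    """Spread of the empirical sampling distribution of the mean."""
    means = [rng.choice(population, size=n).mean() for _ in range(iterations)]
    return np.std(means)

# More iterations do NOT shrink the spread (~ sigma / sqrt(n));
# only a larger sample size n does
print(sampling_dist_sd(30, 500))    # roughly sigma / sqrt(30)
print(sampling_dist_sd(30, 5000))   # about the same spread
print(sampling_dist_sd(300, 500))   # noticeably smaller, ~ sigma / sqrt(300)
```

If this is right, it suggests iterations are "free" and only n matters for p-values, which is exactly what I would like confirmed or corrected.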