The Central Limit Theorem in Quantile Estimation

Question

I don't have a very strong background in statistics, so I have a few conceptual questions, and there is a strong possibility I'm missing something obvious.

Suppose I'm interested in estimating the 99th percentile of body weight in the United States and I have data from every town in every state. I could simply aggregate all the data and find the 99th percentile, but I'm not sure how precise this number would be. The aggregated data aren't normally distributed; they have a very large excess kurtosis, so I don't believe any standard confidence interval estimate would hold. But perhaps I'm missing a basic statistical concept.
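For the pooled option, one relevant fact is that a confidence interval for a population quantile does not require the data to be normal: if the observations are an i.i.d. sample from a continuous distribution, the number of them falling below the true 99th percentile is Binomial(n, 0.99), so a pair of order statistics brackets it with a known probability. The following is a minimal sketch of that distribution-free interval, assuming a simple random sample; the function names and the lognormal stand-in data are hypothetical, not from the question.

```python
import math
import random

def binom_cdf_table(n, p):
    """F[k] = P(B <= k) for B ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nfact = math.lgamma(n + 1)
    cdf, total = [], 0.0
    for k in range(n + 1):
        log_pmf = (log_nfact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
        cdf.append(min(total, 1.0))
    return cdf

def quantile_ci(data, p=0.99, alpha=0.05):
    """Distribution-free CI [X_(l), X_(u)] for the population p-quantile.

    Assumes an i.i.d. sample from a continuous distribution; no normality needed.
    Returns (lower, upper, exact_coverage), or None if n is too small for level 1 - alpha.
    """
    x = sorted(data)
    n = len(x)
    F = binom_cdf_table(n, p)
    # B = number of observations below the true quantile ~ Binomial(n, p), so
    # P(X_(l) <= q <= X_(u)) = P(l <= B <= u - 1) = F[u - 1] - F[l - 1]  (1-based l, u).
    lows = [l for l in range(1, n + 1) if F[l - 1] <= alpha / 2]
    highs = [u for u in range(1, n + 1) if 1.0 - F[u - 1] <= alpha / 2]
    if not lows or not highs:
        return None
    l, u = max(lows), min(highs)
    return x[l - 1], x[u - 1], F[u - 1] - F[l - 1]

# Synthetic, right-skewed stand-in for body weights (kg); not real data.
rng = random.Random(1)
data = [rng.lognormvariate(4.4, 0.25) for _ in range(5000)]
print(quantile_ci(data))        # 95% interval for the 99th percentile
print(quantile_ci(data[:200]))  # too few draws to bound the 99th percentile at 95%
```

The second call returns None because even the sample maximum of 200 draws falls below the true 99th percentile with probability 0.99^200, about 0.13. This is one concrete way in which extreme quantiles need large samples, independent of the kurtosis of the data.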

The second option would be to find the 99th percentile in each town and apply the central limit theorem, treating each town's percentile as an independent random variable. I know a CLT also applies to sample quantiles, not just means, but that it doesn't hold for extreme quantiles. I have used MATLAB simulations to convince myself that it does hold for the 0.99 quantile.

However, body weight is not identically distributed across towns. You can imagine that low-income urban areas will have heavier individuals. So I must apply the Lyapunov or Lindeberg-Feller CLT. Is this a valid thing to do? It seems like these central limit theorems make statements about the distribution of a sum of random variables, centered and divided by the square root of the sum of their variances, rather than about a mean or a quantile. How would confidence interval estimates change under these theorems? Any references or insight are greatly appreciated. Thanks.
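A simulation in the spirit of the MATLAB checks mentioned above can be sketched in Python. It assumes, hypothetically, that body weight in each town is lognormal with its own parameters (so towns are independent but not identically distributed), and it uses the large-sample variance of a sample quantile, p(1 - p) / (n f(q)^2), where f is the town's density at its true 99th percentile q. It then forms the Lyapunov/Lindeberg-Feller statistic Z = sum_i (Q_i - q_i) / sqrt(sum_i sigma_i^2). Dividing numerator and denominator by the number of towns k shows this is the same as standardizing the average of the town percentiles, so an approximate 95% interval for that average would be mean(Q_i) ± 1.96 sqrt(sum_i sigma_i^2) / k. In practice each sigma_i^2 would have to be estimated; this sketch sidesteps that by using the known densities.

```python
import math
import random
import statistics

P = 0.99
STD_NORMAL = statistics.NormalDist()
Z_P = STD_NORMAL.inv_cdf(P)  # standard normal 0.99 quantile (about 2.326)

def quantile_sd(mu, sigma, n):
    """Large-sample s.d. of the sample P-quantile of n LogNormal(mu, sigma) draws:
    sqrt(P(1-P)/n) / f(q), with f the lognormal density at the true quantile q."""
    q = math.exp(mu + sigma * Z_P)
    density = STD_NORMAL.pdf(Z_P) / (sigma * q)
    return math.sqrt(P * (1 - P) / n) / density

def standardized_sums(towns, n, reps, seed=0):
    """Per replicate: Z = sum_i (Q_i - q_i) / sqrt(sum_i sd_i^2), where Q_i is town i's
    sample 99th percentile and q_i its true 99th percentile (Lyapunov-style scaling)."""
    rng = random.Random(seed)
    true_q = [math.exp(mu + s * Z_P) for mu, s in towns]
    scale = math.sqrt(sum(quantile_sd(mu, s, n) ** 2 for mu, s in towns))
    zs = []
    for _ in range(reps):
        total = 0.0
        for (mu, s), q in zip(towns, true_q):
            sample = [rng.lognormvariate(mu, s) for _ in range(n)]
            total += statistics.quantiles(sample, n=100)[98] - q  # sample 99th percentile
        zs.append(total / scale)
    return zs

# Hypothetical towns: (mu, sigma) of log body weight in kg, deliberately non-identical.
towns = [(4.30, 0.20), (4.35, 0.22), (4.40, 0.25), (4.45, 0.27), (4.50, 0.30), (4.38, 0.24)]
zs = standardized_sums(towns, n=1500, reps=200)
print("mean of Z:", statistics.mean(zs), "sd of Z:", statistics.stdev(zs))
print("share of |Z| < 1.96:", sum(abs(z) < 1.96 for z in zs) / len(zs))
```

With only about 15 observations above each town's 99th percentile, the per-town estimates can be slightly biased and skewed, and those biases add up across towns while the standard error grows only like the square root of k, so it is worth checking the mean and spread of Z rather than taking the normal approximation for granted. Note also that the average of the town 99th percentiles is generally a different quantity from the 99th percentile of the pooled national distribution.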

What kind of data do you have from "every town in every state"? Does this mean you have a complete census of the country? Or perhaps you have a stratified random sample? Or maybe you have a census only of urban areas but not rural areas? Or possibly samples of urban areas only? — whuber, Jul 25 '13 at 17:48
It seems like "aggregating all the data and find[ing] the 99th percentile" is exactly what you would want to do to 'estimate' the 99th percentile of body weight in the US. Unless I am missing something, this is, by definition, not an estimate, but the actual value. — shabbychef, Jul 25 '13 at 17:55
I should clarify that this is all hypothetical data; the real problem is similar but much more complicated. Say I have a sample of people from many different areas, but not data on every person. By just computing the 99th percentile of the sample, I wouldn't get the actual value, because I used only a sample of the population to compute it, but perhaps I am missing something. — user27606, Jul 25 '13 at 19:04
Is your sample a random sample of the whole population, or, if not, do you have sampling weights? — user603, Sep 12 '15 at 12:39
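If sampling weights are available (here assumed to mean the number of people each respondent represents), a common point estimate of the population 99th percentile inverts the weighted empirical CDF. A minimal sketch, with hypothetical names and toy data:

```python
def weighted_quantile(values, weights, p=0.99):
    """Inverse of the weighted empirical CDF: the smallest value whose
    cumulative share of the total weight is at least p."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    pairs = sorted(zip(values, weights))
    target = p * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]  # only reached through floating-point shortfall

# Hypothetical respondents: body weight (kg) and how many people each one represents.
body_kg = [62.0, 71.5, 80.2, 95.0, 130.4, 58.3, 77.9, 110.0]
represents = [1200, 1200, 1200, 1200, 1200, 5000, 5000, 5000]
print(weighted_quantile(body_kg, represents, p=0.99))
```

This gives only a point estimate. A standard error for it has to reflect the survey design, so the i.i.d. order-statistic reasoning for a simple random sample does not carry over directly.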


0 Answers