I am wondering whether the distribution of sample statistics (quantiles, min, max) computed from datasets drawn from different distributions (such as the normal and exponential) follows the distribution of the underlying data. I plotted histograms and did not see any consistent pattern across datasets. Any help would be appreciated.

- You are looking for [order statistics](https://en.wikipedia.org/wiki/Order_statistic). The Wikipedia entry gives you the result for an underlying exponential distribution, and the proposed duplicate treats an underlying normal distribution. – Stephan Kolassa Sep 21 '20 at 15:04
- Does this answer your question? [Approximate order statistics for normal random variables](https://stats.stackexchange.com/questions/9001/approximate-order-statistics-for-normal-random-variables) – Stephan Kolassa Sep 21 '20 at 15:04
1 Answer
Let $X_1, \dots, X_n$ be a random sample from a continuous distribution with density function $f(x)$ that is continuous and nonzero at the $p$th percentile $x_p$ $(0 < p < 1).$ If $k/n \rightarrow p$ (with $k-np$ bounded), then the sequence of order statistics $X_{k:n}$ is asymptotically normal with mean $x_p$ and variance $c^2/n,$ where $c^2 = p(1-p)/[f(x_p)]^2.$ [From Bain & Engelhardt, 1992, 2nd ed., Duxbury, p. 244.]
So for 'nice' distributions (with no discontinuities or 0-gaps) such as exponential or normal there is a "Central Limit Theorem" for quantiles (except the max and min).
In particular, the median of a moderately large sample from $\mathsf{Norm}(\mu,\sigma)$ is approximately normal. [With 100,000 iterations the simulated values should be accurate to about two significant digits, but $n=100$ is too small for the asymptotic approximation to match them exactly.]
    set.seed(912)
    h = replicate(10^5, median(rnorm(100)))
    mean(h); sd(h)
    [1] 0.0006078384
    [1] 0.1243622
And for an exponential population with mean 1 (median $\log 2 \approx 0.6931472$):
    set.seed(912)
    H = replicate(10^5, median(rexp(100)))
    mean(H); sd(H)
    [1] 0.6982845
    [1] 0.09972337
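As a rough check against the theorem quoted above, the asymptotic standard deviation $\sqrt{p(1-p)/n}\,/f(x_p)$ can be computed directly and compared with the simulated `sd(h)` and `sd(H)`. This is only a sketch: `asymp_sd` is a hypothetical helper name introduced here for illustration, and it assumes the quantile and density functions (`qnorm`/`dnorm`, `qexp`/`dexp`) are passed in as a matching pair.

```r
# Asymptotic SD of the sample p-th quantile: sqrt(p(1-p)/n) / f(x_p).
# asymp_sd is a hypothetical helper, not part of base R.
asymp_sd <- function(p, n, qfun, dfun) {
  x_p <- qfun(p)                      # population p-th quantile
  sqrt(p * (1 - p) / n) / dfun(x_p)   # c / sqrt(n), with c^2 = p(1-p)/f(x_p)^2
}

sd_norm <- asymp_sd(0.5, 100, qnorm, dnorm)  # sqrt(pi/2)/10 in theory
sd_exp  <- asymp_sd(0.5, 100, qexp,  dexp)   # 1/10 in theory, since f(log 2) = 1/2
sd_norm; sd_exp
```

For the standard normal median with $n=100$ the formula gives $\sqrt{\pi/2}/10 \approx 0.1253$, and for the exponential median it gives $0.1$; both are close to the simulated values 0.1243622 and 0.09972337.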
Because of skewness, medians of exponentials converge to the (symmetrical) normal distribution somewhat more slowly.
    par(mfrow=c(1,2))
    hist(h, prob=T, col="skyblue2", main="Medians of Normal")
    curve(dnorm(x, mean(h), sd(h)), add=T, col="red")
    hist(H, prob=T, col="skyblue2", main="Medians of Exponential")
    curve(dnorm(x, mean(H), sd(H)), add=T, col="red")
    par(mfrow=c(1,1))
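The exception for the max and min noted earlier can be illustrated the same way. The sketch below is an addition, not part of the theorem above: it relies on the standard facts that the minimum of $n$ independent exponentials with rate 1 is exactly exponential with rate $n$, and that the maximum has mean $\sum_{k=1}^n 1/k$ and a right-skewed distribution. The variable names `mn`, `mx`, and `skew_mx`, and the choice of 10^4 replications, are illustrative.

```r
set.seed(912)
n  <- 100
mn <- replicate(10^4, min(rexp(n)))   # sample minima
mx <- replicate(10^4, max(rexp(n)))   # sample maxima

# Minimum of n iid Exp(1) is Exp(rate = n): theoretical mean and SD are both 1/n.
mean(mn); sd(mn)

# Maximum has theoretical mean sum(1/(1:n)); it is right-skewed, not normal.
mean(mx); sum(1/(1:n))
skew_mx <- mean((mx - mean(mx))^3) / sd(mx)^3   # sample skewness, clearly positive
skew_mx
```

Histograms of `mn` and `mx` stay visibly skewed however large $n$ becomes, in contrast to the medians above.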
