
I am wondering whether the distribution of sample statistics (quantiles, min, max) of a dataset drawn from a given distribution (such as the normal or exponential distribution) follows the distribution of the dataset itself. I plotted histograms and did not see any consistent pattern across different datasets. Any help would be appreciated.

Stephan Kolassa
Paria
  • You are looking for [order statistics](https://en.wikipedia.org/wiki/Order_statistic). The Wikipedia entry gives you the result for an underlying exponential distribution, and the proposed duplicate treats an underlying normal distribution. – Stephan Kolassa Sep 21 '20 at 15:04
  • Does this answer your question? [Approximate order statistics for normal random variables](https://stats.stackexchange.com/questions/9001/approximate-order-statistics-for-normal-random-variables) – Stephan Kolassa Sep 21 '20 at 15:04

1 Answer


Let $X_1, \dots, X_n$ be a random sample from a continuous distribution with density function $f(x)$ that is continuous and nonzero at the $p$th percentile $x_p$ $(0 < p < 1).$ If $k/n \rightarrow p$ (with $k-np$ bounded), then the sequence of order statistics $X_{k:n}$ is asymptotically normal with mean $x_p$ and variance $c^2/n,$ where $c^2 = p(1-p)/[f(x_p)]^2.$ [From Bain & Engelhardt, 1992, 2e, Duxbury, p. 244.]

So for 'nice' distributions (with no discontinuities and no gaps where the density is zero), such as the exponential or normal, there is a "Central Limit Theorem" for sample quantiles. The minimum and maximum are excluded, because the result requires $0 < p < 1.$

In particular, the median of a moderately large sample from $\mathsf{Norm}(\mu,\sigma)$ is approximately normal. [With 100,000 iterations, the simulation results should be accurate to about 2 significant digits, but $n=100$ is too small for them to match the limiting values exactly.]

set.seed(912)
h = replicate(10^5, median(rnorm(100)))
mean(h);  sd(h)
[1] 0.0006078384
[1] 0.1243622

And similarly for an exponential population with mean 1 (median $\log(2)=0.6931472$):

set.seed(912)
H = replicate(10^5, median(rexp(100)))
mean(H);  sd(H)
[1] 0.6982845
[1] 0.09972337

Because of skewness, medians of exponentials converge to the (symmetrical) normal distribution somewhat more slowly.
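
As a rough check, the simulated standard deviations above (about 0.124 and 0.0997) can be compared with the asymptotic value $c/\sqrt{n}$ from the theorem. The sketch below assumes the standard normal and the rate-1 exponential used in the simulations; the helper name `asymp_sd` is hypothetical and not part of the original answer.

```r
# Asymptotic standard deviation of the sample p-th quantile, c / sqrt(n),
# with c^2 = p(1 - p) / f(x_p)^2 as in the theorem quoted above.
# `asymp_sd` is a hypothetical helper name introduced for illustration.
asymp_sd <- function(p, f_xp, n) sqrt(p * (1 - p)) / (f_xp * sqrt(n))

n <- 100

# Standard normal: the median is 0, with density dnorm(0) = 1/sqrt(2*pi).
sd_norm <- asymp_sd(0.5, dnorm(0), n)

# Exponential with rate 1: the median is log(2), with density dexp(log(2)) = 0.5.
sd_exp <- asymp_sd(0.5, dexp(log(2)), n)

round(c(normal = sd_norm, exponential = sd_exp), 4)
```

The asymptotic values are about 0.1253 for the normal case and 0.1 for the exponential case, so both simulated values agree with the theory to roughly two significant digits.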

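[Figure: side-by-side histograms of the simulated medians, titled "Medians of Normal" and "Medians of Exponential", each overlaid with a fitted normal density curve in red.]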

par(mfrow=c(1,2))   # two panels side by side
 hist(h, prob=TRUE, col="skyblue2", main="Medians of Normal")
  curve(dnorm(x, mean(h), sd(h)), add=TRUE, col="red")   # fitted normal density
 hist(H, prob=TRUE, col="skyblue2", main="Medians of Exponential")
  curve(dnorm(x, mean(H), sd(H)), add=TRUE, col="red")   # fitted normal density
par(mfrow=c(1,1))   # restore single-panel layout
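
To illustrate why the minimum and maximum are excluded from the result above, here is a minimal sketch for the minimum of exponential samples. It relies on the standard fact that the minimum of $n$ independent rate-1 exponentials is itself exponential with rate $n$, so it stays strongly right-skewed rather than becoming normal. The seed, the number of replications, and the helper `skew` are arbitrary choices made for this illustration.

```r
set.seed(2020)  # arbitrary seed, chosen only for reproducibility
n <- 100
mins <- replicate(10^4, min(rexp(n)))

# Sample skewness: a symmetric (e.g. normal) distribution has skewness near 0,
# while an exponential distribution has skewness 2.
# `skew` is a hypothetical helper defined here, not a base R function.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

mean(mins)   # should be close to 1/n, the mean of an exponential with rate n
skew(mins)   # should be far from 0, unlike the skewness of the medians
```

A histogram of `mins` with `hist(mins, prob=TRUE)` should show an exponential shape rather than a bell curve.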
BruceET