I was thinking of algorithms for statistically determining the median of a large data set. To estimate the complexity of my idea I need the following Probability:
- I have a dataset $\left\{x_i \right\}$ of size $n$ with unknown distribution
- I take a random sample $\left\{y_i \right\}\subset \left\{x_i \right\}$ of size $s\ll n$
- I determine the $\left(50+\varepsilon\right)$th and the $\left(50-\varepsilon\right)$th quantile $y^{0.5\pm \frac{\varepsilon}{100}}$
What is the probability of the median $x^{0.5}$ of the first data set $\left\{x_i \right\}$ being in the interval $\left[y^{0.5- \frac{\varepsilon}{100}},y^{0.5 + \frac{\varepsilon}{100}} \right]$ as a function of $n,s,\varepsilon$?
Thanks in advance for any help