2

I was thinking of algorithms for statistically determining the median of a large data set. To estimate the complexity of my idea I need the following Probability:

  • I have a dataset $\left\{x_i \right\}$ of size $n$ with unknown distribution
  • I take a random sample $\left\{y_i \right\}\subset \left\{x_i \right\}$ of size $s\ll n$
  • I determine the $\left(50+\varepsilon\right)$th and the $\left(50-\varepsilon\right)$th quantile $y^{0.5\pm \frac{\varepsilon}{100}}$

What is the probability of the median $x^{0.5}$ of the first data set $\left\{x_i \right\}$ being in the interval $\left[y^{0.5- \frac{\varepsilon}{100}},y^{0.5 + \frac{\varepsilon}{100}} \right]$ as a function of $n,s,\varepsilon$?

Thanks in advance for any help

Phteven
  • 21
  • 1
  • Given your objective, you will likely be interested in [The P2 algorithm](https://stats.stackexchange.com/questions/7959) and in [behavior of quantiles generally](https://stats.stackexchange.com/questions/45124). The latter provides both direct and asymptotic answers. – whuber Feb 13 '19 at 12:53
  • @whuber thanks for the comment. I am not working on a project on this but want to test an algorithm theoretically. There are two problems with what you suggest: 1. I am looking for a precise answer not an approximation (military loss function), so the P2 algorithm does not work 2. I want to draw one sample and get the probability in dependence of the size of this sample -> central limit theorem does not work – Phteven Feb 13 '19 at 14:34
  • I don't follow, because your questions asks for the probability that an interval will cover a median--on the face of it, that contemplates an "imprecise" answer. – whuber Feb 13 '19 at 14:35
  • consider a loss function where you get 1 if you hit the target exactly and 0 if you do not. i.e. i have to choose from the list of values rather than approximate. There is a chance I do not find it, but if I do it is the correct one – Phteven Feb 13 '19 at 18:17
  • That sounds like a very different question than the one you posted. Could you explain how they might be the same? – whuber Feb 13 '19 at 18:28

0 Answers0