Consider a sorted sample $\vec{x}(n)$ of size $n > 100$ ($n$ odd) drawn from some random variable $X$. Let $m$ denote the median of $X$, and let $R(\vec{x}(n))$ be the largest index $i \in [n]$ such that $x_i \leq m$ (taken to be $0$ if no such index exists). What is the distribution of $R(\vec{x}(n))$?
By running the experiment many times, I found empirically that $R(\vec{x}(n))$ appears to be approximately normally distributed with $\mu = \frac{n}{2}$ and $\sigma^2 = \frac{n}{4}$; however, I was not able to prove this. I'm looking for a proof.
Here's the code I used for the experiments:

    import math
    import random

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns


    def r(chosen, median):
        # Largest 1-based index i with chosen[i] <= median after sorting (0 if none).
        chosen.sort()
        if chosen[0] > median:
            return 0
        index = 1
        for second in chosen[1:]:
            if second > median:
                return index
            index += 1
        return index


    data = range(50000)
    median = np.median(data)  # population median
    n = 101
    indices = []
    for experiment in range(50000):
        # Draw n values from the population, with replacement.
        chosen = random.choices(data, k=n)
        indices.append(r(chosen, median))

    _, _ = plt.subplots(figsize=(16, 6))
    sns.histplot(data=indices, bins=40, kde=True)
    plt.show()

    print('Empirical mean: {}'.format(np.mean(indices)))
    print('Formula mean: {}'.format(n/2))
    print('Empirical std: {}'.format(np.std(indices)))
    print('Formula std: {}'.format(math.sqrt(n)/2))
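
A possible cross-check (a sketch, not a proof): assuming the sample is i.i.d. and $P(X \leq m) = \frac{1}{2}$ (for example, $X$ continuous), $R(\vec{x}(n))$ simply counts how many sample values are $\leq m$, so it is a sum of $n$ indicator variables, each equal to 1 with probability $\frac{1}{2}$. Under that assumption it would be Binomial$(n, \frac{1}{2})$, whose mean $\frac{n}{2}$ and variance $\frac{n}{4}$ match the values above, and whose shape is close to normal for large $n$. The standard-library-only version below tests this; the names `rank_at_median` and `simulate` and the choice of Uniform$(0, 1)$ for $X$ are illustrative assumptions, not part of the original experiment.

```python
import bisect
import random
import statistics


def rank_at_median(sample, m):
    """Count the sample values <= m, i.e. the largest 1-based index i
    with x_i <= m in the sorted sample (0 if there is none)."""
    return bisect.bisect_right(sorted(sample), m)


def simulate(n=101, trials=10000, seed=0):
    """Repeat the experiment `trials` times and return the observed ranks."""
    rng = random.Random(seed)
    # Assumption: X ~ Uniform(0, 1), whose median is m = 0.5.
    m = 0.5
    return [
        rank_at_median([rng.random() for _ in range(n)], m)
        for _ in range(trials)
    ]


if __name__ == "__main__":
    n = 101
    ranks = simulate(n=n)
    # Compare against the Binomial(n, 1/2) moments n/2 and n/4.
    print("Empirical mean:    ", statistics.mean(ranks), "vs n/2 =", n / 2)
    print("Empirical variance:", statistics.pvariance(ranks), "vs n/4 =", n / 4)
```

Using `bisect.bisect_right` on the sorted sample gives the same count as the loop in `r` above, since both return the number of values not exceeding the median.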