Deriving the average and variance of number of runs in runs test

Question

NIST states that the average and variance of the number of runs in the "runs above and below the median" test are
$$
\begin{aligned}
\overline{R} &= \frac{2n_a n_b}{n_a + n_b} + 1,\\
s^2 &= \frac{2n_a n_b(2n_a n_b - n_a - n_b)}{(n_a+n_b)^2 (n_a+n_b-1)},
\end{aligned}
$$
where $n_a$ is the number of values above the median and $n_b$ is the number of values below the median.
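Before proving anything, the formulas can be checked exactly for small samples. This sketch assumes that, under the null hypothesis of randomness and conditional on $n_a$ and $n_b$, every arrangement of the "above" and "below" labels is equally likely (values equal to the median are assumed to have been dropped). It enumerates all arrangements and compares the exact mean and variance of the run count with the NIST expressions; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def count_runs(seq):
    """Number of maximal blocks of identical symbols in seq."""
    return 1 + sum(1 for x, y in zip(seq, seq[1:]) if x != y)


def exact_runs_moments(n_a, n_b):
    """Exact mean and variance of the total number of runs when all
    arrangements of n_a 'a's and n_b 'b's are equally likely."""
    n = n_a + n_b
    values = []
    for a_positions in combinations(range(n), n_a):
        chosen = set(a_positions)
        seq = ["a" if i in chosen else "b" for i in range(n)]
        values.append(count_runs(seq))
    k = len(values)
    mean = Fraction(sum(values), k)
    var = Fraction(sum(v * v for v in values), k) - mean * mean
    return mean, var


def nist_moments(n_a, n_b):
    """Mean and variance of the run count as quoted from NIST."""
    n = n_a + n_b
    mean = Fraction(2 * n_a * n_b, n) + 1
    var = Fraction(2 * n_a * n_b * (2 * n_a * n_b - n_a - n_b),
                   n * n * (n - 1))
    return mean, var


# Exact rational arithmetic, so equality (not closeness) is checked.
for n_a in range(1, 7):
    for n_b in range(1, 7):
        assert exact_runs_moments(n_a, n_b) == nist_moments(n_a, n_b)
print("formulas match exact enumeration for 1 <= n_a, n_b <= 6")
```

Agreement for every small case is evidence rather than a proof, but it confirms that the quoted expressions are the exact conditional moments of the total (above plus below) run count.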

How is this result proved? Actually, more importantly, how do I begin to think about this problem?
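One common way to begin (a sketch, not necessarily how NIST or its sources derive it) is to condition on $n_a$ and $n_b$, treat all $\binom{N}{n_a}$ arrangements as equally likely, and write the run count as a sum of indicators. The symbols $N$, $x_i$, $I_i$, $p$, $q$ and $r$ are notation introduced here: $x_1,\dots,x_N$ is the sequence of above/below labels and $I_i$ flags a change of label between positions $i$ and $i+1$ (take $N \ge 4$ so that $r$ is defined; its coefficient vanishes for $N \le 3$).

$$
\begin{aligned}
R &= 1 + \sum_{i=1}^{N-1} I_i, \qquad I_i = \mathbf{1}\{x_i \ne x_{i+1}\}, \qquad N = n_a + n_b,\\
p &= \Pr(I_i = 1) = \frac{2 n_a n_b}{N(N-1)}, \qquad E[R] = 1 + (N-1)p = 1 + \frac{2 n_a n_b}{N},\\
q &= \Pr(I_i = I_{i+1} = 1) = \frac{n_a n_b (n_a - 1) + n_b n_a (n_b - 1)}{N(N-1)(N-2)} = \frac{n_a n_b}{N(N-1)},\\
r &= \Pr(I_i = I_j = 1) = \frac{4 n_a n_b (n_a - 1)(n_b - 1)}{N(N-1)(N-2)(N-3)}, \qquad |i - j| \ge 2,\\
\operatorname{Var}(R) &= (N-1)p(1-p) + 2(N-2)(q - p^2) + (N-2)(N-3)(r - p^2) = \frac{2 n_a n_b (2 n_a n_b - N)}{N^2 (N-1)}.
\end{aligned}
$$

Here $q$ covers the $N-2$ adjacent pairs (patterns $aba$ and $bab$) and $r$ the $(N-2)(N-3)/2$ pairs sharing no position. With $m = n_a n_b$, the $p^2$ terms total $4m^2/N^2$ and the remaining terms total $(4m^2 - 2m)/(N(N-1))$; their difference is the last expression, which is the $s^2$ above since $N = n_a + n_b$.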

It seems to me that this result might actually be valid more generally for "runs above/below a threshold", since for runs above/below the median one would most likely have $n_a = n_b$ anyway.
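The conjecture can at least be checked numerically. This sketch assumes i.i.d. continuous data (Uniform(0, 1) with an arbitrary cut-off of 0.3, both chosen only for illustration). With a fixed threshold, $n_a$ and $n_b$ are themselves random, so the run count is grouped by the observed $(n_a, n_b)$ and each group's average is compared with $\overline{R}$. The idea being tested is that, given the counts, every arrangement of the labels is equally likely whatever the threshold; Monte Carlo agreement is evidence, not a proof. Function names and parameters are hypothetical.

```python
import random
from collections import defaultdict


def count_runs(labels):
    """Total number of runs (above plus below) in a 0/1 label sequence."""
    return 1 + sum(1 for x, y in zip(labels, labels[1:]) if x != y)


def nist_mean(n_a, n_b):
    """Expected total number of runs, 2*n_a*n_b/(n_a+n_b) + 1."""
    return 2 * n_a * n_b / (n_a + n_b) + 1


def simulate_threshold(n=12, threshold=0.3, trials=100_000, seed=1):
    """Label i.i.d. Uniform(0, 1) draws as above (1) or below (0) a fixed
    threshold and group the observed run counts by (n_a, n_b)."""
    rng = random.Random(seed)
    by_counts = defaultdict(list)
    for _ in range(trials):
        labels = [1 if rng.random() > threshold else 0 for _ in range(n)]
        n_a = sum(labels)
        by_counts[(n_a, n - n_a)].append(count_runs(labels))
    return by_counts


# Compare empirical conditional means with the formula for well-populated groups.
for (n_a, n_b), runs in sorted(simulate_threshold().items()):
    if len(runs) >= 5000:
        empirical = sum(runs) / len(runs)
        print(n_a, n_b, round(empirical, 3), round(nist_mean(n_a, n_b), 3))
```

Only groups with at least 5000 samples are printed, so the empirical column should sit close to the formula column if the conditional argument holds.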

You should define your symbols in case the page itself becomes inaccessible. Is $n_a$ the number above the median? — Glen_b, Oct 14 '19 at 03:56
Note that the formula for the expected number of runs, $E(R)$, that you have is not for the number of runs above alone, but for the total of runs above and below. You may find the explicit example of counting arrangements conditional on the two $n$'s [here](https://stats.stackexchange.com/questions/144598/one-sample-run-test-but-with-p-neq-frac-12/144625#144625) of some value in building intuition. If you're only counting runs above, you will have a lower expectation. You may also find the reference at the link of some help (assuming you can locate it). — Glen_b, Oct 14 '19 at 05:08
You still have the problem that the thing you linked to counts both kinds of runs (above and below) but your question text still only claims to be counting one kind (above). These are *different tests*. The formula you give applies to a different test to what your text says. Please fix the discrepancy one way or the other, so that they are both about the same thing. — Glen_b, Oct 15 '19 at 02:58
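The distinction in the comments can be made concrete by counting arrangements directly. This uses a standard counting argument, not necessarily the one in the linked answer, and assumes $n_a, n_b \ge 1$ with all $\binom{n_a+n_b}{n_a}$ arrangements equally likely. The total run count $R$ comes from splitting the $n_a$ and $n_b$ values into alternating nonempty runs; the number of runs above alone, $R_a$ (a symbol introduced here), comes from choosing which of the $n_b + 1$ gaps around the below-median values receive a run. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def total_runs_pmf(n_a, n_b):
    """P(R = r) for R = runs above + runs below (requires n_a, n_b >= 1).
    Counts ways to split n_a and n_b into alternating nonempty runs."""
    total = comb(n_a + n_b, n_a)
    pmf = {}
    for k in range(1, min(n_a, n_b) + 1):
        # R = 2k: k runs of each kind; either kind can come first.
        pmf[2 * k] = Fraction(2 * comb(n_a - 1, k - 1) * comb(n_b - 1, k - 1), total)
        # R = 2k + 1: one kind has k + 1 runs, the other has k.
        odd = (comb(n_a - 1, k) * comb(n_b - 1, k - 1)
               + comb(n_a - 1, k - 1) * comb(n_b - 1, k))
        if odd:
            pmf[2 * k + 1] = Fraction(odd, total)
    return pmf


def runs_above_pmf(n_a, n_b):
    """P(R_a = k) for R_a = runs above only: pick k of the n_b + 1 gaps
    around the 'below' values and split n_a into k nonempty runs."""
    total = comb(n_a + n_b, n_a)
    return {k: Fraction(comb(n_a - 1, k - 1) * comb(n_b + 1, k), total)
            for k in range(1, n_a + 1) if comb(n_b + 1, k)}


def mean(pmf):
    return sum(value * prob for value, prob in pmf.items())


def variance(pmf):
    return sum(value * value * prob for value, prob in pmf.items()) - mean(pmf) ** 2


print(mean(total_runs_pmf(3, 2)))      # 17/5, i.e. 1 + 2*n_a*n_b/(n_a+n_b)
print(variance(total_runs_pmf(3, 2)))  # 21/25, the NIST s^2
print(mean(runs_above_pmf(3, 2)))      # 9/5, i.e. n_a*(n_b+1)/(n_a+n_b)
```

The two means are $1 + 2n_a n_b/(n_a+n_b)$ and $n_a(n_b+1)/(n_a+n_b)$ respectively, so a test based only on runs above has a lower expectation and needs its own reference distribution.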
