When performing a two-tailed test on a binomial process with p=.5, why are my p-values all over the place, and I virtually never reject the null hypothesis?
I answered a question on Stack Overflow about testing a random process (i.e., checking that a function designed to yield a binomial random variable was working correctly). I got output like this:
can reject the hypothesis that the following tests are NOT the
results of a binomial process (with their given respective
probabilities) with probability < .01, 1000000 trials each
p = 0.01 {False: 10084, True: 989916} 4.94065645841e-324 reject null
p = 0.1 {False: 100524, True: 899476} 1.48219693752e-323 reject null
p = 0.33 {False: 100633, True: 899367} 2.96439387505e-323 reject null
p = 0.5 {False: 500369, True: 499631} 0.461122365668 fail to reject
p = 0.66 {False: 900144, True: 99856} 2.96439387505e-323 reject null
p = 0.9 {False: 899988, True: 100012} 1.48219693752e-323 reject null
p = 0.99 {False: 989950, True: 10050} 4.94065645841e-324 reject null
My p-values are always all over the place for p = 0.5, but fairly consistent (almost too consistent) for the others. I'm sure that if I went back over the math I could prove to myself why this is, but it's been too long since I studied it, and I don't have the time right now. Can someone tell me why this happens, or whether I've made a mistake here?
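The original test code isn't shown in the question, so here is a minimal sketch of what such a two-tailed test might look like (the function name and the use of the normal approximation to the binomial are my assumptions, not the poster's code). It illustrates the behavior above: with the null fixed at p0 = 0.5, the p-value is approximately uniform on (0, 1) when the true p is 0.5, and underflows toward zero for any other p at n = 1,000,000 trials.

```python
import math
import random

def two_sided_p(successes, n, p0=0.5):
    """Two-sided p-value for observing `successes` out of `n` trials
    under the null hypothesis that the success probability is p0,
    using the normal approximation to the binomial distribution."""
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = abs(successes - mean) / sd
    return math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for Z ~ N(0, 1)

n = 1_000_000
for true_p in (0.01, 0.5, 0.99):
    k = sum(random.random() < true_p for _ in range(n))
    # The null is always p0 = 0.5: only the true_p = 0.5 run produces a
    # "random-looking" p-value; the other runs give values near zero.
    print(true_p, k, two_sided_p(k, n))
```

When the null hypothesis is true, the p-value of a well-calibrated test is uniformly distributed, so re-running the test gives values "all over the place"; when the null is false by this margin at n = 1,000,000, the p-value collapses toward zero, which is why the other rows look "too consistent."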
Conclusion
I will consider this fully answered: it appears that p = .5 is the null hypothesis for the test I was performing. So, semantically, my problem now is: how do I test a null of p != .5 (i.e., two-tailed)? For those who want an analogy, say I want to test whether a coin is fair. I have found a specific answer on this site addressing this new problem of mine. Here's the link: Testing if a coin is fair
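In that spirit, the fix amounts to putting the probability being tested into the null hypothesis instead of hard-coding p0 = .5. A hedged sketch, again using the normal approximation (an exact binomial test, e.g. scipy's, would also work; the helper name is mine):

```python
import math
import random

def two_sided_p(successes, n, p0):
    """Two-sided p-value under H0: success probability = p0
    (normal approximation to the binomial)."""
    sd = math.sqrt(n * p0 * (1 - p0))
    z = abs(successes - n * p0) / sd
    return math.erfc(z / math.sqrt(2))

n = 100_000  # smaller than the question's 1,000,000, for speed
alpha = 0.01
for p0 in (0.01, 0.1, 0.33, 0.5, 0.66, 0.9, 0.99):
    k = sum(random.random() < p0 for _ in range(n))
    pval = two_sided_p(k, n, p0)  # null now matches the tested probability
    print(p0, k, pval, "reject" if pval < alpha else "fail to reject")
```

A correctly working generator should now print "fail to reject" for every row, with each row still falsely rejecting with probability roughly alpha = .01. As a sanity check, plugging the question's observed p = 0.5 count (500369 out of 1,000,000) into this helper gives approximately 0.46, close to the posted p-value of 0.4611.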