Given an arbitrary discrete distribution and an observed distribution coming from a Monte Carlo simulation, my goal is to be able to say whether or not the observed distribution is the same as the given distribution, such that I correctly identify a distribution which is different with probability greater than 1 in 10,000. I should note that the Monte Carlo simulation is run for 10^10 iterations.

So far, I have been using a combination of confidence intervals (derived from the given distribution) to test the observed mean, as well as the chi-square test. Originally, it seemed to me that this combination would provide, over multiple iterations, the precision I am aiming for. Upon further thought, however, it has occurred to me that these tests are not independent, since I am applying them to the same observations. Consequently, I can do no better than the more precise of the two tests. Is this true? I have several other questions:
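
For concreteness, here is a minimal sketch of the chi-square part of this setup. It is in Python with NumPy/SciPy (the question does not specify a language), and the support, probabilities, and sample size are placeholders rather than the actual distribution being tested:

```python
import numpy as np
from scipy import stats

# Hypothetical "given" discrete distribution (placeholder values).
support = np.array([0, 1, 2, 3])
probs = np.array([0.1, 0.2, 0.3, 0.4])

# Draw a sample standing in for the Monte Carlo output
# (the real run uses 10^10 iterations; far fewer here).
rng = np.random.default_rng(0)
n = 10_000
sample = rng.choice(support, size=n, p=probs)

# Observed counts per category vs. counts expected under the given distribution.
observed = np.array([(sample == k).sum() for k in support])
expected = n * probs

# Pearson chi-square goodness-of-fit test of the sample against the given distribution.
stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.4f}")
```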

Is my interpretation of confidence intervals correct when I say that, using a 95% confidence interval (for instance), the probability that the observed mean falls outside the confidence interval, when the observed distribution is the same as the given distribution, is 5%?
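
One way to sanity-check this interpretation is by simulation. Below is a hypothetical sketch (same placeholder distribution as above): draw many samples from the given distribution, build the interval mu ± z(0.975)·sigma/√n around the given mean, and count how often a sample mean lands outside it. If the interpretation holds, that fraction should be close to 5%:

```python
import numpy as np
from scipy import stats

support = np.array([0, 1, 2, 3])      # placeholder given distribution
probs = np.array([0.1, 0.2, 0.3, 0.4])
mu = (support * probs).sum()                          # mean under the given distribution
sigma = np.sqrt(((support - mu) ** 2 * probs).sum())  # std dev under it

n = 1_000                            # sample size per replicate
z = stats.norm.ppf(0.975)            # two-sided 95% normal critical value
half_width = z * sigma / np.sqrt(n)  # interval half-width around the given mean

rng = np.random.default_rng(1)
reps = 5_000
means = rng.choice(support, size=(reps, n), p=probs).mean(axis=1)

# Under the null, the sample mean should fall outside the interval ~5% of the time.
outside = np.mean(np.abs(means - mu) > half_width)
print(f"fraction of sample means outside the interval: {outside:.4f}")
```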

Similarly, if the above is correct, is the interpretation of the chi-square result analogous? In other words, with an alpha value of 0.05, is it true that the probability that the chi-square statistic falls outside this bound, when the observed distribution is the same as the given distribution, is 5%?
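
The analogous check for the chi-square test can be simulated the same way (again a sketch with placeholder values): with alpha = 0.05, the Pearson statistic should exceed the chi-square critical value in roughly 5% of samples drawn from the given distribution:

```python
import numpy as np
from scipy import stats

probs = np.array([0.1, 0.2, 0.3, 0.4])      # placeholder given distribution, 4 categories
n, reps, alpha = 1_000, 5_000, 0.05
crit = stats.chi2.ppf(1 - alpha, df=len(probs) - 1)  # critical value, k - 1 df

# Null samples drawn from the given distribution, kept as category counts.
rng = np.random.default_rng(2)
counts = rng.multinomial(n, probs, size=reps)        # shape (reps, 4)

expected = n * probs
chi2_stats = ((counts - expected) ** 2 / expected).sum(axis=1)  # Pearson statistic

# The rejection rate under the null should be close to alpha, i.e. 5%.
reject_rate = np.mean(chi2_stats > crit)
print(f"null rejection rate: {reject_rate:.4f}")
```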

Finally, is there a good method for achieving my desired precision?

Thank you in advance!

cyrus1.618
  • Concerning confidence intervals, please consult http://stats.stackexchange.com/questions/6652/what-precisely-is-a-confidence-interval. (In your statement, the event "the observed mean falls outside of the confidence interval" usually occurs with 0% probability.) – whuber Mar 21 '12 at 20:12
  • I've read this, and this is actually one reason I am asking this question. I want to test my own understanding to see whether or not it is correct. I do not see how it occurs with 0% probability, since the confidence interval is derived from the given distribution and not the observed distribution. Can you please elaborate? Thanks – cyrus1.618 Mar 21 '12 at 20:14
  • Please read the thread I referenced. (Confidence intervals are derived from *observations*. You refer to a "given distribution" as reference, but constructing a confidence interval relative to that reference is not an appropriate way to compare additional distributions to the reference distribution--a confidence interval is an answer to a completely different question than that.) – whuber Mar 21 '12 at 20:18
  • I have reread this thread. Before I continue, I must note that, although the larger question rests on an understanding of confidence intervals, I do not want this thread to be focused on confidence intervals. That being said, I would like to know what is wrong with viewing the confidence interval around the given mean as the expected variability of the mean. Thanks – cyrus1.618 Mar 21 '12 at 20:37
  • The CI of a sample is intended to cover a parameter. In your case, you appear to want to construct some kind of interval (of distributions?) based on your "observed distribution" that should have a good chance of covering future *observations,* not a parameter. This puts you in prediction interval territory: you have to account both for variability in the observed distribution *and* for variability in the future distributions to which it will be compared. That's a crucial difference. I suspect, too, that you intended to write something like 9999/10000 rather than 1/10000, right? – whuber Mar 22 '12 at 02:19
  • I did intend to write 9999/10000. Thanks for catching that. Since I can't really account for the variability of future distributions (I am testing for error here, and consequently cannot predict what effect an unknown error will have on a distribution), I suppose this puts me at a bit of a disadvantage. It seems to me that, at the very least, I should be able to reduce the probability of a false negative to the desired precision. Is this possible? If this does lie in prediction interval territory, do you have any good references or ideas that I may be able to follow up on? – cyrus1.618 Mar 22 '12 at 16:37

0 Answers