
Suppose we have constructed a model of some stochastic system; we are also able to perform Monte Carlo simulations of this system. Now, we have two sets of samples: one from our model and one from the MC-based approach, and we would like to assess the accuracy of our technique.

The first two accuracy measures that come to mind are the differences in the means and in the variances. Beyond that, one can construct the empirical PDFs/CDFs and compare them at certain points, e.g., by computing the RMSE between them.
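As a rough sketch of the measures just listed (differences in means and variances, plus an RMSE between the two empirical CDFs), assuming both sets of samples are plain lists of floats. The evaluation grid (50 evenly spaced points spanning both samples) and the function names are illustrative choices made here, not a standard recipe.

```python
import math
import random
import statistics
from bisect import bisect_right


def ecdf_at(sorted_sample, x):
    """Empirical CDF at x: the fraction of sample points that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def compare_samples(model, mc, n_grid=50):
    """Summary discrepancies between a model sample and a Monte Carlo sample.

    Returns the differences in sample means and (unbiased) sample variances,
    plus the RMSE between the two empirical CDFs evaluated on an evenly
    spaced grid that spans both samples.
    """
    a, b = sorted(model), sorted(mc)
    lo, hi = min(a[0], b[0]), max(a[-1], b[-1])
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    sq_errs = [(ecdf_at(a, x) - ecdf_at(b, x)) ** 2 for x in grid]
    return {
        "mean_diff": statistics.fmean(a) - statistics.fmean(b),
        "var_diff": statistics.variance(a) - statistics.variance(b),
        "cdf_rmse": math.sqrt(statistics.fmean(sq_errs)),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    model = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical model output
    mc = [rng.gauss(0.1, 1.2) for _ in range(2000)]     # hypothetical MC output
    print(compare_samples(model, mc))
```

Note that the CDF RMSE depends on where the grid points are placed; evaluating at the pooled sample points instead is an equally reasonable choice.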

What are the preferred and most thorough ways to make this comparison? Which metrics should one look at first, and which are a must?

jonsca
Ivan

1 Answer


It would help to have more information about your situation: for example, what kind of data do you have, and what are your goals? The description is a little vague to me. It sounds familiar, since I've heard of areas where this sort of thing is done, but I don't work in them, so I can't necessarily fill in the blanks well. Someone else may be better suited to provide the information you need, but in the meantime, I can say a little about how two distributions can be compared:

  1. You may want to visualize these distributions. One possibility is a QQ-plot. QQ-plots are usually thought of as a way to compare a sample distribution to a theoretical one (for example, your data against a normal distribution), but they can also be used to compare two sample distributions (such as observed data versus simulated data). In R, for instance, this can be done with qqplot() (see ?qqplot). QQ-plots are more common because they afford better resolution in the tails, which is often what people care about, but you might also be interested in a PP-plot, which makes it easier to see deviations in the middle of the distributions.
  2. For a formal test of whether the two distributions differ, you might want to explore the Kolmogorov-Smirnov test. It takes the largest vertical distance between the empirical cumulative distribution functions of the two samples and compares it against a null distribution describing how large such distances would be if both samples were drawn from the same population. In R, this can be done with ks.test() (see ?ks.test).
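A minimal Python sketch of both ideas, assuming two lists of floats. qq_pairs computes matched sample quantiles, i.e. the points one would plot in a two-sample QQ-plot (the interpolation details may differ from R's qqplot). ks_2samp_statistic computes the two-sample Kolmogorov-Smirnov statistic D, and the p-value uses the large-sample Kolmogorov approximation, so it is rough for small samples. All function names here are hypothetical; in practice scipy.stats.ks_2samp (or R's ks.test) is the better-tested route.

```python
import math
import random
from bisect import bisect_right


def quantile(sorted_sample, p):
    """Sample quantile at probability p (0 <= p <= 1), linearly
    interpolating between order statistics."""
    h = (len(sorted_sample) - 1) * p
    i = math.floor(h)
    j = min(i + 1, len(sorted_sample) - 1)
    return sorted_sample[i] + (h - i) * (sorted_sample[j] - sorted_sample[i])


def qq_pairs(x, y, n_points=99):
    """Matched quantiles of x and y at probabilities k/(n_points+1).
    Plotted against each other, these pairs form a two-sample QQ-plot;
    points close to the line y = x suggest similar distributions."""
    xs, ys = sorted(x), sorted(y)
    probs = [k / (n_points + 1) for k in range(1, n_points + 1)]
    return [(quantile(xs, p), quantile(ys, p)) for p in probs]


def ks_2samp_statistic(x, y):
    """Two-sample KS statistic D = max |F_x(v) - F_y(v)|.
    Both empirical CDFs are step functions that only jump at data points,
    so checking every observed value is enough to find the maximum."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    return max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Large-sample approximation of P(D >= d) under the null hypothesis
    that both samples come from the same continuous distribution:
    Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2), lam = d * sqrt(n m / (n + m))."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0  # Q(lam) is indistinguishable from 1 this close to zero
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    rng = random.Random(1)
    model = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical model samples
    mc = [rng.gauss(0.2, 1.0) for _ in range(500)]     # hypothetical MC samples
    d = ks_2samp_statistic(model, mc)
    print("D =", d, "approx. p =", ks_pvalue_asymptotic(d, len(model), len(mc)))
    for q_model, q_mc in qq_pairs(model, mc, n_points=9):
        print(f"{q_model:8.3f} {q_mc:8.3f}")
```

Keep in mind that a large p-value only means the test failed to detect a difference at this sample size; it is not proof that the two distributions agree.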

You may also be interested in reading this CV question: testing-data-against-a-known-distribution.

gung - Reinstate Monica
I described the problem in such an abstract way because I thought it was applicable to any situation, regardless of the background. To be honest, I still believe so, although, no doubt, the more information one has, the better the results s/he can produce. I was looking for some best practices, just like the ones you have shared. Thank you. – Ivan Aug 21 '12 at 05:07