
I have a complex distribution which I can numerically sample.

I'd like to estimate a percentile (let's say 90%) using Monte-Carlo simulations. What I'm doing is:

  1. I run 1 million independent simulations, which gives me 1 million samples from the distribution.
  2. I sort them in ascending order and take the one in the 900,000th position.

Assuming this procedure is correct, how can I theoretically estimate the number of runs needed so that the error in the estimate is roughly within, e.g., 1%?

All I can say about the underlying distribution is that it's roughly analogous to a normal distribution divided by a chi distribution, if that's of any help.
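
For concreteness, the two steps above can be sketched as follows. This is only an illustration: `sample_ratio` is a hypothetical stand-in sampler (a standard normal divided by an independent chi variate with an arbitrarily chosen 5 degrees of freedom), not the actual distribution, and `empirical_quantile` is a name introduced here.

```python
import math
import random


def sample_ratio(rng, dof=5):
    """Hypothetical stand-in: standard normal over an independent chi variate."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square with `dof` degrees of freedom is Gamma(shape=dof/2, scale=2);
    # its square root is a chi variate.
    chi = math.sqrt(rng.gammavariate(dof / 2.0, 2.0))
    return z / chi


def empirical_quantile(samples, p):
    """Sort the samples and return the ceil(p * n)-th smallest (1-based rank)."""
    ordered = sorted(samples)
    k = math.ceil(p * len(ordered))
    return ordered[k - 1]


rng = random.Random(0)           # fixed seed so the run is repeatable
n = 100_000                      # smaller than 1 million, to keep the demo quick
samples = [sample_ratio(rng) for _ in range(n)]
q90 = empirical_quantile(samples, 0.9)
print(q90)
```

With `n = 10**6` and `p = 0.9` the rank used is 900,000, matching step 2.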

xanz
  • This is just a complicated way of asking about [binomial proportion confidence intervals](http://stats.stackexchange.com/search?q=binomial+proportion+confidence), so you are likely to find useful information in the linked search. – whuber Dec 24 '16 at 15:03
  • Why don't you edit this again, taking out what is irrelevant and making it really clear what your problem is? – Michael R. Chernick Dec 24 '16 at 20:46
  • @MichaelChernick If I knew that those things were irrelevant I wouldn't have added them. By the way, I believe that what I'm asking is quite clear: how can I estimate the number of samples to use to estimate the inverse CDF at a given percentile within a given accuracy? – xanz Dec 24 '16 at 21:46
  • @whuber. Thank you. That's not exactly what I'm looking into but it's a step forward I believe. Basically I'd like to control the absolute error on the estimate, not on the corresponding percentile. – xanz Dec 24 '16 at 22:54
  • The question is rather vague: you don't seem to know the distribution, or it is so complicated that working with it analytically is not possible anyway. In these circumstances, why not just run your simulation 100 times with $n = 10^6$ and see whether the bulk (say 95%) of the answers lie within a 1% range (or whatever you require)? If they do not, increase the number of simulations to $n = 10^7$ and try again; if that fails, try $n = 10^8$, and so on. If you never get there, you may need to relax your requirements. – wolfies Dec 25 '16 at 16:45
  • I suppose the real question is: how can you sample if you don't know the distribution? And if you do know the distribution, then there would be more efficient ways of obtaining the 90th percentile than generating 1 million values, sorting them, and taking the 900,000th value. If you specify the pdf, your question will have bones. – wolfies Dec 25 '16 at 16:51
  • What I'd like is to determine $n$ to estimate $F^{-1}(0.9)$ with a tolerance of 0.1%, assuming I know nothing about the distribution. The tolerance should not be on the 0.9 (i.e. estimating a value that lies between the 89.9% and 90.1% percentiles), which is something I can achieve using a binomial proportion CI, but on the actual value (i.e. I want $F^{-1}(0.9)$ to be within $\pm 0.1\%$ of the estimate). – xanz Dec 25 '16 at 17:40
  • Thank you for the clarification. You are asking for a *tolerance interval* for $F$. – whuber Dec 25 '16 at 23:34
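
The two suggestions in the comments (a binomial-based interval, and simply increasing the number of runs until the spread is acceptable) can be combined into a rough, distribution-free sketch. It picks two order statistics as an approximate confidence interval for the quantile, using the normal approximation to the binomial count of samples below it, and then measures the interval in units of the value itself, which is the tolerance asked for above. The names `quantile_ci`, `runs_needed` and `draw`, the doubling schedule, and the stand-in sampler are all assumptions made for illustration; the relative width is only meaningful when the quantile is not close to zero.

```python
import math
import random
from statistics import NormalDist


def quantile_ci(samples, p, conf=0.95):
    """Point estimate and approximate distribution-free CI for the p-quantile."""
    ordered = sorted(samples)
    n = len(ordered)
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    half = z * math.sqrt(n * p * (1.0 - p))        # spread of the binomial count
    lo = max(math.floor(n * p - half), 1)          # 1-based lower rank
    hi = min(math.ceil(n * p + half) + 1, n)       # 1-based upper rank
    est = ordered[math.ceil(n * p) - 1]
    return est, ordered[lo - 1], ordered[hi - 1]


def runs_needed(draw, p=0.9, tol=0.05, n0=1_000, n_max=1_000_000):
    """Double the run count until the CI half-width is within tol of the estimate."""
    n = n0
    while True:
        samples = [draw() for _ in range(n)]
        est, lower, upper = quantile_ci(samples, p)
        rel = (upper - lower) / (2.0 * abs(est))   # assumes est is not near zero
        if rel <= tol or n >= n_max:
            return n, est, rel
        n *= 2


rng = random.Random(1)


def draw():
    # Hypothetical stand-in sampler: normal over an independent chi (5 d.o.f.).
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(2.5, 2.0))


n, est, rel = runs_needed(draw, p=0.9, tol=0.05)
print(n, est, rel)
```

The loose 5% tolerance keeps the demo fast. Because the interval width shrinks like $1/\sqrt{n}$, each tenfold tightening of the tolerance needs roughly a hundred times as many runs.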

1 Answer


If the random variable under consideration is approximately the ratio of a normal to a chi, then you may just be able to use the t-distribution with appropriate degrees of freedom to approximate the quantile of interest.

If $Z \sim N(0, 1)$ and $Y \sim \chi^2_n$, and if $Z$ and $Y$ are independent, then it is well known that the ratio $T = \frac{Z}{\sqrt{Y/n}} \sim t_n$. If you approximately have the ratio of a normal to an independent chi with $n$ degrees of freedom, then you approximately have a scaled $t$ distribution, and you can use the properties of the $t$ distribution to estimate the quantile.
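
As a rough sketch of how a known (or approximated) density would translate into a run count, one can use the standard asymptotic result for sample quantiles. This is not part of the argument above, and the notation is introduced here: $m$ is the number of Monte Carlo runs (to avoid clashing with the degrees of freedom $n$), $f$ is the density of the simulated variable, assumed continuous and positive at the true quantile $q_p \neq 0$, $\varepsilon$ is the relative tolerance, and $1-\alpha$ is the confidence level.

$$\sqrt{m}\,\bigl(\hat q_p - q_p\bigr) \xrightarrow{d} \mathcal{N}\!\left(0,\ \frac{p(1-p)}{f(q_p)^2}\right) \quad\Longrightarrow\quad m \gtrsim \frac{z_{1-\alpha/2}^2\; p(1-p)}{\bigl(\varepsilon\, q_p\, f(q_p)\bigr)^2}$$

For $p = 0.9$ the numerator factor is $p(1-p) = 0.09$. Under the $t$ approximation, $q_p$ and $f(q_p)$ would come from the scaled $t$ distribution; otherwise they have to be estimated from a pilot run.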

frelk
  • They are not independent, unfortunately – xanz Dec 24 '16 at 21:42
  • @freik wrote "They are not independent, unfortunately." Kind of an important detail to have left out of your statement "All I can say about the underlying distribution is that it's roughly analogous to a normal distribution divided by a chi distribution, if that's of any help." – Mark L. Stone Dec 24 '16 at 23:46
  • I said "roughly" on purpose because the degree of dependence is very modest. But the accuracy I require on the estimates is very high. That's why I didn't add details: I wanted to know if there's a method that's generally valid for choosing $n$ regardless of the underlying distribution. – xanz Dec 25 '16 at 08:20