I am doing Monte Carlo sampling from the null hypothesis in order to estimate the p-value of my data. Normally I would do, say, 1000 Monte Carlo runs, count the number of runs in which my statistic is at least as extreme as the observed value (let's say 3), and report a p-value of p=0.003. In this particular case, however, all Monte Carlo runs are very far from my actual value, so I am fairly confident that the p-value is many orders of magnitude smaller than 0.001.
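For reference, the counting procedure described above can be sketched in Python as follows. This is a minimal sketch, assuming NumPy, a one-sided test in which larger values of the statistic are more extreme, and a hypothetical helper name `monte_carlo_pvalue`. Alongside the plain estimate k/N it also returns the commonly used add-one variant (k+1)/(N+1), which makes explicit that N runs by themselves cannot resolve p-values much below about 1/N.

```python
import numpy as np


def monte_carlo_pvalue(null_stats, observed):
    """Hypothetical helper: Monte Carlo p-value for a one-sided test.

    Assumes larger values of the statistic are more extreme.
    Returns (k / N, (k + 1) / (N + 1)): the plain estimate used in the
    question and the add-one variant, which never returns exactly zero.
    """
    null_stats = np.asarray(null_stats, dtype=float)
    n = null_stats.size
    # k = number of null runs at least as extreme as the observed value
    k = int(np.count_nonzero(null_stats >= observed))
    return k / n, (k + 1) / (n + 1)


# Hypothetical null statistics: 0, 1, ..., 999 stand in for 1000 Monte Carlo runs.
null_stats = np.arange(1000)
print(monte_carlo_pvalue(null_stats, 997))   # 3 runs are >= 997
print(monte_carlo_pvalue(null_stats, 5000))  # no run reaches the observed value
```

When no run reaches the observed value, the plain estimate is exactly 0, while the add-one variant reports 1/(N+1); neither says how far below 1/N the true p-value lies.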
Can I do the following: calculate the mean $F_0$ and standard deviation $\sigma$ of the statistic across the Monte Carlo runs; normalize the deviation of my actual statistic $F$ (calculated from the real data) by computing a z-score $z=(F-F_0)/\sigma$; and report this z-score as the measure of how extreme my data are? Note that I am getting extreme values (like z=40), so I would like to adopt the language of high-energy physics and report that my statistic is "40 sigmas away from what is expected under the null hypothesis", instead of converting z=40 into an astronomically small p-value (standard software such as Matlab would simply return p=0 in this case).
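The proposed z-score, and the Gaussian tail probability it would imply, can be sketched as below. This is a minimal sketch assuming NumPy and SciPy; the helper names are hypothetical, and the null runs are simulated from N(0, 1) purely for illustration. It also shows that "p=0" is a floating-point underflow rather than a property of the tail: `scipy.stats.norm.sf(40)` underflows to 0.0 in double precision, whereas `norm.logsf` evaluates the logarithm of the same tail probability directly. The resulting number is only meaningful if the null distribution really is Gaussian that far out.

```python
import math

import numpy as np
from scipy.stats import norm


def empirical_z(null_stats, observed):
    """Hypothetical helper: z = (F - F0) / sigma, with F0 and sigma
    estimated from the Monte Carlo null runs."""
    null_stats = np.asarray(null_stats, dtype=float)
    f0 = null_stats.mean()
    sigma = null_stats.std(ddof=1)  # sample standard deviation
    return (observed - f0) / sigma


def gaussian_log10_pvalue(z):
    """Hypothetical helper: log10 of the one-sided Gaussian tail P(Z >= z),
    computed in log space to avoid underflow.

    Only meaningful if the null distribution is Gaussian out to z.
    """
    return norm.logsf(z) / math.log(10)


# Hypothetical null runs drawn from N(0, 1); observed value placed far out.
rng = np.random.default_rng(0)
null_stats = rng.standard_normal(1000)
z = empirical_z(null_stats, 40.0)
print(f"z = {z:.1f}")
print(norm.sf(40.0))                 # underflows to 0.0 in double precision
print(gaussian_log10_pvalue(40.0))   # about -349.4, i.e. p of order 1e-350
```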
I guess this only makes sense if one can assume that the null distribution of the statistic is Gaussian. The histogram of my 1000 Monte Carlo runs does look pretty much Gaussian, but I am not sure it would withstand a formal normality test.
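To see why the Gaussian assumption matters so much at z=40, one can compare the Gaussian tail with that of a distribution that is nearly Gaussian in the bulk. The sketch below assumes SciPy, and the choice of a Student t with 30 degrees of freedom, rescaled to unit variance, is purely illustrative (the helper names are hypothetical). It prints log10 tail probabilities at 3, 5 and 40 standard deviations: the two agree closely at z=3, but at z=40 they differ by hundreds of orders of magnitude, so a histogram or normality test that mostly probes the bulk of 1000 runs says little about the far tail.

```python
import math

from scipy.stats import norm, t


def log10_tail_gaussian(z):
    """log10 P(Z >= z) for a standard normal Z."""
    return norm.logsf(z) / math.log(10)


def log10_tail_student_t(z, df):
    """log10 P(Y >= z) where Y is a Student t with `df` degrees of freedom
    rescaled to unit variance, so z is in the same 'sigma' units."""
    scale = math.sqrt(df / (df - 2))  # standard deviation of t_df (needs df > 2)
    return t.logsf(z * scale, df) / math.log(10)


for z in (3, 5, 40):
    print(z, log10_tail_gaussian(z), log10_tail_student_t(z, df=30))
```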
What is the common practice in such cases? Is there one?
Update. Whuber gave a useful reference in the comments; it seems that the appropriate search terms are "rare events" and "importance sampling". I searched for these terms and briefly looked through a couple of papers. I am still very confused about why nobody even discusses the method I suggested above. If this procedure is inappropriate because one should not rely on the normality assumption, then I don't understand why it is considered OK to rely on that assumption when doing, e.g., a t-test. Here is an example. LHC physicists measure their Higgs statistic, calculate its standard error, and report that the results are 5 sigmas away from the no-Higgs value. This is fine. But my procedure as outlined above is allegedly not fine. What is the important difference, then?
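The core idea of importance sampling can be illustrated on a toy problem with a known answer: estimating P(Z > 6), roughly 1e-9, for a standard normal Z, a region that plain Monte Carlo with 100,000 runs would almost never reach. This is a minimal sketch assuming NumPy and SciPy, with a hypothetical helper name: samples are drawn from a proposal shifted into the rare region and reweighted by the likelihood ratio. Applying it to a real null model assumes you can sample from a suitably tilted version of that model and evaluate the density ratio, which is problem-specific.

```python
import numpy as np
from scipy.stats import norm


def tail_prob_importance_sampling(a, n, seed=0):
    """Hypothetical helper: estimate P(Z > a) for Z ~ N(0, 1) by importance sampling.

    Draws come from the shifted proposal N(a, 1), so roughly half of them
    land in the rare region; each is reweighted by the likelihood ratio
    phi(x) / phi(x - a) = exp(-a * x + a**2 / 2).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=a, scale=1.0, size=n)   # draws from the proposal
    weights = np.exp(-a * x + 0.5 * a**2)      # target density / proposal density
    terms = (x > a) * weights                  # indicator of the rare event times weight
    estimate = terms.mean()
    std_error = terms.std(ddof=1) / np.sqrt(n)
    return estimate, std_error


est, se = tail_prob_importance_sampling(a=6.0, n=100_000)
print(est, se, norm.sf(6.0))  # estimate, its standard error, exact value
```

Shifting the proposal mean to the threshold is a simple, common choice for this toy case; here it gives a relative standard error on the order of 1%, whereas plain sampling would most likely return zero hits.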