I currently have a test data set with 500k data points and an algorithm that processes that data and returns some information. To establish the statistical significance of the results, I'd like to run a Monte Carlo simulation. I would do this by taking the:

  • Kurtosis
  • Std deviation
  • Mean
  • Skewness

And generating a series of randomized data sets, on which I would run my algorithm again.

How would I generate a data set with the same number of data points that has exactly the same kurtosis, standard deviation, mean, and skewness?

  • Related: [How to simulate data that satisfy specific constraints such as having specific mean and standard deviation?](https://stats.stackexchange.com/q/30303/7290) – gung - Reinstate Monica Aug 13 '19 at 15:45
  • Here are three R packages for simulating data with specified distributions and relationships: [SimCorrMix](https://cran.r-project.org/web/packages/SimCorrMix/SimCorrMix), [SimMultiCorrData](https://github.com/AFialkowski/SimMultiCorrData), and [simrel](https://simulatr.github.io/simrel/). – abalter Aug 13 '19 at 18:54
  • See https://en.wikipedia.org/wiki/Pearson_distribution and, if the applicability to this question is not obvious, read the first paragraph under "history." – whuber Aug 13 '19 at 20:17
  • Have you considered running a bootstrap? You can resample the data you already have and this should match the moments that you want. In fact, I think this is more credible than running a simulation because any simulation will make distributional assumptions that may change the performance of your algorithm. – ecnmetrician Oct 30 '21 at 03:19
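
The resampling idea in the last comment can be sketched in a few lines. This is a minimal illustration, assuming the data points can be treated as exchangeable (plain resampling with replacement ignores any ordering or serial dependence in the data); the function names `bootstrap_datasets` and `bootstrap_distribution` are hypothetical, and `statistic` stands in for whatever the algorithm returns.

```python
import random

def bootstrap_datasets(data, n_sets, rng):
    """Yield n_sets resampled copies of data, each the same length as data.

    Each copy is drawn with replacement, so its sample moments fluctuate
    around those of the original data rather than matching them exactly.
    """
    n = len(data)
    for _ in range(n_sets):
        yield rng.choices(data, k=n)

def bootstrap_distribution(data, statistic, n_sets, seed=0):
    """Apply a statistic (a stand-in for the algorithm) to every resample."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [statistic(sample) for sample in bootstrap_datasets(data, n_sets, rng)]
```

Comparing the value obtained on the original data with the spread of `bootstrap_distribution(data, my_algorithm, 1000)` gives a rough sense of how variable the result is, where `my_algorithm` is a placeholder for the real procedure.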

2 Answers

1

If I understand you correctly, you assume a normal distribution (+ skew and kurtosis). If this is correct, you can use Fleishman's method. In R you can use the PoisNonNor package, and for SAS there is also code available online. For further reading I recommend:

Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532.

Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological methods, 17(3), 399.
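
As a rough illustration of Fleishman's power method, here is a minimal standard-library Python sketch. It is not the PoisNonNor API: the function names are hypothetical, the solver is a plain Newton iteration started from the normal case, and "kurtosis" is assumed to mean excess kurtosis (0 for a normal distribution). The method transforms a standard normal Z into Y = -c + bZ + cZ² + dZ³, with b, c, d chosen so that Y has mean 0, variance 1, and the requested skewness and excess kurtosis. Note that this matches the moments of the underlying distribution, so the sample moments of each generated data set will vary around the targets rather than equal them exactly.

```python
import random

def fleishman_residuals(coef, skew, exkurt):
    """Fleishman's three moment equations; all zero at a valid (b, c, d)."""
    b, c, d = coef
    return [
        b * b + 6 * b * d + 2 * c * c + 15 * d * d - 1.0,            # variance = 1
        2 * c * (b * b + 24 * b * d + 105 * d * d + 2) - skew,       # skewness
        24 * (b * d + c * c * (1 + b * b + 28 * b * d)
              + d * d * (12 + 48 * b * d + 141 * c * c + 225 * d * d)) - exkurt,
    ]

def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fleishman_coefficients(skew, exkurt, tol=1e-10, max_iter=100):
    """Newton iteration with a finite-difference Jacobian, started at b=1, c=d=0."""
    x = [1.0, 0.0, 0.0]
    h = 1e-7
    for _ in range(max_iter):
        F = fleishman_residuals(x, skew, exkurt)
        if max(abs(v) for v in F) < tol:
            return tuple(x)
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            xp = x[:]
            xp[j] += h
            Fp = fleishman_residuals(xp, skew, exkurt)
            for i in range(3):
                J[i][j] = (Fp[i] - F[i]) / h
        step = solve_linear(J, [-v for v in F])
        x = [xi + si for xi, si in zip(x, step)]
    # Not every (skewness, kurtosis) pair is reachable with this method.
    raise ValueError("no convergence; the requested moments may be infeasible")

def simulate_fleishman(n, mean, sd, skew, exkurt, rng):
    """Draw n values whose distribution has the four requested moments."""
    b, c, d = fleishman_coefficients(skew, exkurt)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        y = -c + b * z + c * z * z + d * z ** 3   # standardized non-normal draw
        out.append(mean + sd * y)
    return out
```

A call such as `simulate_fleishman(500_000, mean, sd, skew, exkurt, random.Random(1))`, with the four moments estimated from the original data, would produce one randomized data set of the same size.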

Mr Pi
  • How are you getting a normal distribution with skewness and kurtosis? – Dave Aug 13 '19 at 19:03
  • Maybe I just didn't express myself well. What I meant was a skewed normal distribution, but I did not know what the adjective for kurtosis was, so I used the parentheses. – Mr Pi Aug 14 '19 at 06:44
0

If you are able to find the cumulative distribution function (CDF) of your data, you can then sample random values from that distribution using inverse transform sampling.
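
A minimal sketch of inverse transform sampling, using only the Python standard library. The first function uses a distribution whose inverse CDF has a closed form (the exponential) purely as an illustration; the second builds an approximate inverse CDF from observed data by interpolating between sorted values, which is an assumption about how one might proceed when no closed form is known. All function names are hypothetical.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), valid for 0 <= u < 1."""
    return -math.log1p(-u) / rate

def make_inverse_ecdf(data):
    """Return an approximate inverse CDF built from the sorted observations."""
    xs = sorted(data)
    n = len(xs)

    def inverse(u):
        pos = u * (n - 1)                # fractional index into the sorted data
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return xs[lo] + frac * (xs[hi] - xs[lo])   # linear interpolation

    return inverse

def draw_samples(inverse_cdf, size, rng):
    """Push uniform(0, 1) draws through the inverse CDF."""
    return [inverse_cdf(rng.random()) for _ in range(size)]
```

For example, `draw_samples(make_inverse_ecdf(data), len(data), random.Random(0))` would generate a synthetic data set of the same size as `data`. Values produced this way never fall outside the observed range, which may or may not be acceptable.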

HDLX
  • The question is unclear. Your statement is correct but it is not clear that the OP can determine the exact cdf. – Michael R. Chernick Aug 13 '19 at 15:59
  • While I believe I could in theory calculate the CDF, I'm looking for something simpler. As a clarification, this is to determine the statistical significance of a trading algorithm. I apply the strategy to some data and then I want to use Monte Carlo to determine the statistical significance of the initial result. Maybe my logic is flawed somewhere? – lucas rodriguez Aug 13 '19 at 16:05