How to generate random data based on simple statistical measures

Question


I currently have a test data set with 500k data points. I have an algorithm that processes that data and returns some information. To establish the statistical significance of the results, I'd like to run a Monte Carlo simulation. I would do this by taking the following statistics of the data:

Kurtosis
Std deviation
Mean
Skewness

and generating a series of randomized data sets with these same statistics, on which I would run my algorithm again.

How would I generate a data set with the same number of data points and exactly the same kurtosis, standard deviation, mean, and skewness?
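As a starting point, the four target statistics can be computed directly from the existing data. The sketch below is an illustration, not part of the original question: it assumes population (divide-by-n) moment estimators and reports *excess* kurtosis (kurtosis minus 3), which is an assumption about which convention is meant. It also shows the one part of "exact" matching that is easy: any sample can be shifted and rescaled so its mean and standard deviation equal the targets exactly, and skewness and kurtosis are unchanged by such a positive linear rescaling. Matching skewness and kurtosis is therefore the hard part.

```python
import math

def sample_moments(data):
    """Return (mean, std, skewness, excess_kurtosis) using divide-by-n moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # assumption: excess (Fisher) kurtosis
    return mean, std, skew, excess_kurt

def match_mean_std(sample, target_mean, target_std):
    """Shift and rescale `sample` so its mean and std equal the targets exactly.

    Skewness and kurtosis are invariant under this positive linear transform.
    """
    mean, std, _, _ = sample_moments(sample)
    return [target_mean + target_std * (x - mean) / std for x in sample]
```

With 500k points, the difference between divide-by-n and divide-by-(n-1) estimators is negligible.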

Related: [How to simulate data that satisfy specific constraints such as having specific mean and standard deviation?](https://stats.stackexchange.com/q/30303/7290) — gung - Reinstate Monica, Aug 13 '19 at 15:45
Here are three R packages for simulating data with specified distributions and relationships: * [SimCorrMix](https://cran.r-project.org/web/packages/SimCorrMix/SimCorrMix) * [SimMultiCorrData](https://github.com/AFialkowski/SimMultiCorrData) * [simrel](https://simulatr.github.io/simrel/) — abalter, Aug 13 '19 at 18:54
See https://en.wikipedia.org/wiki/Pearson_distribution and, if the applicability to this question is not obvious, read the first paragraph under "history." — whuber, Aug 13 '19 at 20:17
Have you considered running a bootstrap? You can resample the data you already have and this should match the moments that you want. In fact, I think this is more credible than running a simulation because any simulation will make distributional assumptions that may change the performance of your algorithm. — ecnmetrician, Oct 30 '21 at 03:19
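A minimal sketch of the bootstrap suggested in the last comment, in plain Python (the function name `bootstrap_datasets` is illustrative). Each resample draws `len(data)` points with replacement from the original data, so its moments are close to, but in general not exactly equal to, those of the original.

```python
import random

def bootstrap_datasets(data, n_sets, seed=None):
    """Yield `n_sets` resamples of `data`, each drawn with replacement and of the same size."""
    rng = random.Random(seed)  # seeded generator for reproducible resamples
    for _ in range(n_sets):
        yield rng.choices(data, k=len(data))
```

Each yielded resample would then be fed to the algorithm in place of the original data. Resampling individual points independently discards any ordering or serial dependence in the data.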

score 1 · Answer 1 · answered Aug 13 '19 at 15:18


If I understand you correctly, you assume a normal distribution modified by skewness and kurtosis. If so, you can use Fleishman's method. In R you can use the PoisNonNor package, and for SAS there is also code available online. For further reading, I recommend:

Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532.

Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399.

Mr Pi


How are you getting a normal distribution with skewness and kurtosis? – Dave Aug 13 '19 at 19:03
Maybe I just didn't express myself well. What I meant was a skewed normal distribution, but I did not know the adjective for kurtosis, so I used the parentheses. – Mr Pi Aug 14 '19 at 06:44

score 0 · Answer 2 · answered Aug 13 '19 at 15:29


If you are able to find the cumulative distribution function (CDF) of your data, you can then sample random values according to that distribution using inverse transform sampling.
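Inverse transform sampling draws U uniformly on [0, 1) and returns F⁻¹(U), where F is the CDF. The sketch below shows this with a closed-form example (the exponential distribution) and, as an assumed workaround for when no closed-form CDF is available, a piecewise-linear inverse of the *empirical* CDF of the data. The empirical version reproduces the data's distribution, and hence its moments, only approximately. All function names are illustrative.

```python
import math
import random

def inverse_transform_sample(inv_cdf, n, seed=None):
    """Draw n values by pushing Uniform[0, 1) draws through an inverse CDF."""
    rng = random.Random(seed)
    return [inv_cdf(rng.random()) for _ in range(n)]

def exponential_inv_cdf(rate):
    """Closed-form example: F(x) = 1 - exp(-rate*x), so F^-1(u) = -ln(1 - u) / rate."""
    return lambda u: -math.log(1.0 - u) / rate

def empirical_inv_cdf(data):
    """Piecewise-linear inverse of the empirical CDF of `data` (no closed form needed)."""
    s = sorted(data)
    def inv(u):
        pos = u * (len(s) - 1)        # fractional index into the sorted data
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])
    return inv
```

Because every draw from `empirical_inv_cdf` lies between neighbouring data values, this behaves much like a smoothed bootstrap of the original data.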

HDLX


The question is unclear. Your statement is correct, but it is not clear that the OP can determine the exact CDF. – Michael R. Chernick Aug 13 '19 at 15:59
While I believe I could in theory calculate the CDF, I'm looking for something simpler. To clarify, this is to determine the statistical significance of a trading algorithm: I apply the strategy to some data, and then I want to use a Monte Carlo simulation to determine the statistical significance of the initial result. Maybe my logic is flawed somewhere? – lucas rodriguez Aug 13 '19 at 16:05
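A minimal sketch of how the simulated runs described in the last comment could be turned into a one-sided Monte Carlo p-value. `statistic` stands in for the trading algorithm's performance measure and `simulate` for whichever null-data generator is used (a moment-matched generator, a bootstrap, and so on); both are hypothetical placeholders. The (k + 1) / (n + 1) form counts the observed result as one of the possible outcomes, so the estimate is never exactly zero.

```python
import random

def monte_carlo_p_value(observed, simulated):
    """Share of simulated statistics at least as large as the observed one (one-sided)."""
    k = sum(1 for s in simulated if s >= observed)
    return (k + 1) / (len(simulated) + 1)

def monte_carlo_test(data, statistic, simulate, n_sims=999, seed=None):
    """Compare statistic(data) with statistic(simulate(data, rng)) over n_sims runs.

    `statistic` and `simulate` are hypothetical callables supplied by the user.
    """
    rng = random.Random(seed)
    observed = statistic(data)
    simulated = [statistic(simulate(data, rng)) for _ in range(n_sims)]
    return observed, monte_carlo_p_value(observed, simulated)
```

What the p-value means depends entirely on the null model built into `simulate`: data generated to share only the four moments, for instance, carry no serial structure at all.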
