Let $X$ be a sample from $N(0,1)$, and let $m$, $v$, $s$, $k$ denote the sample mean, variance, skewness and kurtosis of $X$. I want to transform the sample $X$ so that its sample moments equal the true population moments, for example:

  • sample mean = 0
  • sample variance = 1
  • sample skewness = 0
  • sample kurtosis = 3
  • ...

Using z-scores, $\frac{X-m}{\sqrt{v}}$, I can match the first two moments perfectly.
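
A minimal sketch of the z-score step, written in Python with the standard library only rather than MATLAB. The helper names `sample_moments` and `zscore` are my own, and the moments use the $1/n$ convention, which is an assumption since nothing above fixes a convention. Besides confirming that the first two moments are matched, it shows why a nonlinear map is needed for the rest: skewness and kurtosis are unchanged by any shift and positive rescaling.

```python
import math
import random

def sample_moments(xs):
    """Return (mean, variance, skewness, kurtosis) from 1/n central moments."""
    n = len(xs)
    m = sum(xs) / n
    c2 = sum((x - m) ** 2 for x in xs) / n
    c3 = sum((x - m) ** 3 for x in xs) / n
    c4 = sum((x - m) ** 4 for x in xs) / n
    return m, c2, c3 / c2 ** 1.5, c4 / c2 ** 2

def zscore(xs):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, v, _, _ = sample_moments(xs)
    sd = math.sqrt(v)
    return [(x - m) / sd for x in xs]

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
z = zscore(x)

m_x, v_x, s_x, k_x = sample_moments(x)
m_z, v_z, s_z, k_z = sample_moments(z)

# Mean and variance of z are 0 and 1 up to rounding;
# skewness and kurtosis of z are the same as those of x.
print(f"raw:     m={m_x:.4f} v={v_x:.4f} s={s_x:.4f} k={k_x:.4f}")
print(f"z-score: m={m_z:.4f} v={v_z:.4f} s={s_z:.4f} k={k_z:.4f}")
```

With the $1/(n-1)$ variance convention the standardised variance would be matched under that convention instead; the invariance of skewness and kurtosis holds either way.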

I seek a (nonlinear) transformation that makes my sample match higher population moments as well.

Online I found the sinh-arcsinh transformation, $$Z=\sinh\left((4-k)\sinh^{-1}\left(\frac{X-m}{\sqrt{v}}\right)-s\right),$$

which should result in a match of the first four sample moments with the true population moments.

However, if I compare this transformation with the plain z-scores, $\frac{X-m}{\sqrt{v}}$, then that simpler approach yields better results (sample moments match population moments more closely). How can I transform the data correctly to match the moments?


Sinh-arcsinh transformation:

Let $Z\sim N(0,1)$. Then, $$X=\mu+\sigma\sinh\left(\frac{\sinh^{-1}\left(Z\right)+\varepsilon}{\delta}\right)$$ has mean $\mu$, variance $\sigma^2$, skewness $\varepsilon$ and kurtosis $4-\delta$.
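
The forward map above can be sketched in Python (standard library only) as follows. The function names and the parameter values $\varepsilon=0.3$, $\delta=1.2$ are hypothetical, chosen purely for illustration. The sketch does not assume the stated relation between $(\varepsilon,\delta)$ and the skewness and kurtosis; it only prints the empirical values so they can be compared against it. With $\varepsilon=0$ and $\delta=1$ the map reduces to $\mu+\sigma Z$, because $\sinh(\sinh^{-1}(z))=z$.

```python
import math
import random

def sinh_arcsinh(z, mu=0.0, sigma=1.0, eps=0.0, delta=1.0):
    """Apply x = mu + sigma * sinh((asinh(z) + eps) / delta) to one value."""
    return mu + sigma * math.sinh((math.asinh(z) + eps) / delta)

def skew_kurt(xs):
    """Sample skewness and (non-excess) kurtosis from 1/n central moments."""
    n = len(xs)
    m = sum(xs) / n
    c2 = sum((x - m) ** 2 for x in xs) / n
    c3 = sum((x - m) ** 3 for x in xs) / n
    c4 = sum((x - m) ** 4 for x in xs) / n
    return c3 / c2 ** 1.5, c4 / c2 ** 2

rng = random.Random(1)  # fixed seed, only to make the sketch reproducible
z = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# eps = 0, delta = 1: the transformation returns the draws unchanged.
identity = [sinh_arcsinh(v) for v in z]

# Hypothetical parameter values, for illustration only.
x = [sinh_arcsinh(v, eps=0.3, delta=1.2) for v in z]
s_x, k_x = skew_kurt(x)
print(f"eps=0.3, delta=1.2: skewness={s_x:.3f}, kurtosis={k_x:.3f}")
```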

Alex
  • Why not use your second equation with $\epsilon=0$ and $\delta=1$? – Dave May 16 '20 at 12:18
  • @Dave I thought the second equation requires standardised data as input? But the sample moments of my sample $X$ are not quite equal to 0, 1, 0 and 3. The aim is to obtain these ideal moments. – Alex May 16 '20 at 14:37
  • You cannot achieve this without some kind of nonlinear transformation. Please explain what family of transformations you are willing to consider. – whuber May 16 '20 at 21:48
  • @whuber I'm happy with _any_ nonlinear transformation. I generated a sample $X$ of $N(0,1)$ realisations. But of course the sample moments of $X$ are not precisely equal to the population moments. So I seek some transformation $f$ such that $f(X)$ has the "right" first four sample moments. I thought the (nonlinear) sinh-arcsinh transformation might work, but its success is underwhelming. Do you have better ideas? – Alex May 16 '20 at 22:05
  • Unfortunately, your question isn't sufficiently specific. You can always transform a dataset to make it extremely close to any given distribution by means of the probability integral transform, but there are myriad other ways, too. – whuber May 16 '20 at 22:08
  • @whuber What additional information would be needed to improve my question? I just want to improve the quality of my sample of $N(0,1)$ realisations from ``randn`` (MATLAB). Could you perhaps elaborate on what such a transformation would look like? The probability integral transform merely states $F_X(X)\sim U(0,1)$, right? – Alex May 16 '20 at 22:22
  • I guess I could generate $U(0,1)$ random numbers and plug those numbers into the inverse cdf, $F_X^{-1}$, a kind of "inverse sampling". But I do not see how this guarantees that the first few sample moments of that sample agree with the population moments. I am really only after an improvement for the random numbers I generated, one way or the other. – Alex May 16 '20 at 22:26
  • It's unclear what you mean by "quality" of a sample and therefore the sense of "improve" is obscure. – whuber May 17 '20 at 11:02
  • @whuber I generate a sample of $N(0,1)$ realisations using ``randn``. That sample should have mean 0, variance 1, skewness 0 etc. But the sample moments differ slightly from those population moments. I want to "improve" that sample by matching its moments with the true population moments. E.g. when I use z-scores (subtract the sample mean and divide by the sample standard deviation), the first two moments are matched perfectly. I wonder what transformation helps me to match higher sample moments to the population moments of $N(0,1)$. The more moments match, the better the quality of my sample. – Alex May 17 '20 at 15:27
  • In what sense is this an improvement, though? The adjusted sample will no longer have many of the properties considered desirable of a sample, such as independent observations, which leads us to wonder why you might want to do this. – whuber May 17 '20 at 15:55
  • @whuber I was taught this moment matching is a popular variance reduction technique when conducting Monte Carlo simulations. See here: Section 2.4 on page 10 (or 1276 in the document) http://web.math.ku.dk/~rolf/teaching/ctff03/BoyBroGlasJEDC.pdf That paper mentions that the transformed sample (using z-scores) may not be normally distributed anymore but that the created bias is typically small. – Alex May 17 '20 at 16:17
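
One possible reading of the probability integral transform suggestion in the comments is a rank-based "normal scores" map: replace each observation by the standard normal quantile of its rank. This is my interpretation, not necessarily what whuber had in mind. The sketch below is Python with the standard library only; the names `normal_scores` and `sample_moments` and the plotting position $(\text{rank}-0.5)/n$ are assumptions. Because these positions are symmetric about $1/2$, the mean and skewness come out as 0 up to rounding, while the variance and kurtosis are typically close to, but not exactly, 1 and 3. As noted in the comments, the adjusted values are no longer independent observations.

```python
import random
from statistics import NormalDist

def normal_scores(xs):
    """Replace each value by the N(0,1) quantile of its rank, keeping order."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])  # indices from smallest to largest
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # (rank - 0.5) / n is an assumed plotting position; other choices exist.
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return out

def sample_moments(xs):
    """Return (mean, variance, skewness, kurtosis) from 1/n central moments."""
    n = len(xs)
    m = sum(xs) / n
    c2 = sum((x - m) ** 2 for x in xs) / n
    c3 = sum((x - m) ** 3 for x in xs) / n
    c4 = sum((x - m) ** 4 for x in xs) / n
    return m, c2, c3 / c2 ** 1.5, c4 / c2 ** 2

rng = random.Random(2)  # fixed seed, only to make the sketch reproducible
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = normal_scores(x)

m_y, v_y, s_y, k_y = sample_moments(y)
print(f"normal scores: m={m_y:.2e} v={v_y:.4f} s={s_y:.2e} k={k_y:.4f}")
```

Applying z-scores to `y` afterwards would set the variance to exactly 1 without changing the skewness or kurtosis.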
