What are the basic affine transformations on a distribution for the various moments?

Question

To change a distribution's mean, we apply a translation, e.g. add a constant to every data point.

To change a distribution's variance, we apply a scaling, e.g. multiply every data point by a constant.

Are there similarly canonical transformations for skewness, kurtosis, and higher-order moments?

An affine transformation allows you to change only two moments (not necessarily the first two), basically because it gives you two coefficients to play with (I assume we're on the real line). If you want to change more than two moments you need a transformation with more than two coefficients, hence not affine. — pglpm, Jun 06 '20 at 19:05

pglpm · Answer 1 · 2020-06-08T06:33:51.290

Not affine, but polynomial transformations for example. Basically this is necessary because you want to change several numbers at the same time, and an affine transformation only gives you two coefficients to "play with" (I assume we're speaking about distributions on the real line).

If you consider a coordinate transformation $y=f(x)$, a density function $\mathrm{p}(x)$ changes in such a way that $\mathrm{p}(x)\,\mathrm{d}x = \mathrm{p}(y)\,\mathrm{d}y$, where $\mathrm{p}(y)$ is the density function in the new coordinates (related to the old one by a Jacobian determinant).

Denote the $n$th raw moment in $y$-coordinates as $m_n := \int y^n\,\mathrm{p}(y)\,\mathrm{d}y$, and in $x$-coordinates as $M_n := \int x^n\,\mathrm{p}(x)\,\mathrm{d}x$. Take an affine coordinate transformation $y=ax+b$. Consider the third raw moment in $y$-coordinates and pass to $x$-coordinates, using $\mathrm{p}(y)\,\mathrm{d}y = \mathrm{p}(x)\,\mathrm{d}x$ and $\int\mathrm{p}(x)\,\mathrm{d}x = 1$: \begin{align} m_3&=\int y^3\,\mathrm{p}(y)\,\mathrm{d}y =\int(ax+b)^3\,\mathrm{p}(x)\,\mathrm{d}x\\ &= a^3\int x^3\,\mathrm{p}(x)\,\mathrm{d}x +3a^2b\int x^2\,\mathrm{p}(x)\,\mathrm{d}x +3ab^2\int x\,\mathrm{p}(x)\,\mathrm{d}x +b^3\int\mathrm{p}(x)\,\mathrm{d}x \\ &= a^3 M_3 +3a^2b M_2 +3ab^2 M_1 +b^3 \end{align}
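The expansion of $m_3$ can be checked numerically. The following is a minimal sketch, not part of the original answer: the exponential sample and the values of `a` and `b` are arbitrary assumptions chosen for illustration, and sample averages stand in for the integrals. Because the identity is purely algebraic, it holds for the empirical distribution up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Any sample works here; the exponential distribution is just an example.
x = rng.exponential(scale=1.0, size=10_000)

a, b = 2.0, -0.5          # arbitrary affine coefficients (illustrative)
y = a * x + b


def raw_moment(sample, n):
    """n-th raw sample moment, the empirical analogue of the integral."""
    return np.mean(sample ** n)


M1, M2, M3 = (raw_moment(x, n) for n in (1, 2, 3))

# Third raw moment computed directly in y-coordinates ...
m3_direct = raw_moment(y, 3)
# ... and via the expansion in terms of the x-moments.
m3_formula = a**3 * M3 + 3 * a**2 * b * M2 + 3 * a * b**2 * M1 + b**3

print(m3_direct, m3_formula)
```

The two printed values should agree to within rounding.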

Imagine writing similar equations for $m_2$ and $m_1$. You get a system of three equations giving you $\{m_1, m_2,m_3\}$ as polynomials in $\{M_1,M_2,M_3\}$, with the coefficients involving $a$ and $b$. If you fix all six moments, you may be able to choose the coefficients so that two equations are satisfied, but in general the remaining one won't be. This means that you need a transformation involving more than two coefficients. (You may still not be able to find a solution though; this is related to the moment problem.)
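To illustrate the counting argument, here is a hedged sketch (my own example, with an arbitrary gamma sample and arbitrary target values) that picks $a$ and $b$ to hit a target mean and variance. Once those two are fixed, nothing is left to adjust: the standardized third and fourth moments of the transformed sample come out the same as those of the original (with $a>0$; a negative $a$ would only flip the sign of the skewness).

```python
import numpy as np

rng = np.random.default_rng(1)
# A skewed example sample; the gamma distribution is an arbitrary choice.
x = rng.gamma(shape=2.0, scale=1.0, size=50_000)

# Illustrative targets for the first two moments.
target_mean, target_var = 3.0, 4.0

# Solve the two equations for the two affine coefficients (taking a > 0).
a = np.sqrt(target_var / x.var())
b = target_mean - a * x.mean()
y = a * x + b


def standardized_moment(sample, n):
    """n-th moment of the sample after centering and scaling to unit variance."""
    z = (sample - sample.mean()) / sample.std()
    return np.mean(z ** n)


print("mean, variance of y:", y.mean(), y.var())
print("skewness x vs y:", standardized_moment(x, 3), standardized_moment(y, 3))
print("kurtosis x vs y:", standardized_moment(x, 4), standardized_moment(y, 4))
```

The mean and variance match the targets, while skewness and kurtosis are unchanged, which is the sense in which an affine map has only two coefficients to work with.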

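More generally, an everywhere-positive density function can be transformed into any other everywhere-positive density function by an appropriate coordinate change (on the real line, for instance, $y = G^{-1}(F(x))$, where $F$ and $G$ are the cumulative distribution functions of the old and new densities), so in principle you can change the moments in any way you please (within limits related to the moment problem).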
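On the real line, one way to realise such a coordinate change is the monotone map $y = G^{-1}(F(x))$, with $F$ the cumulative distribution function of the original density and $G$ that of the desired one. The sketch below is an illustration under assumptions of my own: a standard normal source and a standard logistic target, both everywhere positive, chosen because $F$ and $G^{-1}$ have simple closed forms. It is not a canonical choice.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)          # source: standard normal sample

# F: standard normal CDF, written with math.erf to avoid extra dependencies.
normal_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

# u = F(x) is uniform on (0, 1); clip to guard against exact 0 or 1.
u = np.clip(normal_cdf(x), 1e-12, 1.0 - 1e-12)

# G^{-1}: quantile function of the standard logistic distribution.
y = np.log(u / (1.0 - u))


def kurtosis(sample):
    """Standardized fourth moment (equals 3 for a normal distribution)."""
    z = (sample - sample.mean()) / sample.std()
    return np.mean(z ** 4)


print("variance of y:", y.var(), "(logistic value: pi^2 / 3)")
print("kurtosis of x:", kurtosis(x))
print("kurtosis of y:", kurtosis(y))
```

The map is nonlinear, so it changes the kurtosis (the logistic distribution has heavier tails than the normal), which no affine map could do.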

Regarding the other part of your question, I don't know of any canonical transformations for skewness and kurtosis, but maybe there are.

This is a tricky and fascinating topic. The fact is this: from "the point of view of the distribution", which is a measure, the manifold upon which it's defined doesn't need to have any additional structure (just a measurable space). For us to speak about a mean, the space needs to have an additional convex structure (which locally implies an affine one). With only such a convex structure we still can't speak about a second moment or variance. To speak about a variance, the space needs to have a quadratic form defined on it (quadratic forms can be defined on a convex space, even if it isn't a vector space; they have just slightly different properties from forms on a vector space). And so on. So usually the space has some additional structure that makes sense in the specific problem. The transformations we consider have to somehow be compatible with that structure to make sense.

This is also why we standardize the first and second moments of a normal distribution, but don't standardize higher moments: we could, with an appropriate transformation, but the distribution would not be a normal distribution anymore (that is, one belonging to the normal family). And how can we say that a distribution is "normal"? We need (1) an affine structure and (2) a quadratic form on that space. [I hope this makes sense to you, sorry maybe I'm being too concise.]

Thanks pglpm! This also explains why shifting mean/variance is so popular in ML operations like Batch Norm and Adaptive Instance Norm: mean and variance are only two moments, so they're changeable with a simple affine transform. Are there polynomial transformations that would be considered canonical for skew/kurtosis? For kurtosis, I could imagine a transformation t(x) = x**2. Dots close to zero get closer; dots far away get further. Dots exactly at 1.0 don't move. For skew... t(x) = x*log(x)? — Yaoshiang, Jun 07 '20 at 19:27
You're welcome! I've added some info to my answer regarding that. I don't know of any, but they may well exist. You notice that if you use a 3rd order polynomial, it'll touch the *6th* moments. There's also a matter of which transformations "make sense" in your space. Your question touches a very fascinating topic. — pglpm, Jun 07 '20 at 21:01
@Yaoshiang Have a look at LambertW transformations, e.g. [What's the distribution of these data?](https://stats.stackexchange.com/questions/33115/whats-the-distribution-of-these-data/47917#47917) — Georg M. Goerg, Jun 08 '20 at 11:26
