
It is easy to show how a linear transformation affects the mean or the variance of a distribution. It is also easy to find statements on the internet that a linear transformation does not change the shape of a distribution.

My question is: why does a linear transformation not change the shape of a distribution? Is there an easy proof?

Alexis
jdeJuan
  • How do you define "shape"? *My* definition is that *the shape of a distribution is the collection of all its properties that are invariant under affine transformation.* With this definition, there's nothing to prove! – whuber Feb 14 '22 at 17:55
  • You seem to be using a definition of shape that would, for example, say that a rectangle and a square are the same shape, or that an isosceles triangle and an equilateral triangle are the same shape, since in both examples the latter shape can result from an affine transformation of the former. – Alexis Feb 14 '22 at 17:59
  • @Alexis I find that comment more than a little misleading, because this is neither affine nor Euclidean geometry, but you seem to want to make it so. Indeed, a "rectangle," as a plot of a continuous distribution density function, represents a *uniform distribution on an interval* and all such distributions do have the same shape. Perhaps stating the definition differently will help: *the shape of a distribution consists of everything you can deduce by looking at a plot of its density when the axes are not labeled.* – whuber Feb 14 '22 at 18:03
  • @whuber Hmmm... the point of my comment was not to mislead, but to draw attention to the need for a definition of "shape" and using a geometric analogy. When you describe probability densities, I guess you are making some kind of commitment to the meaning of shape. I see we are both editing our comments in real time: that restating the definition is lovely, thank you! (I was struggling to articulate what you captured with "when the axes are not labeled.") – Alexis Feb 14 '22 at 18:05
  • @Alexis The challenge consists of plotting probability measures on the real line. But the principle remains the same: any such plot (using a faithful, undistorted rendering of the line), when its labels are erased, reveals only the shape of the distribution, holding in abeyance the particulars of its location and spread. Use the graph of the CDF if you prefer. This is by far the best definition of distributional "shape" there possibly can be: all others are derivative or require irrelevant constructions, such as standardization (which isn't even always possible). – whuber Feb 14 '22 at 18:08
  • BTW, half of all linear transformations *do* change the shape: we usually consider a reflection to be a change of shape. But that is a matter of convention and what your particular definition might be, I suppose. And, of course, multiplication by zero (a linear transformation) makes any distribution constant, thereby changing its shape if it wasn't already constant. – whuber Feb 14 '22 at 18:12

2 Answers


This question urges us to consider what the "shape" of a distribution might be. I would like to describe a perspective in which it is intimately connected with the meaning of "shape" in (elementary) geometry. It leads to an answer that requires no proof at all: it's all part of the definition.

The bottom line is this: two distributions on a geometrical space "have the same shape" when there is a geometrical transformation that converts one of the distributions into the other. Thus, the sense of "same shape" depends on what you consider a "geometrical transformation" to be. There are many reasonable and interesting choices, so ultimately the answer must depend on the context in which you are manipulating distributions.

In the following discussion, which is abbreviated and abstract, I include many practical, well-known examples to demonstrate the applicability and importance of these ideas. The examples can be greatly extended. Other common applications include distributions on complex vector spaces; distributions in space $\mathbb R^3$ and in higher dimensions; and distributions on the circle, sphere, and their higher-dimensional analogs.


Because "shape" is a geometrical concept and "distribution" is a probabilistic or measure-theoretic concept, common ground is found within geometric spaces that might support distributions.

Geometry

The modern (post-19th-century) conception of geometry is taught in Lang & Murrow's Geometry (a textbook written for early high school students). It is the study of a group of transformations acting on a set. The archetypical examples, analyzed by Euclid, are the actions of reflections and homotheties (uniform scalings) in the plane $\mathbb R^2$ and space $\mathbb R^3.$

For the present purposes, it suffices to characterize the situation this way:

  • A transformation is a one-to-one correspondence of the set with itself.

  • The result of undoing any transformation in the group and then applying any other (possibly the same) transformation is again a transformation in the group.

The elements of the underlying set are called "points" and, together with a group of transformations, deserve to be called a "geometric space."

A "geometric figure" is any set of points in a geometric space. The "shape" of a figure is the set of its properties that are not changed by any of the group transformations.

As examples,

  1. The Euclidean group is generated by all the reflections in the plane. It includes all rotations about arbitrary points, as well as translations. We say two plane figures are "congruent" when one of them can be transformed to the other via some transformation in this group.

  2. The group generated by all reflections along with all homotheties (uniform scalings) in the plane is the "similarity group." We say two figures are "similar" (that is, they have the same (Euclidean) shape) when one of them can be transformed to the other via some transformation in this group.

These are the two examples studied in detail in any high school plane geometry course.

Given a geometry, let's agree to say generally that two figures are congruent when one can be transformed to the other, as illustrated in these examples.

Distributions

A "distribution" on a set (which in our applications will support a geometric structure) consists of (1) a specified collection of figures, often called "events," and (2) an assignment of a number to each figure, called its "measure." Various familiar axioms permit us to deduce the measures of figures that are assembled from other figures by adjoining them or removing parts of them.

Transformations of the set induce transformations of distributions. When $g$ is a geometric transformation, $\mathbb P$ a distribution, and $\mathcal E$ is a figure, the transformed distribution assigns the value $\mathbb{P}(\mathcal E)$ to the figure $g(\mathcal E).$ Since the transformation is one-to-one, all figures can be written in the latter form, thereby fully defining the transformed distribution.
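
As a concrete (if informal) illustration of this induced distribution, here is a minimal sample-based sketch in Python; NumPy, the particular distribution, the reflection, and the figure $\mathcal E=[2,4]$ are all my own choices, not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(0)

# A sample approximating a distribution P on the real line.
x = rng.normal(loc=3.0, scale=2.0, size=100_000)

# A geometric transformation g of the line: the reflection about h = 1.
h = 1.0
g = lambda t: 2 * h - t
gx = g(x)  # a sample from the transformed distribution

# The transformed distribution assigns to the figure g(E) the value P(E).
# Check this on the figure E = [2, 4], whose image is g(E) = [-2, 0]:
p_E = np.mean((2 <= x) & (x <= 4))       # empirical P(E)
p_gE = np.mean((-2 <= gx) & (gx <= 0))   # empirical value of g(E) under the transformed distribution
print(p_E, p_gE)                         # identical, by construction
```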

Putting the ideas together

Two distributions on a geometric space have the same shape when there is a transformation in the geometry that transforms one of the distributions into the other.

"Have the same shape" determines equivalence classes of distributions in which the value of a figure depends only on its shape.

Examples

Translation (and reflection) invariance on the line

One of the simplest geometric spaces is the real line $\mathbb R^1$ with the group generated by all reflections. Any reflection can be expressed numerically as a function of the form $$\mathcal{R}_h: x \to 2h-x$$ where the point $h\in\mathbb R$ is the "center" of the reflection (because clearly it is not moved). A reflection undoes itself. So, the result of any reflection followed by another must also be in the transformation group. Letting the two centers be $h$ and $k,$ we find the result of applying $\mathcal{R}_h$ and then $\mathcal{R}_k$ to be

$$\mathcal{R}_k \circ \mathcal{R}_h: x \to 2k - (2h-x) = x + 2(k-h).$$

These are all translations of the line. So, for instance, the intervals $[0,1]$ and $[2,3]$ are congruent via the translation by $+2.$ The unions $[0,1]\cup[2,4]$ and $[0,2]\cup[3,4]$ are congruent via the reflection $\mathcal{R}_2.$ However, the unions $[0,1]\cup[2,3]$ and $[0,1]\cup[3,4]$ are not congruent in this geometry.
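
A quick numerical check of this composition, offered only as a sketch (the centers $h$ and $k$ and the test points are arbitrary choices):

```python
def reflection(h):
    """Reflection of the real line about the center h: x -> 2h - x."""
    return lambda x: 2 * h - x

h, k = 1.5, 4.0
R_h, R_k = reflection(h), reflection(k)

# Applying R_h first and then R_k should be the translation x -> x + 2(k - h).
for x in [-3.0, 0.0, 2.7, 10.0]:
    assert abs(R_k(R_h(x)) - (x + 2 * (k - h))) < 1e-12

print("R_k o R_h is translation by", 2 * (k - h))  # 5.0
```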

Two distributions on the line have the same shape when they differ either by a translation or a reflection. When $\mathbb P$ is a probability distribution (it doesn't assign any negative values to events and it assigns the value $1$ to the entire space), the other distributions with the same shape are those that differ from it by a translation or reflection.

For instance, the Normal distribution with parameters $\mu$ and $\sigma\ne 0,$ written $\mathcal{N}(\mu,\sigma),$ arises when (1) all intervals are considered events (along with additional shapes that can be constructed from intervals) and (2) the value assigned to any event $\mathcal E$ is given by

$$\Pr(\mathcal E;\mu, \sigma) = \int_{\mathcal{E}} \frac{1}{\sqrt{2\pi\sigma^2}}\exp(-(x-\mu)^2/(2\sigma^2))\,\mathrm{d}x.$$

Using basic rules of calculus, it is straightforward to show that for any number $\delta,$ $\mathcal{N}(\mu,\sigma)$ and $\mathcal{N}(\mu + \delta,\sigma)$ have the same shape.

In this geometry, however, if $\sigma\ne\tau,$ then $\mathcal{N}(\mu,\sigma)$ and $\mathcal{N}(\mu,\tau)$ do not have the same shape: neither is a translate nor a reflection of the other.
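
Both claims are easy to check numerically. A minimal sketch, assuming NumPy and SciPy are available (the particular values of $\mu,$ $\sigma,$ $\tau,$ and $\delta$ are arbitrary):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 1001)
mu, sigma, delta, tau = 1.0, 2.0, 3.0, 3.0

# Translation: the density of N(mu + delta, sigma) at x + delta equals the
# density of N(mu, sigma) at x, so the two graphs differ only by a shift.
assert np.allclose(norm.pdf(x + delta, mu + delta, sigma), norm.pdf(x, mu, sigma))

# Different scales: no shift or reflection of N(mu, tau) can reproduce N(mu, sigma).
# One simple symptom: the maximum heights of the two densities already disagree.
print(norm.pdf(mu, mu, sigma), norm.pdf(mu, mu, tau))  # 1/(sigma*sqrt(2*pi)) vs 1/(tau*sqrt(2*pi))
```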

Affine invariance on the line

The affine group of transformations of the real line consists of all those of the form

$$x \to ax + b,\quad a \ne 0.$$

Two distributions are congruent in this geometry when one can be rescaled (possibly reversing it) and shifted into the other. In this geometry, all Normal distributions have the same shape. In effect, there is only one Normal distribution.
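
To make the claim concrete, the change-of-variables identity below exhibits the affine map sending $\mathcal{N}(0,1)$ to $\mathcal{N}(\mu,\sigma).$ The numerical check is only a sketch; SciPy and the parameter values are my own additions.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-6, 6, 1001)
mu, sigma = 2.0, 0.5  # arbitrary parameters (sigma != 0)

# The affine map x -> sigma * x + mu sends N(0, 1) to N(mu, sigma): by the
# change-of-variables formula, the densities satisfy
#     f_{N(mu, sigma)}(sigma * x + mu) * sigma = f_{N(0, 1)}(x).
assert np.allclose(norm.pdf(sigma * x + mu, loc=mu, scale=sigma) * sigma, norm.pdf(x))
```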

Special affine invariance on the line

The special affine group of transformations does not allow reversal of direction: $a$ in the preceding formula must be positive. This is the usual sense in which statisticians mean that (univariate) distributions have the same shape.

For example, all Beta$(a,b)$ distributions have distinct shapes ($a,b\gt 0$) in the special affine geometry. However, a Beta$(a,b)$ and Beta$(b,a)$ distribution have the same shape in the full affine geometry because the reflection $\mathcal{R}_{1/2}$ converts one distribution to the other.

Another nice set of examples is offered by Bernoulli$(p)$ distributions, $0\lt p \lt 1,$ written B$(p).$ These all have different shapes in special affine geometry, but B$(p)$ and B$(1-p)$ have the same shapes in affine geometry.
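
Both reflections are easy to verify with a short SciPy check (a sketch only; SciPy and the parameter values $a=2,$ $b=5,$ $p=0.3$ are my own choices):

```python
import numpy as np
from scipy.stats import beta, bernoulli

x = np.linspace(0, 1, 501)
a, b = 2.0, 5.0

# The reflection R_{1/2}: x -> 1 - x carries Beta(a, b) onto Beta(b, a):
# the density of Beta(b, a) at 1 - x equals the density of Beta(a, b) at x.
assert np.allclose(beta.pdf(1 - x, b, a), beta.pdf(x, a, b))

# Likewise the same reflection exchanges B(p) and B(1 - p).
p = 0.3
print(bernoulli.pmf([0, 1], p), bernoulli.pmf([1, 0], 1 - p))  # identical
```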

Higher dimensional geometries

The foregoing trivialities become more interesting when we contemplate multivariate distributions. To illustrate, the Binormal distribution with parameters $\mu_1,$ $\mu_2,$ $\sigma_1,$ $\sigma_2,$ and $\rho$ generalizes the Normal distribution. (To avoid unnecessary distractions, I exclude the possibilities $\sigma_1=0$ or $\sigma_2=0$ from the following discussion.) Let's call this BN$(\mu_1,\ldots, \rho).$ Contours of its probability density function are ellipses centered at $(\mu_1,\mu_2).$ Consider just three of the commonest geometries in the plane.

  1. In Euclidean geometry, BN$(\mu_1,\ldots, \rho)$ and BN$(\mu_1^\prime,\ldots,\rho^\prime)$ are congruent if and only if their covariance matrices have the same eigenvalues: equivalently,

    • $\sigma_1^2+\sigma_2^2=\sigma_1^{\prime 2}+\sigma_2^{\prime 2}$ (equal traces) and
    • $\sigma_1^2\sigma_2^2(1-\rho^2)=\sigma_1^{\prime 2}\sigma_2^{\prime 2}(1-\rho^{\prime 2})$ (equal determinants),

    so that their contour ellipses are congruent. This is because a rotation or reflection of a Binormal distribution conjugates its covariance matrix by an orthogonal matrix, which can change the individual parameters $\sigma_1,$ $\sigma_2,$ and $\rho$ but preserves exactly the eigenvalues; and the distributions can be freely shifted to any location. (A numerical illustration appears after this list.)

  2. When similarity transformations are permitted, the criterion for two Binormal distributions to be congruent is relaxed. Now it is necessary and sufficient that the eigenvalues of their covariance matrices stand in the same ratio; equivalently, that their contour ellipses have the same eccentricity. That is, size no longer matters.

  3. The most general group of invertible linear transformations of the plane is the general linear group, $GL(2, \mathbb R).$ This includes skew transformations along with reflections and homotheties. This is the group I exploited heavily in an analysis of regression and correlation at What is the intuition behind conditional Gaussian distributions?. As shown there, two Binormal distributions are congruent (with respect to the general linear group) if and only if they have a common center. Moreover, when we include translations in the geometry (thereby producing the general affine group), any two Binormal distributions are affinely congruent.

In the sense of $(3),$ there is only one Binormal distribution!
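
The following NumPy sketch illustrates case $(1)$: rotating a Binormal distribution changes the individual parameters $\sigma_1,$ $\sigma_2,$ and $\rho,$ yet leaves the eigenvalues of its covariance matrix, and hence its Euclidean shape, unchanged. (The covariance matrix and the rotation angle are arbitrary choices of mine.)

```python
import numpy as np

# Covariance matrix of a Binormal distribution with sigma_1^2 = 2, sigma_2^2 = 1, rho = 0.
Sigma = np.diag([2.0, 1.0])

# A Euclidean transformation: rotation by 45 degrees about the center.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotating the distribution conjugates its covariance matrix by R.
Sigma_rot = R @ Sigma @ R.T
print(Sigma_rot)  # [[1.5, 0.5], [0.5, 1.5]]: now sigma_1 = sigma_2 and rho = 1/3

# The eigenvalues (hence the Euclidean shape) are unchanged.
print(np.linalg.eigvalsh(Sigma), np.linalg.eigvalsh(Sigma_rot))  # both [1. 2.]
```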


Conclusions

What it means for two distributions to "have the same shape" depends on the geometric structure you elect to use for the space on which those distributions are defined.

When that space is the set of real numbers and the geometry's group consists of all translations and positive rescalings, we recover the standard statistical sense in which two univariate distributions have the same shape.

When the space has more than one dimension, we're discussing multivariate distributions. Because there is a rich set of possible geometries to consider, the meaning of "have the same shape" will depend on which geometry is applicable to any particular problem or context. In some settings, for instance, all Binormal distributions have the same shape.

Accordingly, to paraphrase the geometric definition of shape,

The shape of a distribution is the set of its properties that are not changed by any transformation in the chosen geometry.

To resolve the present question, then, it should now be clear that the answer depends on which geometry you choose to use. When your geometry is that of the general group of affine transformations of the line, then the answer is yes: two distributions have the same shape if and only if they are related by an affine transformation. Otherwise, the answer is no.

whuber

Showing this requires a definition of "shape". The definition that comes to mind for me is what the distribution looks like once it has been centered and scaled. This is straightforward when the mean exists and the variance is finite, so I will proceed under that assumption.

Let $X$ be a random variable with distribution function $F_X(x)$. Define a new random variable, $Z$, as a function of $X$.

$$ Z(X) = \dfrac{X-\mathbb E \big[X\big]} {\sqrt{var(X)}} $$

Now let $T(X) = a + bX$ be a linear transformation with $b > 0$. (For $b < 0$ the same computation gives $Z(T(X)) = -Z(X),$ a reflection of $Z(X),$ because $\sqrt{b^2 var(X)} = |b|\sqrt{var(X)}.$)

$$ \mathbb E\big[ T(X)\big] = a + b\mathbb E\big[ X \big]\\ var(T(X)) = b^2var(X) $$

Let's apply the same centering and scaling to $T(X)$.

$$ Z(T(X)) = \dfrac{ T(X) - \mathbb E\big[ T(X)\big] }{ \sqrt{var(T(X))} } = \dfrac{ a + bX -(a + b\mathbb E\big[ X \big]) }{ \sqrt{b^2var(X)} } =\dfrac{ bX-b\mathbb E\big[ X \big] }{ b\sqrt{var(X)} } \\= \dfrac{X-\mathbb E \big[X\big]} {\sqrt{var(X)}} = Z(X) $$
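
An empirical version of the same computation, as a minimal sketch assuming NumPy (the exponential sample and the values of $a$ and $b$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def standardize(x):
    """Empirical Z: subtract the sample mean and divide by the sample standard deviation."""
    return (x - x.mean()) / x.std()

# A sample from any distribution with finite variance (exponential, for illustration).
x = rng.exponential(scale=2.0, size=100_000)

# A linear transformation T(X) = a + b*X with b > 0.
a, b = 5.0, 3.0
t = a + b * x

# Standardizing X and standardizing T(X) give the same values.
assert np.allclose(standardize(x), standardize(t))
```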

Dave
  • Why restrict yourself to assuming a mean and variance when you could use median and some form of range (such as IQR) with almost perfect generality? – whuber Feb 14 '22 at 18:09