77

I KNOW what moments are and how to calculate them and how to use the moment generating function for getting higher order moments. Yes, I know the math.

Now that I need to get my statistics knowledge lubricated for work, I thought I might as well ask this question – it's been nagging me for a few years, and back in college no professor knew the answer; they would just dismiss the question (honestly).

So what does the word "moment" mean in this case? Why this choice of word? It doesn't sound intuitive to me (or I never heard it that way back in college :) Come to think of it I am equally curious with its usage in "moment of inertia" ;) but let's not focus on that for now.

So what does a "moment" of a distribution mean, what does it seek to do, and why THAT word! :) Why does anyone care about moments? At this moment I am feeling otherwise about that moment ;)

PS: Yes, I've probably asked a similar question on variance but I do value intuitive understanding over 'look in the book to find out' :)

Alexis
PhD
  • 6
    For the word choice, start with its [etymology](http://www.etymonline.com/index.php?term=moment). – whuber Oct 26 '11 at 22:02
  • 2
    @whuber: yeah! Looked it up before posing this question - many years ago too ;) – PhD Oct 27 '11 at 00:52
  • I would combine the etymology provided by @whuber with this ( http://www.thefreedictionary.com/moment ): look at the Math/Stat definition cited from Collins English Dictionary. Combine that with common-use definitions such as "short period of time" or "specific instance." I'm fairly certain that moment in our math/stat sense is interchangeable with points; it's just that these points have particular significance in certain applications (MGF or MOI). Before Descartes, geometry and algebra had no systematic link, so they probably had a variety of different terms for what are actually the same thing. – Chris Simokat Oct 27 '11 at 02:01
  • 4
    It's from Macbeth: "_Who can be wise, amazed, temperate and furious, Loyal and neutral, in a moment?_" Macbeth: Act ii. Sc. 3 – wolfies Feb 05 '14 at 08:40

4 Answers

68

According to the paper "First (?) Occurrence of Common Terms in Mathematical Statistics" by H.A. David, the first use of the word 'moment' in this situation was in an 1893 letter to Nature by Karl Pearson entitled "Asymmetrical Frequency Curves".

Neyman's 1938 Biometrika paper "A Historical Note on Karl Pearson's Deduction of the Moments of the Binomial" gives a good synopsis of the letter and Pearson's subsequent work on moments of the binomial distribution and the method of moments. It's a really good read. Hopefully you have access to JSTOR, for I don't have the time now to give a good summary of the paper (though I will this weekend). I will, however, mention one piece that may give insight as to why the term 'moment' was used. From Neyman's paper:

It [Pearson's memoir] deals primarily with methods of approximating continuous frequency curves by means of some processes involving the calculation of easy formulae. One of these formulae considered was the "point-binomial" or the "binomial with loaded ordinates". The formula
differs from what to-day we call a binomial, viz. (4), only by a factor $\alpha$, representing the area under the continuous curve which it is desired to fit.

This is what eventually led to the 'method of moments.' Neyman goes over Pearson's derivation of the binomial moments in the above paper.

And from Pearson's letter:

We shall now proceed to find the first four moments of the system of rectangles round GN. If the inertia of each rectangle might be considered as concentrated along its mid vertical, we should have for the $s^{\text{th}}$ moment round NG, writing $d = c(1 + nq)$.

This hints at the fact that Pearson used the term 'moment' as an allusion to 'moment of inertia,' a term common in physics.

Here's a scan of most of Pearson's Nature letter:

[Images: scan of Pearson's Nature letter, in two parts]

You can view the entire article on page 615 here.

Nick Cox
  • 1
    Can I give a +100 to this answer? ;) – PhD Oct 27 '11 at 06:11
  • The link to 'first occurrence of term in math...' seems to point to UPenn's proxy... – PhD Oct 27 '11 at 06:15
  • 5
    @Nupul, you can give +100 as a bounty. The bounties can be awarded when question is two days old. – mpiktas Oct 27 '11 at 06:58
  • Haha! I know, but that's not what I meant ;) – PhD Oct 27 '11 at 07:41
  • @Mike: That's a great answer but I'm still not clear as to the intent or why the choice of name. Could you please throw more light on that? I'm struggling to understand it intuitively... – PhD Oct 27 '11 at 07:48
  • @Mike: I seem to be getting a hang of it - so the moment for each of the rectangles was computed w.r.t., OY - but what is the formula used for that? What is that formula for finding the 'rth' moment? Is that an MGF? If so, I'd want to know where it came from or is the moment in this case borrowed from physics? I couldn't find (or remember) a derivative based formula for that...I'm missing something for sure. Could you please help me out here? – PhD Oct 27 '11 at 08:30
  • @Nupul Sorry about the UPenn link...I'll fix it. And if someone doesn't beat me to it, I'll definitely add some more detail this weekend. –  Oct 27 '11 at 13:10
  • 5
    @Nupul Observe Pearson's multiple references to "gravity." Clearly he is reasoning with a physical analogy. This pushes the question back to why *physics* uses the term "moment" for such things. I believe it simply is a natural generalization of the idea of *moment of inertia* (a second moment), which you find referenced in the etymology links for "moment." *That* is why the etymology is relevant. – whuber Oct 27 '11 at 16:50
  • @whuber: Oh yes, I got that. My confusion was now of the math...have a look at the formula for generating the 'rth' moment. I'm not aware of such a formula in Physics and that was my doubt - is that an MGF? If so, I'd like to know where it came from since just calculating simple moments for each of the rectangles doesn't seem to have a 'deriviates' based formula...unless I'm missing something – PhD Oct 27 '11 at 20:42
  • 6
    Physics recognizes higher moments than the second, Nupul, and the formulas are identical to those of statistics. One merely translates "density" of an object into "probability density." In fact, physics has generalized the idea into that of a moment being a [coefficient of a power series expansion](http://en.wikipedia.org/wiki/Multipole_moment) in some appropriate coordinate system. – whuber Oct 27 '11 at 20:49
  • 3
    @Nupul I don't know if I can add anything more than what whuber has stated. I'm thinking that anything beyond what I've linked in my response and whuber's comments can probably be addressed more thoroughly in [Physics SE](http://physics.stackexchange.com/). And if it's still not 'deep' enough, there's always the [English SE](http://english.stackexchange.com/) whose 5th most used tag is 'etymology.' But, great question! Enjoyed researching it and found 3 great papers I never knew existed. –  Oct 29 '11 at 18:07
  • Here's a quick copy of the entire letter: https://i.imgur.com/T7eY2Pp.png – user2426679 Mar 22 '19 at 19:53
9

Everybody has their moment on moments. I had mine in Cumulant and moment names beyond variance, skewness and kurtosis, and spent some time reading this gorgeous thread.

Oddly, I did not find the "moment mention" in H. A. David's paper. So I went to Karl Pearson: The Scientific Life in a Statistical Age, a book by T. M. Porter, and to Karl Pearson and the Origins of Modern Statistics: An Elastician becomes a Statistician. Pearson, for instance, edited A History of the Theory of Elasticity and of the Strength of Materials from Galilei to the Present Time.

His background was very wide, and he was notably a professor of engineering and an elastician, who was involved in determining the bending moments of a bridge span and calculating stresses on masonry dams. In elasticity, one only observes what is going on (rupture) in a limited manner. He seemingly was interested in (from Porter's book):

graphical calculation or, in its most dignified and mathematical form, graphical statics.

Later:

From the beginning of his statistical career, and even before that, he fit curves using the "method of moments." In mechanics, this meant matching a complicated body to a simple or abstract one that had the same center of mass and "swing radius," respectively the first and second moments. These quantities corresponded in statistics to the mean and the spread or dispersion of measurements around the mean.

And since:

Pearson dealt in discrete measurement intervals, this was a sum rather than an integral

Inertial moments can stand for a summary of a moving body: computations can be carried out as if the body was reduced to a single point.

Pearson set up these five equalities as a system of equations, which combined into one of the ninth degree. A numerical solution was only possible by successive approximations. There could have been as many as nine real solutions, though in the present instance there were only two. He graphed both results alongside the original, and was generally pleased with the appearance of the result. He did not, however, rely on visual inspection to decide between them, but calculated the sixth moment to decide the best match

Let us go back to physics. A moment is a physical quantity that takes into account the local arrangement of a physical property, generally with respect to a certain reference point or axis (classically in space or time). It summarizes physical quantities as measured at some distance from a reference. If the quantity is not concentrated at a single point, the moment is "averaged" over the whole space, by means of integrals or sums.

Apparently, the concept of moments can be traced back to the operating principle of the lever "discovered" by Archimedes. One of the first known occurrences is the Latin word "momentorum" with the presently accepted sense (moment about a center of rotation). In 1565, Federico Commandino translated Archimedes' work (Liber de Centro Gravitatis Solidorum) as:

The center of gravity of each solid figure is that point within it, about which on all sides parts of equal moment stand.

or

Centrum gravitatis uniuscuiusque solidae figurae est punctum illud intra positum, circa quod undique partes aequalium momentorum

So apparently, the analogy with physics is quite strong: from a complicated discrete physical shape, find quantities that approximate it sufficiently, a form of compression or parsimony.
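To make the mechanical analogy concrete, here is a minimal Python sketch (my own illustration, not taken from Pearson or Porter). The positions and masses are made-up numbers and `moment` is a hypothetical helper name. It treats a few point masses on a line as a discrete distribution and checks that the center of mass is the mean and that the squared "swing radius" is the variance.

```python
# Hypothetical point masses on a line (illustrative numbers only).
positions = [0.0, 1.0, 2.0, 5.0]
masses = [2.0, 1.0, 1.0, 4.0]

def moment(xs, ws, order, about=0.0):
    """Mechanical moment of a given order: sum of w * (x - about)**order."""
    return sum(w * (x - about) ** order for x, w in zip(xs, ws))

# Mechanics reading: zeroth, first and second moments of the body.
total_mass = moment(positions, masses, 0)
center_of_mass = moment(positions, masses, 1) / total_mass
inertia = moment(positions, masses, 2, about=center_of_mass)
swing_radius = (inertia / total_mass) ** 0.5  # radius of gyration

# Statistics reading: normalize the masses into probabilities.
probs = [m / total_mass for m in masses]
mean = sum(p * x for x, p in zip(positions, probs))
variance = sum(p * (x - mean) ** 2 for x, p in zip(positions, probs))

print(center_of_mass, mean)
print(swing_radius ** 2, variance)
```

Both printed pairs agree up to rounding, which is the sense in which the body can be replaced by a simpler one with the same first two moments.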

Laurent Duval
9

Question: So what does the word "moment" mean in this case? Why this choice of word? It doesn't sound intuitive to me (or I never heard it that way back in college :) Come to think of it I am equally curious with its usage in "moment of inertia" ;) but let's not focus on that for now.

Answer: Actually, in a historical sense, moment of inertia is probably where the sense of the word moments comes from. Indeed, one can (as below) show how the moment of inertia relates to variance. This also yields a physical interpretation of higher moments.

In physics, a moment is an expression involving the product of a distance and a physical quantity, and in this way it accounts for how the physical quantity is located or arranged. Moments are usually defined with respect to a fixed reference point; they deal with physical quantities as measured at some distance from that reference point. For example, the moment of force acting on an object, often called torque, is the product of the force and the distance from a reference point, as in the example below.

[Image: example of a moment of force (torque) about a reference point]

Less confusing than the names usually given to higher moments (e.g., hyperflatness) would be names drawn from circular motion, e.g., moments of inertia of rigid bodies, which is a simple conversion. Angular acceleration is the derivative of angular velocity, which is the derivative of angle with respect to time, i.e., $ \dfrac{d\omega}{dt}=\alpha,\,\dfrac{d\theta}{dt}=\omega$. Consider that the second moment is analogous to torque applied to a circular motion, or if you will, an acceleration/deceleration (also a second derivative) of that circular (i.e., angular, $\theta$) motion. Similarly, the third moment would be a rate of change of torque, and so on for yet higher moments, giving rates of change of rates of change of rates of change, i.e., sequential derivatives of circular motion. This is perhaps easier to visualize with actual examples.

There are limits to physical plausibility, e.g., where an object begins and ends, i.e., its support, which renders the comparison more or less realistic. Let us take the example of a beta distribution, which has (finite) support on [0,1] and show the correspondence for that. The beta distribution density function (pdf) is $$\beta(x;\alpha,\beta)=\begin{cases} \dfrac{x^{\alpha -1} (1-x)^{\beta -1}}{B(\alpha ,\beta )} & 0<x<1 \\ 0 & \text{otherwise} \end{cases}\,,$$ where $B(\alpha,\beta)=\dfrac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$, and $\Gamma(.)$ is the gamma function, $\Gamma(z) = \int_0^\infty x^{z-1} e^{-x}\,dx$.

The mean is then the first moment of rotation around the $z$-axis for the beta density plotted as a rigidly rotating thin sheet of uniform area density with the minimum $x$-value affixed to the (0,0,0) origin, with its base in the $x,y$ plane. $$\mu=\int_0^1r\,\beta(r;\alpha,\beta)\,dr=\frac{\alpha}{\alpha+\beta}\,,$$ as illustrated for $\beta(r;2,2)$, i.e., $\mu=\dfrac{1}{2}$, below.

[Image: the $\beta(r;2,2)$ density as a thin sheet rotating about the $z$-axis at the origin]

Note that there is nothing preventing us from moving the beta distribution thin sheet to another location and re-scaling it, e.g., from $0\leq r\leq1$ to $2\leq r\leq4$, or changing the vertical shape, for example to be a paddle rather than a hump.

To calculate the beta distribution variance, we would calculate the moment of inertia for a shifted beta distribution with the $r$-value mean placed on the $z$-axis of rotation, $$\sigma^2=\int_0^1 (r-\mu)^2 \beta(r;\alpha,\beta) \, dr =\frac{\alpha \beta }{(\alpha +\beta )^2 (\alpha +\beta +1)}\,,$$ which for $\beta(r;2,2)$, i.e., $I=\sigma^2=\dfrac{1}{20}$, where $I$ is the moment of inertia, looks like this,

[Image: the $\beta(r;2,2)$ thin sheet shifted so that its mean lies on the $z$-axis of rotation]

Now for higher so-called 'central' moments, i.e., moments about the mean, like skewness and kurtosis, we calculate the $n^{\text{th}}$ moment around the mean from $$\int_0^1 (r-\mu)^n \beta(r;\alpha,\beta) \, dr\,.$$ This can also be understood to be the $n^{\text{th}}$ derivative of circular motion.
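As a numerical sanity check on the integrals above, here is a minimal Python sketch (an illustration, not part of the original derivation). It assumes a plain midpoint rule is accurate enough here, and the helper names `beta_pdf`, `integrate`, `mean` and `central_moment` are made up for this example. It recovers $\mu$ and $I=\sigma^2$ for $\beta(r;2,2)$ and then evaluates higher central moments from the same integral.

```python
import math

def beta_pdf(r, a, b):
    """Beta density: positive on 0 < r < 1, zero otherwise."""
    if not 0.0 < r < 1.0:
        return 0.0
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # B(a, b)
    return r ** (a - 1) * (1.0 - r) ** (b - 1) / norm

def integrate(f, lo=0.0, hi=1.0, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def mean(a, b):
    """First moment about the axis at r = 0."""
    return integrate(lambda r: r * beta_pdf(r, a, b))

def central_moment(n, a, b):
    """n-th moment about the mean; n = 2 is the moment of inertia I."""
    mu = mean(a, b)
    return integrate(lambda r: (r - mu) ** n * beta_pdf(r, a, b))

mu = mean(2, 2)                    # closed form: 1/2
inertia = central_moment(2, 2, 2)  # closed form: 1/20
third = central_moment(3, 2, 2)    # symmetric density, so close to 0
# Standardized third moment (skewness) of an asymmetric example.
skew_25 = central_moment(3, 2, 5) / central_moment(2, 2, 5) ** 1.5

print(mu, inertia, third, skew_25)
```

The third central moment vanishes (up to numerical error) for the symmetric $\beta(r;2,2)$ and comes out positive for the right-skewed $\beta(r;2,5)$.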

What if we want to calculate backwards, that is, take a 3D solid object and turn it into a probability function? Things then get a bit trickier. For example, let us take a torus.

[Image: a torus]

First we take its circular cross section, then we make it into a half ellipse to show the density of any flat coin like slice, then we convert the coin into a wedge-shaped coin to account for the increasing density with increasing distance ($r$) from the $z$-axis, and finally we normalize for the area to make a density function. This is outlined graphically below with the mathematics left to the reader.

[Image: graphical outline of converting the torus cross section into a density function]

Finally, we ask how these equivalences relate to motion. Note that, as above, the moment of inertia, $I$, can be related to the second central moment, $\sigma^2$, a.k.a. the variance. Then $I=\dfrac{\tau}{a}$, that is, the ratio of the torque, $\tau$, to the angular acceleration, $a$. We would then differentiate to obtain higher-order rates of change in time.

Carl
  • The connection between moments and derivatives is obscure. (It definitely exists, but the relationship usually is revealed through the Fourier Transform.) Could you show explicitly how and why moments can be interpreted as derivatives? How does this work? – whuber Jan 21 '18 at 16:24
  • @whuber Later, meanwhile look at moments link above, it shows ||. – Carl Jan 21 '18 at 16:46
  • Thank you. I see that page and I get a glimmer of what you're referring to, but the connection with moments of a distribution is not clear. I'm intrigued and look forward to your further elaboration of this idea. – whuber Jan 21 '18 at 17:35
  • @whuber Check it over and see if you agree. – Carl Feb 12 '18 at 08:10
  • Thank you (+1). This is not what I thought you meant--but it's valid and very well illustrated. I appreciate your effort. But didn't you refer to some connection to derivatives? I don't see that here. – whuber Feb 12 '18 at 14:40
  • @whuber Angular acceleration is the derivative of angular velocity, which is the derivative of angle with respect to time, i.e., $\dfrac{d\omega}{dt}=\alpha,\,\dfrac{d\theta}{dt}=\omega.$ – Carl Feb 12 '18 at 14:48
  • I see where you're going with this and it clarifies my initial impression of the connection to the Fourier Transform: thank you. – whuber Feb 12 '18 at 14:52
  • @whuber Perhaps the relationship is via the *mgf* from Taylor series? – Carl Feb 12 '18 at 14:55
  • 3
    Yes, that can be done. When the argument $x$ of the series is written as $x=e^{iq}$ then you have a Fourier series. Moreover, the connection between moments and derivatives is explicit in the Fourier transform: the differentiation operator is transformed into multiplication by $q$, directly showing how moments are connected to derivatives of the same order. – whuber Feb 12 '18 at 15:06
  • Nice answer. Did you create these visualizations? If not, where did you get them? Also, you've clearly copied text from Wikipedia in your first paragraph (or edited Wikipedia). For intellectual honesty, I would rephrase or make that section a quotation. – jds Jan 16 '20 at 13:01
  • @gwg I made the graphics, took a while to do so. The link to Wikipedia is followed by context from that link; credit was attributed such that quotes would be clumsy and would not really increase "honesty" although it might increase a false sense of increased defensibility. This would be a false attribution in that the same or similar phraseology appears in texts that predate the Wikipedia entry, and Wikipedia itself does not cite those sources. For example, see similar wording [here](http://quiznext.in/study-material/learning_material/ICSE-10-Physics/Force_3/moment-of-force-and-equilibrium/). – Carl Jan 16 '20 at 14:25
  • Your reasons might be justified, but to a casual read it _looks_ like plagiarism. Do what you want; just FYI. – jds Jan 16 '20 at 15:47
  • @gwg From [history of](https://en.wikipedia.org/wiki/Moment_(physics)#History) "The principle of moments is derived from Archimedes' discovery of the operating principle of the lever. In the lever one applies a force, in his day most often human muscle, to an arm, a beam of some sort. Archimedes noted that the amount of force applied to the object, the moment of force, is defined as M = rF, where F is the applied force, and r is the distance from the applied force to object. However, historical evolution of the term 'moment' and its use in different branches of... – Carl Jan 16 '20 at 16:56
  • @gwg ...science, such as mathematics, physics and engineering, is unclear." This cannot be attributed to a source, the words sequence alone is not something that merits the title plagiarism without first determining who said what when, and it is not worth delving into since it is the understanding, buried in historical time and predating the English language that has any sense of proper attribution. And even Wikipedia's history is too vague to even properly hint at that origin. – Carl Jan 16 '20 at 17:04
5

Being overly simplistic, statistical moments are additional descriptors of a curve/distribution. We are familiar with the first two moments and these are generally useful for continuous normal distributions or similar curves. However these first two moments lose their informational value for other distributions. Thus other moments provide additional information on the shape/form of the distribution.
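A small Python sketch of that point (an illustration with made-up distributions; `central_moment` and `summary` are hypothetical helper names): two discrete distributions that share the same mean and variance but differ in their fourth moment, so only a higher moment tells their shapes apart.

```python
def central_moment(values, probs, order):
    """Moment of the given order about the mean of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** order for v, p in zip(values, probs))

def summary(values, probs):
    """Return (mean, variance, kurtosis = 4th central moment / variance**2)."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = central_moment(values, probs, 2)
    kurt = central_moment(values, probs, 4) / var ** 2
    return mean, var, kurt

# Fair coin on {-1, +1}: all mass sits one unit from the mean.
coin = summary([-1.0, 1.0], [0.5, 0.5])
# Mostly at 0 with occasional jumps to -2 or +2.
peaked = summary([-2.0, 0.0, 2.0], [0.125, 0.75, 0.125])

print("coin:  ", coin)
print("peaked:", peaked)
```

Both have mean 0 and variance 1, yet the kurtosis is 1 for the coin and 4 for the peaked distribution.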

kjetil b halvorsen
  • 1
    I do not think that the first two moments lose meaning for all non-normal distributions; for example, mean residence time is generally the first moment or integral average of times in a time series. – Carl Nov 17 '16 at 19:12