
Skewness is generally defined as the standardized third central moment:

$$S(X) \triangleq \frac{\mathbb{E}\left[ \left( X - \mathbb{E} \left[ X \right] \right)^3 \right]}{\left( \mathbb{E}\left[ \left( X - \mathbb{E} \left[ X \right] \right)^2 \right] \right)^{\frac{3}{2}}}$$

Sometimes skewness itself is the property we are interested in, but often it is used to quantify the reflective asymmetry of a probability distribution. That use is only a heuristic: Meijer (2000) demonstrated that asymmetric distributions can have zero skewness, even though nonzero skewness does imply that a distribution is asymmetric.
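To make the zero-skewness point concrete, here is a small discrete example of my own (it is not taken from Meijer 2000): $X$ takes the values $-2, 1, 3$ with probabilities $0.4, 0.5, 0.1$. It is plainly asymmetric, yet its third central moment, and hence $S(X)$, is exactly zero. The sketch uses exact rational arithmetic so that the zero is not a rounding artifact; `central_moment` is just an illustrative helper.

```python
from fractions import Fraction as F

# Illustrative (hypothetical) distribution: support points and probabilities.
values = [F(-2), F(1), F(3)]
probs = [F(2, 5), F(1, 2), F(1, 10)]

def central_moment(values, probs, k):
    """k-th central moment E[(X - E[X])^k] of a finite discrete distribution."""
    mean = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mean) ** k for x, p in zip(values, probs))

m2 = central_moment(values, probs, 2)  # variance (denominator of S(X), before the 3/2 power)
m3 = central_moment(values, probs, 3)  # numerator of S(X)
skewness = float(m3) / float(m2) ** 1.5

# Symmetry check: a distribution with a finite mean can only be symmetric about
# that mean, so compare the distribution with its reflection 2*mean - X.
mean = sum(p * x for x, p in zip(values, probs))
reflected = sorted((2 * mean - x, p) for x, p in zip(values, probs))
is_symmetric = reflected == sorted(zip(values, probs))

print(m2, m3, skewness, is_symmetric)  # 3 0 0.0 False
```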

Drawing on skewness as an analogy, I would like to consider kurtosis in a similar light. Kurtosis is likewise a standardized central moment, of fourth order:

$$K(X) \triangleq \frac{\mathbb{E}\left[ \left( X - \mathbb{E} \left[ X \right] \right)^4 \right]}{\left( \mathbb{E}\left[ \left( X - \mathbb{E} \left[ X \right] \right)^2 \right] \right)^{2}}$$
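As a sanity check on the definition of $K(X)$, the sketch below approximates it with a plain midpoint-rule integral for three densities whose kurtosis is known in closed form: normal (3), uniform (9/5 = 1.8) and Laplace (6). The helper `density_kurtosis`, the grid size and the truncation ranges are illustrative choices, not a library routine.

```python
import math

def density_kurtosis(pdf, lo, hi, n=100_000):
    """Approximate K(X) = E[(X - mu)^4] / (E[(X - mu)^2])^2 for a density that is
    (numerically) supported on [lo, hi], using the midpoint rule on n cells."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    ws = [pdf(x) * h for x in xs]   # approximate probability mass of each cell
    total = sum(ws)                 # close to 1; renormalize away truncation error
    mu = sum(w * x for w, x in zip(ws, xs)) / total
    m2 = sum(w * (x - mu) ** 2 for w, x in zip(ws, xs)) / total
    m4 = sum(w * (x - mu) ** 4 for w, x in zip(ws, xs)) / total
    return m4 / m2 ** 2

def normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def uniform(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def laplace(x):
    return 0.5 * math.exp(-abs(x))

k_normal = density_kurtosis(normal, -12, 12)    # close to 3
k_uniform = density_kurtosis(uniform, 0, 1)     # close to 9/5
k_laplace = density_kurtosis(laplace, -40, 40)  # close to 6
print(f"{k_normal:.4f} {k_uniform:.4f} {k_laplace:.4f}")
```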

Analogously, just as skewness is taken to tell us about reflective symmetry, kurtosis is commonly taken to tell us how 'tall-and-skinny' a distribution is. As whuber discusses in another post, the symmetry property can be formulated as an equality in terms of either the CDF or the PDF (assuming sufficient smoothness).
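For reference, the standard symmetry conditions about a center $c$ read as follows: first for a general CDF $F$ (using its left limit), then for a continuous $F$, and finally for a density $f$, where the equality need only hold for almost every $x$:

$$X - c \overset{d}{=} c - X \iff F(c + x) = 1 - \lim_{t \uparrow c - x} F(t) \quad \text{for all } x \in \mathbb{R}$$

$$F(c + x) + F(c - x) = 1 \qquad\qquad f(c + x) = f(c - x)$$

If $X$ is symmetric about $c$ and $\mathbb{E}[X]$ exists, then $\mathbb{E}[X] = c$ and every existing odd central moment vanishes, which is why a nonzero $S(X)$ rules out symmetry.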

I have yet to see the 'tall-and-skinny' property of a distribution formulated analogously, in terms of either its CDF or its PDF.

Off the top of my head, I would speculate that a CDF-based formulation of kurtosis could compare the CDF against two extremes: a step function as the maximally kurtic peak, and a straight line as the minimally kurtic peak.

In terms of the PDF, I suspect an aspect ratio could be suitable, except that one would need to decide which interval counts as the 'base'. The height might be taken to be the density at the maximum likelihood value (MLV), i.e. the mode, but this would be problematic for multimodal distributions, which have several MLVs.
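One way to make the aspect-ratio idea concrete, purely as an illustration: take the 'height' to be the density at the mode and the 'base' to be the interquartile range (IQR). Their product is dimensionless and unchanged by shifting or rescaling $X$. This choice of base and the name `aspect_ratio` are my own assumptions, not an established measure, and the score is only meaningful for unimodal densities with a finite peak.

```python
import math
from statistics import NormalDist

def aspect_ratio(peak_density, q1, q3):
    """Hypothetical 'tall-and-skinny' score: density at the mode times the IQR.
    Under X -> a*X + b (a > 0) the peak scales by 1/a and the IQR by a,
    so the product does not change."""
    return peak_density * (q3 - q1)

std_normal = NormalDist(0.0, 1.0)
scores = {
    # Standard normal: peak 1/sqrt(2*pi), quartiles from the inverse CDF.
    "normal": aspect_ratio(std_normal.pdf(0.0),
                           std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)),
    # Uniform(0, 1): flat density 1, quartiles 0.25 and 0.75.
    "uniform": aspect_ratio(1.0, 0.25, 0.75),
    # Laplace(0, 1): peak 1/2, quartiles -ln 2 and +ln 2.
    "laplace": aspect_ratio(0.5, -math.log(2), math.log(2)),
}
for name, score in scores.items():
    print(f"{name:8s} {score:.4f}")
```

How, or whether, a score like this relates to kurtosis in general is exactly the open question here.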

How can a 'tall-and-skinny' property related to kurtosis be stated rigorously in terms of a distribution's (smooth) CDF or PDF?

DifferentialPleiometry
  • The meaning and interpretation of kurtosis has become somewhat of a controversy here on CV. See [this site search](https://stats.stackexchange.com/search?q=kurtosis+long+tail*). The problem is that kurtosis isn't actually about the height of a density function or its tail behavior, because the fourth moment doesn't directly relate to either. – whuber Jul 06 '21 at 22:05
  • @whuber Oh, too bad. Thanks for updating me. – DifferentialPleiometry Jul 06 '21 at 22:12
  • [Westfall 2014](https://www.tandfonline.com/doi/abs/10.1080/00031305.2014.917055) is a sort of 'debunking paper' on the topic. – DifferentialPleiometry Jul 06 '21 at 22:36
  • For various reasons it is difficult to get an intuitive idea of kurtosis. I think that looking at examples might be the best way to start. [This Wikipedia page](https://en.wikipedia.org/wiki/Kurtosis) might be helpful. – BruceET Jul 06 '21 at 22:36
  • Thanks @BruceET, I am ahead of you on that suggestion, but it is appreciated. – DifferentialPleiometry Jul 06 '21 at 22:37

1 Answer


Well, you cannot relate 'tall and skinny' to kurtosis, because there is no mathematical connection. The beta(0.5, 1) distribution is infinitely tall and skinny (its density is unbounded at 0), yet has low kurtosis (15/7 ≈ 2.14, below the normal's 3). And the mixture 0.9999·U(0,1) + 0.0001·Cauchy appears perfectly flat over 99.99% of the observable data, yet has infinite kurtosis.
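The beta(0.5, 1) counterexample can be checked exactly. Beta(0.5, 1) is the distribution of $U^2$ for $U \sim U(0,1)$, so its raw moments are $\mathbb{E}[X^k] = \mathbb{E}[U^{2k}] = 1/(2k+1)$ and its density is $f(x) = 1/(2\sqrt{x})$, which blows up as $x \to 0$. The sketch below (my own check, not part of the original answer) converts raw moments to central moments with exact fractions; `central_from_raw` is an illustrative helper.

```python
from fractions import Fraction
from math import comb, sqrt

# Raw moments of Beta(1/2, 1), i.e. of X = U**2 with U ~ Uniform(0, 1):
# E[X**k] = E[U**(2k)] = 1 / (2k + 1); raw[0] = 1.
raw = [Fraction(1, 2 * k + 1) for k in range(5)]

def central_from_raw(raw, k):
    """E[(X - mu)^k] via the binomial expansion of (X - mu)^k."""
    mu = raw[1]
    return sum(comb(k, j) * raw[j] * (-mu) ** (k - j) for j in range(k + 1))

var = central_from_raw(raw, 2)
mu4 = central_from_raw(raw, 4)
kurtosis = mu4 / var ** 2

def density(x):
    """Beta(1/2, 1) density, unbounded near 0 ('infinitely tall')."""
    return 1 / (2 * sqrt(x))

print(var, mu4, kurtosis, float(kurtosis))  # 4/45 16/945 15/7 2.142857142857143
print(density(1e-8))                        # roughly 5000
```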

Contrary to whuber's comment, kurtosis is in fact precisely related to the tails. Larger kurtosis mathematically implies greater tail leverage. See here and here for precise descriptions of "tail leverage."
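One elementary way to see why the fourth moment is driven by the tails (not necessarily the answer's own "tail leverage" definition, which is in the linked posts): with $Z = (X - \mathbb{E}[X])/\sigma$, kurtosis is $K(X) = \mathbb{E}[Z^4]$, and since $z^4 \le 1$ whenever $|z| \le 1$, the region within one standard deviation of the mean contributes at most 1, so $K(X) - 1$ is a lower bound on the contribution from $|Z| > 1$. The sketch applies this split to a small made-up data set using plug-in moments; the data and the name `kurtosis_split` are illustrative only.

```python
import math

def kurtosis_split(data):
    """Split the plug-in kurtosis mean(z**4) into contributions from
    |z| <= 1 (the center) and |z| > 1 (the tails)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)  # divides by n
    z = [(x - mean) / sd for x in data]
    center = sum(v ** 4 for v in z if abs(v) <= 1) / n  # each term <= 1, so center <= 1
    tails = sum(v ** 4 for v in z if abs(v) > 1) / n
    return center, tails

# Made-up data with a single outlying value (illustrative only).
data = [-2.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 2.0, 12.0]
center, tails = kurtosis_split(data)
print(f"center {center:.3f}  tails {tails:.3f}  kurtosis {center + tails:.3f}")
```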

BigBendRegion
  • Interesting. Does this lack of relation include a lack of [stochastic dependence](https://en.wikipedia.org/wiki/Independence_(probability_theory))? – DifferentialPleiometry Jul 07 '21 at 03:16
  • I recall from earlier today that [Westfall 2014](https://www.tandfonline.com/doi/abs/10.1080/00031305.2014.917055) gives a mathematical argument that the tails influence a distribution's kurtosis more than its central values do. – DifferentialPleiometry Jul 07 '21 at 03:17
  • I like the counterexamples you give; they illustrate that kurtosis cannot be a direct or monotonic function of what I intuitively recognise as tall-and-skinniness. – DifferentialPleiometry Jul 07 '21 at 03:24
  • Right. But there is a monotonic relation between kurtosis and tail leverage, as my linked post shows. – BigBendRegion Jul 07 '21 at 08:56
  • I don't see any connection between kurtosis and independence. Maybe if more context were given I could see it. For example, the marginal distribution of an ARCH process would exhibit high kurtosis even when all the conditional distributions are normal. – BigBendRegion Jul 07 '21 at 08:59
  • Okay. Let $T(X)$ be an estimator of the tall-and-skinniness of the distribution of the real-valued random variable X, and $K(X)$ the kurtosis. Does $T(X) \perp \!\!\! \perp K(X)$ (or its negation) hold for all $X$? – DifferentialPleiometry Jul 07 '21 at 13:56
  • There would have to be many more details, but generally probably not. For example, the sample mean and sample variance are correlated under non-normality. – BigBendRegion Jul 07 '21 at 14:23
  • The sense in which I made my comment is that kurtosis tells us little about the tails--but you appear to assert more with your use of "precisely." Here's a test: I have in mind a distribution with zero mean and skewness, unit variance, and excess kurtosis of 1. What can you tell us, either qualitatively, or quantitatively, about either of the tails? How would your explanation change if the kurtosis were 10 instead of 1? Can you explain what you mean by "tail leverage" (an uncommon term)? – whuber Jul 07 '21 at 15:37
  • I answered in the link in my post, with more detail given here https://stats.stackexchange.com/a/481022/102879 – BigBendRegion Jul 07 '21 at 16:23
  • @BigBend Thank you. Unfortunately, your definition doesn't work, because the meaning of "$m$" is indefinite -- it isn't determined by your existence assumption and there aren't any relevant quantifiers in the attempted definition. It seems to be saying one distribution has greater "tail leverage" than another if it has a greater standardized central moment of some indefinite order $m.$ In the present case, isn't asserting a "monotonic relation" with kurtosis therefore a tautology? – whuber Jul 09 '21 at 14:21
  • Of course the definition works. Just pick an $m$. With $m=4$, you get kurtosis. Just like with other definitions of tail-heaviness, there are infinitely many. This one has the advantage that it actually applies to data. And there is nothing wrong with a tautology when it provides great insights. The issue is not the algebraic tautology, it is the graphical leverage representation, which is precise, provides answers to questions such as "What does larger kurtosis mean, precisely?", and is a new way of looking at the problem that finally settles the issue. – BigBendRegion Jul 09 '21 at 15:25
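A quick simulation illustrates the comment that the sample mean and sample variance are correlated under non-normality. For i.i.d. samples, the covariance of the sample mean and the unbiased sample variance is $\mu_3 / n$, where $\mu_3$ is the third central moment, so the two are uncorrelated for normal data (indeed independent) but not for skewed data such as the exponential. The sample size, replication count, seed and the helper name `mean_var_correlation` are arbitrary choices for this sketch.

```python
import random

def mean_var_correlation(draw, n=10, reps=20_000, seed=1):
    """Monte Carlo Pearson correlation between the sample mean and the
    unbiased sample variance of n i.i.d. draws, over `reps` replications."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        m = sum(sample) / n
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in sample) / (n - 1))
    mx = sum(means) / reps
    my = sum(variances) / reps
    sxy = sum((a - mx) * (b - my) for a, b in zip(means, variances))
    sxx = sum((a - mx) ** 2 for a in means)
    syy = sum((b - my) ** 2 for b in variances)
    return sxy / (sxx * syy) ** 0.5

r_normal = mean_var_correlation(lambda rng: rng.gauss(0.0, 1.0))
r_expon = mean_var_correlation(lambda rng: rng.expovariate(1.0))
print(f"normal: {r_normal:.3f}   exponential: {r_expon:.3f}")
```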