
(This is based on a question that just came to me via email; I've added some context from a previous brief conversation with the same person.)

Last year I was told that the gamma distribution is heavier tailed than the lognormal, and I've since been told that's not the case.

  • Which is heavier tailed?

  • What are some resources I can use to explore the relationship?

Glen_b
    To the person that just downvoted: It would be useful to know what the perceived problem with the question is. – Glen_b Jan 01 '18 at 00:45
    Was not me, I upvoted a long time ago. However, I suspect it was about the utility of heavy-tailedness versus kurtosis in the context of t-testing assumptions in the presence of outliers, which has absolutely nothing to do with what you asked. Downvoting is, IMHO, [problematic](https://stats.meta.stackexchange.com/questions/4567/how-can-we-discourage-ganging-up-with-downvotes). – Carl Jan 01 '18 at 01:11

3 Answers


The (right) tail of a distribution describes its behavior at large values. The correct object to study is not its density--which in many practical cases does not exist--but rather its distribution function $F$. More specifically, because $F$ must rise asymptotically to $1$ for large arguments $x$ (by the Law of Total Probability), we are interested in how rapidly it approaches that asymptote: we need to investigate the behavior of its survival function $1- F(x)$ as $x \to \infty$.

Specifically, one distribution $F$ for a random variable $X$ is "heavier" than another one $G$ provided that eventually $F$ has more probability at large values than $G$. This can be formalized: there must exist a finite number $x_0$ such that for all $x \gt x_0$, $${\Pr}_F(X\gt x) = 1 - F(x) \gt 1 - G(x) = {\Pr}_G(X\gt x).$$

Figure

The red curve in this figure is the survival function for a Poisson$(3)$ distribution. The blue curve is for a Gamma$(3)$ distribution, which has the same variance. Eventually the blue curve always exceeds the red curve, showing that this Gamma distribution has a heavier tail than this Poisson distribution. These distributions cannot readily be compared using densities, because the Poisson distribution has no density.
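This comparison is easy to check numerically. Here is a quick sketch in Python (standard library only), taking the Gamma to have shape 3 and scale 1, which matches the Poisson(3) mean and variance; for an integer shape the Gamma survival function reduces to a finite exponential sum:

```python
import math

def poisson_sf(x, lam=3.0):
    """P(X > x) for a Poisson(lam): 1 minus the CDF at floor(x)."""
    k_max = int(math.floor(x))
    cdf = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max + 1))
    return 1.0 - cdf

def gamma3_sf(x):
    """P(X > x) for a Gamma with shape 3 and scale 1; for an integer
    shape the survival function is exp(-x) * (1 + x + x^2/2)."""
    return math.exp(-x) * (1.0 + x + x * x / 2.0)

# Both distributions have mean 3 and variance 3, yet the gamma's
# survival function eventually dominates the Poisson's.
for x in (5, 10, 15, 20):
    print(x, gamma3_sf(x), poisson_sf(x), gamma3_sf(x) > poisson_sf(x))
```
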

It is true that when the densities $f$ and $g$ exist and $f(x) \gt g(x)$ for $x \gt x_0$ then $F$ is heavier-tailed than $G$. However, the converse is false--and this is a compelling reason to base the definition of tail heaviness on survival functions rather than densities, even if often the analysis of tails may be more easily carried out using the densities.

Counter-examples can be constructed by taking a discrete distribution $H$ of positive unbounded support that nevertheless is no heavier-tailed than $G$ (discretizing $G$ will do the trick). Turn this into a continuous distribution by replacing the probability mass of $H$ at each of its support points $k$, written $h(k)$, by (say) a scaled Beta$(2,2)$ distribution with support on a suitable interval $[k-\varepsilon(k), k+\varepsilon(k)]$ and weighted by $h(k)$. Given a small positive number $\delta,$ choose $\varepsilon(k)$ sufficiently small to ensure that the peak density of this scaled Beta distribution exceeds $f(k)/\delta$. By construction, the mixture $\delta H + (1-\delta )G$ is a continuous distribution $G^\prime$ whose tail looks like that of $G$ (it is uniformly a tiny bit lower, by an amount $\delta$) but has spikes in its density at the support points of $H$, and all those spikes have points where they exceed the density $f$. Thus $G^\prime$ is lighter-tailed than $F$, but no matter how far out in the tail we go there will be points where its density exceeds that of $F$.

Figure

The red curve is the PDF of a Gamma distribution $G$, the gold curve is the PDF of a lognormal distribution $F$, and the blue curve (with spikes) is the PDF of a mixture $G^\prime$ constructed as in the counterexample. (Notice the logarithmic density axis.) The survival function of $G^\prime$ is close to that of a Gamma distribution (with rapidly decaying wiggles): it will eventually grow less than that of $F$, even though its PDF will always spike above that of $F$ no matter how far out into the tails we look.


Discussion

Incidentally, we can perform this analysis directly on the survival functions of lognormal and Gamma distributions, expanding them around $x=\infty$ to find their asymptotic behavior, and conclude that all lognormals have heavier tails than all Gammas. But, because these distributions have "nice" densities, the analysis is more easily carried out by showing that for sufficiently large $x$, a lognormal density exceeds a Gamma density. Let us not, however, confuse this analytical convenience with the meaning of a heavy tail.

Similarly, although higher moments and their variants (such as skewness and kurtosis) say a little about the tails, they do not provide sufficient information. As a simple example, we may truncate any lognormal distribution at such a large value that any given number of its moments will scarcely change--but in so doing we will have removed its tail entirely, making it lighter-tailed than any distribution with unbounded support (such as a Gamma).
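This truncation argument can be made concrete. For a standard lognormal, the partial moment has the closed form $E[X^k;\,X\le T] = e^{k^2/2}\,\Phi(\log T - k)$, so the moments of the truncated distribution can be computed exactly; the sketch below uses the arbitrary truncation point $T = e^{12}$:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_moment(k):
    """k-th raw moment of a standard lognormal (mu=0, sigma=1)."""
    return math.exp(k * k / 2.0)

def truncated_moment(k, log_T):
    """k-th raw moment of the same lognormal truncated at T = exp(log_T):
    E[X^k | X <= T] = exp(k^2/2) * Phi(log_T - k) / Phi(log_T)."""
    return lognormal_moment(k) * Phi(log_T - k) / Phi(log_T)

# Truncating at T = e^12 (huge but finite) leaves the first four moments
# essentially unchanged, yet the truncated distribution has bounded
# support and hence a lighter tail than any gamma.
for k in range(1, 5):
    full, trunc = lognormal_moment(k), truncated_moment(k, 12.0)
    print(k, full, trunc, abs(full - trunc) / full)
```
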

A fair objection to these mathematical contortions would be to point out that behavior so far out in the tail has no practical application, because nobody would ever believe that any distributional model will be valid at such extreme (perhaps physically unattainable) values. That shows, however, that in applications we ought to take some care to identify which portion of the tail is of concern and analyze it accordingly. (Flood recurrence times, for instance, can be understood in this sense: 10-year floods, 100-year floods, and 1000-year floods characterize particular sections of the tail of the flood distribution.) The same principles apply, though: the fundamental object of analysis here is the distribution function and not its density.

whuber
    +1 excellent discussion of why it should be based on the survivor function. I've recommended to the original source of the question that they should have a look at your response. – Glen_b Feb 13 '14 at 23:24
    (+1) for good probabilistic discussion of how to interpret survival function. –  Apr 12 '16 at 14:07
  • This definition of heavy tails is fine, as *one* definition. But it has serious problems. In particular, there are bounded distributions that arguably have heavy tails, such as a .9999*U(-1,1) + .0001*U(-1000,1000) distribution. By the "definition" given, the N(0,1) distribution has heavier tails than the .9999*U(-1,1) + .0001*U(-1000,1000) distribution. This is obviously silly. Let's face it: There are infinitely many ways to measure tailedness of distribution. – BigBendRegion Nov 20 '17 at 00:34
    @Peter The "silliness" arises because you seem to have gotten the ideas backwards. Neither of your examples has a "heavy" tail in any sense, because they are bounded. Both survival functions eventually are exactly zero and therefore both tails are equally light. – whuber Nov 20 '17 at 16:57
  • Backwards? Seriously? whuber, do you really want to argue that N(0,1) is "heavier-tailed" than .9999*U(-1,1) + .0001*U(-1000,1000) ? Here is a little math: The .9999*U(-1,1) + .0001*U(-1000,1000) density has mean 0, and variance 33.67. So, when the data come from U(-1000,1000), a "typical" z score is (500 -0)/sqrt(33.67) = 86.2. Many are much larger. Now, whuber, if you want to argue that the normal distribution has a heavier tail, then you have to argue that observations 86.2 standard deviations from the mean are plausibly observable when data come from a normal distribution. Can you? – BigBendRegion Nov 22 '17 at 01:03
    @PeterWestfall You have compared tails having bounded support with those having infinite support, as if that were meaningful. Many contexts exist in which that would be unnecessary, silly even. In those contexts in which one would compare them a quantile difference ratio may be appropriate. There are not many contexts beyond those and if you can think of one, do tell. – Carl Nov 22 '17 at 01:53
  • As far as contexts where the comparison meaningful, how about outliers? Outliers are generally important for all kinds of statistics. Tests for means for one. Estimation of variance for another. Robustness in general for another. And we are concerned about tails precisely because we are concerned about outliers. So your comment that is "silly" to compare distributions with bounded support with those with infinite support is itself quite silly. Of course we would like to know which one is more outlier-prone, eg, if we choose to use one or the other to model an outlier-prone process. – BigBendRegion Nov 24 '17 at 23:24
  • @Peter This is not a thread about outliers--it's about distributions. Nobody is challenging your remarks about outliers (at least I'm certainly not--I'm quite sympathetic to them), but in this setting they are at best distractions and at worst confusing. Your reasoning about outliers simply is not germane to the question at hand. – whuber Nov 25 '17 at 00:17
  • Okay, you don't like the term "outlier". Simply replace it with "rare, extreme potentially observable data," and we are talking the same language. After all, distributions tell you about potentially observable data. – BigBendRegion Nov 25 '17 at 00:23
  • @Peter I have no problem with "outlier," but I understand it to be a property of individual observations (or small groups thereof) in the context of a set of data. Regardless, your example of the uniform mixture shows that you are not working with the definitions of "heavy tailed" and "long tailed" that are the focus of the present inquiry. (These definitions arise in the study of rare phenomena, extreme events, etc., where calling things "outliers" tends to be less than constructive, anyway: interest is focused on the tails in their own right, not as exceptional excursions from the center.) – whuber Nov 25 '17 at 00:38
  • @Peter In reviewing your comments, I get the impression you leapt into this thread without reading my post. May I refer you to the last three paragraphs under "discussion"? I believe they address most, if not all, of your points and concerns. – whuber Nov 25 '17 at 00:41
  • Agreed, which portion of the tail is relevant is a concern. That is the point I was making in saying that the N(0,1) is lighter tailed than .9999*U(-1,1) +.0001*U(-1000,1000), despite the latter having finite support. 100 year floods or whatever else you want to study do not have probabilities as low as represented by 100+ standard deviations from the mean of a normal distribution. So, good, we are in agreement. The N(0,1) is lighter tailed than .9999*U(-1,1) +.0001*U(-1000,1000), as far as portions of the tail that are relevant. – BigBendRegion Nov 25 '17 at 00:54

The gamma and the lognormal are both right skew, constant-coefficient-of-variation distributions on $(0,\infty)$, and they're often the basis of "competing" models for particular kinds of phenomena.

There are various ways to define the heaviness of a tail, but in this case I think all the usual ones show that the lognormal is heavier. (What the first person might have been talking about is what goes on not in the far tail, but a little to the right of the mode: say, around the 75th percentile on the first plot below, which for the lognormal is just below 5 and for the gamma just above 5.)

However, let's just explore the question in a very simple way to begin.

Below are gamma and lognormal densities with mean 4 and variance 4 (top plot - gamma is dark green, lognormal is blue), and then the log of the density (bottom), so you can compare the trends in the tails:

[Figure: gamma and lognormal densities (top) and log-densities (bottom)]

It's hard to see much detail in the top plot, because all the action is to the right of 10. But it's quite clear in the second plot, where the gamma is heading down much more rapidly than the lognormal.

Another way to explore the relationship is to look at the density of the logs, as in the answer here; we see that the density of the logs for the lognormal is symmetric (it's normal!), and that for the gamma is left-skew, with a light tail on the right.

We can do it algebraically, where we can look at the ratio of densities as $x\rightarrow\infty$ (or the log of the ratio). Let $g$ be a gamma density and $f$ lognormal:

$$\log(g(x)/f(x)) = \log(g(x)) - \log(f(x))$$

$$=\log\left(\frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}\right)-\log\left(\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\log(x)-\mu)^2}{2\sigma^2}}\right)$$

$$=-k_1+(\alpha-1)\log(x)-x/\beta - \left(-k_2-\log(x)-\frac{(\log(x)-\mu)^2}{2\sigma^2}\right)$$

$$=\left[c+\alpha\log(x)+\frac{(\log(x)-\mu)^2}{2\sigma^2}\right]-x/\beta $$

The term in the [ ] is a quadratic in $\log(x)$, while the remaining term is decreasing linearly in $x$. Whatever the parameter values, that $-x/\beta$ will eventually go down faster than the quadratic increases. In the limit as $x\rightarrow\infty$, the log of the ratio of densities decreases toward $-\infty$, which means the gamma pdf is eventually much smaller than the lognormal pdf, and it keeps decreasing, relatively. If you take the ratio the other way (with the lognormal on top), it must eventually increase beyond any bound.

That is, any given lognormal is eventually heavier tailed than any gamma.
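This limit can be checked numerically. Below is a small sketch (standard library only) using the same mean-4, variance-4 parameterization as the plots above: gamma shape 4 and scale 1, lognormal $\sigma^2=\log 1.25$ and $\mu=\log 4-\sigma^2/2$.

```python
import math

# Gamma and lognormal with mean 4 and variance 4, as in the plots.
ALPHA, BETA = 4.0, 1.0              # gamma: mean a*b = 4, var a*b^2 = 4
SIGMA2 = math.log(1.25)             # lognormal: (e^{s^2}-1) * 16 = 4
MU = math.log(4.0) - SIGMA2 / 2.0   # so that e^{mu + s^2/2} = 4
SIGMA = math.sqrt(SIGMA2)

def log_gamma_pdf(x):
    return ((ALPHA - 1.0) * math.log(x) - x / BETA
            - math.lgamma(ALPHA) - ALPHA * math.log(BETA))

def log_lognormal_pdf(x):
    return (-math.log(math.sqrt(2.0 * math.pi) * SIGMA * x)
            - (math.log(x) - MU) ** 2 / (2.0 * SIGMA2))

# The log of the ratio f/g (lognormal over gamma) climbs without bound.
for x in (5, 10, 20, 50, 100):
    print(x, log_lognormal_pdf(x) - log_gamma_pdf(x))
```
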


Other definitions of heaviness:

Some people are interested in skewness or kurtosis to measure the heaviness of the right tail. At a given coefficient of variation, the lognormal is both more skew and has higher kurtosis than the gamma.**

For example, with skewness, the gamma has a skewness of $2\,\text{CV}$ while the lognormal's is $3\,\text{CV} + \text{CV}^3$.
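Both formulas follow from the usual closed forms: for the gamma, $\text{CV}=1/\sqrt\alpha$ and skewness $2/\sqrt\alpha$; for the lognormal, $\text{CV}=\sqrt{e^{\sigma^2}-1}$ and skewness $(e^{\sigma^2}+2)\,\text{CV}$, and the identity $(\text{CV}^2+3)\,\text{CV}=3\,\text{CV}+\text{CV}^3$ makes the lognormal case exact. A quick check:

```python
import math

def gamma_skewness_from_cv(cv):
    # Gamma(shape a): CV = 1/sqrt(a), skewness = 2/sqrt(a) = 2*CV
    return 2.0 * cv

def lognormal_skewness_from_cv(cv):
    # Lognormal: CV = sqrt(e^{s^2} - 1), skewness = (e^{s^2} + 2) * CV
    w = cv * cv + 1.0          # w = e^{sigma^2}
    return (w + 2.0) * cv      # algebraically equal to 3*CV + CV^3

for cv in (0.1, 0.5, 1.0, 2.0):
    g, ln = gamma_skewness_from_cv(cv), lognormal_skewness_from_cv(cv)
    assert abs(ln - (3.0 * cv + cv**3)) < 1e-12  # the two forms agree
    print(cv, g, ln, ln > g)  # lognormal is more skew at every CV
```
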

There are some technical definitions of various measures of how heavy the tails are here. You might like to try some of those with these two distributions. The lognormal is an interesting special case in the first definition - all its moments exist, but its MGF doesn't converge above 0, while the MGF for the Gamma does converge in a neighborhood around zero.
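A minimal sketch of that contrast, with arbitrarily chosen parameters (standard lognormal; gamma with shape 2 and scale 1): the log of the lognormal MGF integrand $e^{tx}f(x)$ grows without bound in $x$ for any $t>0$, so the integral diverges, while the gamma MGF has the closed form $(1-\beta t)^{-\alpha}$ for $t<1/\beta$.

```python
import math

MU, SIGMA = 0.0, 1.0     # a standard lognormal, chosen arbitrarily
ALPHA, BETA = 2.0, 1.0   # a gamma with shape 2, scale 1, also arbitrary

def log_mgf_integrand_lognormal(t, x):
    """log of e^{tx} f(x) for the lognormal: the linear term t*x beats the
    quadratic-in-log(x) term, so E[e^{tX}] is infinite for every t > 0."""
    return (t * x
            - math.log(math.sqrt(2.0 * math.pi) * SIGMA * x)
            - (math.log(x) - MU) ** 2 / (2.0 * SIGMA ** 2))

def gamma_mgf(t):
    """Gamma MGF (1 - BETA*t)^(-ALPHA), finite for t < 1/BETA."""
    assert t < 1.0 / BETA
    return (1.0 - BETA * t) ** (-ALPHA)

# The lognormal integrand explodes even for a tiny t > 0 ...
print([round(log_mgf_integrand_lognormal(0.01, x), 1) for x in (1e2, 1e4, 1e6)])
# ... while the gamma MGF is perfectly finite just above zero.
print(gamma_mgf(0.5))
```
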

--

** As Nick Cox mentions below, the usual transformation to approximate normality for the gamma, the Wilson-Hilferty transformation, is weaker than the log: it is a cube root transformation. At small values of the shape parameter a fourth root has been suggested instead (see the discussion in this answer), but in either case it is a weaker transformation than the log required to bring the lognormal to near-normality.

The comparison of skewness (or kurtosis) doesn't suggest any necessary relationship in the extreme tail - it instead tells us something about average behavior; but it may for that reason work better if the original point was not being made about the extreme tail.


Resources: It's easy to use programs like R or Minitab or Matlab or Excel or whatever you like to draw densities and log-densities and logs of ratios of densities ... and so on, to see how things go in particular cases. That's what I'd suggest to start with.

Glen_b
  • +1 though, knowing the two distributions have the same mean and variance, I would have thought that the top diagram did suggest that the blue log-normal line was more leptokurtic (if it is more peaked and slender in the centre then it will probably have fatter tails) – Henry Feb 13 '14 at 08:20
    Indeed it does suggest that, but there's no necessary relationship between peakedness, heavy-tailedness and kurtosis; there are counterexamples to such expectations, so we must beware. The second plot confirms the suspicion though. – Glen_b Feb 13 '14 at 08:22
    Here's a one-liner. It's a definition that log transformation is needed to make a lognormal normal; it is a good approximation that a cube root makes a gamma normal (Wilson-Hilferty are two words for the wise); the distribution needing the stronger transformation is "further" from the normal or Gaussian. – Nick Cox Feb 13 '14 at 09:43
  • @Nick That's pretty good; it's related to the discussion of skewness, since the two transformations are the usual 'symmetrizing' transformations for those distributions. I was just in the last few days discussing Wilson-Hilferty in [another answer](http://stats.stackexchange.com/questions/86135/is-it-possible-to-convert-a-rayleigh-distribution-into-a-gaussian-distribution/86143#86143). – Glen_b Feb 13 '14 at 10:01
    @Glen_b I am just adding a little decoration to a very nice-looking cake of yours. – Nick Cox Feb 13 '14 at 11:52
    @Nick I cannot see how your approach is relevant, because transformations to (approximate) normality describe the *center* of the distribution, not the tails. For instance, we can contaminate a normal distribution with a tiny amount of a two-sided lognormal (a lognormal distribution of $|X|$). An excellent transformation to achieve approximate normality is the identity which--being "weaker" than the cube root--would seem to imply that this contaminated normal has lighter tails than the gamma, but that's not true. – whuber Feb 13 '14 at 18:05
  • @whuber It's just a one-liner or rule of thumb for comparing gamma and lognormal, which was the question. It matches experience roughly too. I said nothing about other distributions. It is clearly true that all sorts of skewness and kurtosis combinations can occur, not to mention other complications and pathologies, such as you imagine here. – Nick Cox Feb 13 '14 at 18:39
  • @Nick But because this rule of thumb has no mathematical justification it does not even apply to the Gamma and Lognormal distributions! – whuber Feb 13 '14 at 19:07
    @whuber That log of lognormal yields normal is a matter of definition. Do you disagree? Which part of the Wilson-Hilferty argument do you consider irrelevant, incorrect or insufficiently rigorous? – Nick Cox Feb 13 '14 at 19:15
    @Nick Cox I don't disagree with the statements about transformations. The mathematically illegitimate part is the conclusion you attempt to draw: from the fact that a logarithm makes the lognormal normal and a cube root makes a gamma approximately normal, you cannot draw *any* conclusion about the tails of either one. – whuber Feb 13 '14 at 19:19
    Thanks; your point is clearer to me, but I stick by my "rule of thumb" wording, and invoke experience too. Clearly, I don't have a theorem. – Nick Cox Feb 13 '14 at 19:32
  • A comment on Glen_b's statement, "Indeed it does suggest that, but there's no necessary relationship between peakedness, heavy-tailedness and kurtosis". This statement is true as regards peakedness, but false as regards heavy-tailedness, unless you choose to define "heavy-tailedness" in one particular way (which as I noted above, is kind of silly). Kurtosis is mathematically related to tails, and not the peak, according to three mathematical theorems I published in my paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321753/ – BigBendRegion Nov 20 '17 at 00:37

Although kurtosis is related to the heaviness of tails, it contributes more to the notion of fat-tailed distributions, and relatively less to tail heaviness itself, as the following example shows. Herein I restate what I have learned from the posts above and below, which are really excellent comments.

First, the area of a right tail is the area from $x$ to $\infty$ under a density function $f(x)$, a.k.a. the survival function $1-F(x)$. For the lognormal distribution $\frac{e^{-\frac{(\log (x)-\mu )^2}{2 \sigma ^2}}}{\sqrt{2 \pi } \sigma x};x\geq 0$ and the gamma distribution $\frac{\beta ^{\alpha } x^{\alpha -1} e^{-\beta x}}{\Gamma (\alpha )};x\geq 0$, let us compare their respective survival functions $\frac{1}{2} \text{erfc}\left(\frac{ \log (x)-\mu}{\sqrt{2} \sigma}\right)$ and $Q(\alpha ,\beta x)=\frac{\Gamma (\alpha , \beta x)}{\Gamma (\alpha )}$ graphically. To do this, I set their respective variances $\left(e^{\sigma ^2}-1\right) e^{2 \mu +\sigma ^2}$ and $\frac{\alpha }{\beta ^2}$ equal, as well as their respective excess kurtoses $3 e^{2 \sigma ^2}+2 e^{3 \sigma ^2}+e^{4 \sigma ^2}-6$ and $\frac{6}{\alpha }$, by (arbitrarily) choosing $\mu =0,\ \sigma =0.8$ and solving for $\alpha \to 0.19128,\ \beta \to 0.335421$.

[Figure: $1-F(x)$ for the lognormal (LND, blue) and the gamma (GD, orange)]

This brings us to our first caution: if this plot were all we examined, we might conclude that the tail of the GD is heavier than that of the LND. That this is not the case is shown by extending the x-axis of the plot:

[Figure: the same survival functions over an extended range of $x$]
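A rough numerical check of both plots, using only the Python standard library (the gamma survival function $Q(\alpha,\beta x)$ is approximated here by brute-force Simpson integration of the upper incomplete gamma integral, since no incomplete-gamma routine is assumed to be available):

```python
import math

# Parameters from above: equal variance and equal excess kurtosis.
MU, SIGMA = 0.0, 0.8                 # lognormal (LND)
ALPHA, BETA = 0.19128, 0.335421      # gamma (GD), rate parameterization

def lnd_sf(x):
    """Lognormal survival function via the complementary error function."""
    return 0.5 * math.erfc((math.log(x) - MU) / (math.sqrt(2.0) * SIGMA))

def gd_sf(x, n=20000, span=80.0):
    """Gamma survival function Q(ALPHA, BETA*x): composite Simpson
    integration of t^(ALPHA-1) e^(-t) over [BETA*x, BETA*x + span],
    beyond which the integrand is negligible."""
    u = BETA * x
    h = span / n
    def f(t):
        return t ** (ALPHA - 1.0) * math.exp(-t)
    s = f(u) + f(u + span)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(u + i * h)
    return (h / 3.0) * s / math.gamma(ALPHA)

# At moderate x the gamma tail is on top; far out, the lognormal takes over.
for x in (10.0, 50.0):
    print(x, lnd_sf(x), gd_sf(x))
```
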

This plot shows two things: (1) even with equal kurtoses, the right tail areas of the LND and GD can differ; and (2) graphic interpretation alone has its dangers, as it can only display results for fixed parameter values over a limited range. Thus there is a need to find a general expression for the limiting survival function ratio $\lim_{x\to \infty } \, \frac{S(\text{LND},x)}{S(\text{GD},x)}$. I was unable to do this with infinite series expansions, but I was able to do it using the intermediary of terminal, or asymptotic, functions. These are not unique: for right-hand tails, $\lim_{x\to \infty } \, \frac{F(x)}{G(x)}=1$ is sufficient for $F(x)$ and $G(x)$ to be mutually asymptotic. With appropriate care taken in finding these functions, this has the potential to identify a subset of functions simpler than the survival functions themselves that can be shared by more than one density function; for example, two different density functions may share a limiting exponential tail. In the prior version of this post, this is what I was referring to as the "added complexity of comparing survival functions." Note that $\lim_{u\to \infty } \, \frac{\text{erfc}(u)}{\frac{e^{-u^2}}{\sqrt{\pi } u}}=1$ and $\lim_{u\to \infty } \, \frac{\Gamma (\alpha ,u)}{e^{-u} u^{\alpha -1}}=1$. (Incidentally, it is not necessary that $\text{erfc}(u)<\frac{e^{-u^2}}{\sqrt{\pi } u}$ and $\Gamma (\alpha ,u )<e^{-u} u^{\alpha -1}$; that is, one need not choose an upper bound, just an asymptotic function.) Here we write $\frac{1}{2} \text{erfc}\left(\frac{\log (x)-\mu }{\sqrt{2} \sigma }\right)<\frac{e^{-\left(\frac{\log (x)-\mu }{\sqrt{2} \sigma }\right)^2}}{\frac{2 \left(\sqrt{\pi } (\log (x)-\mu )\right)}{\sqrt{2} \sigma }}$ and $\frac{\Gamma (\alpha ,\beta x)}{\Gamma (\alpha )}<\frac{e^{-\beta x} (\beta x)^{\alpha -1}}{\Gamma (\alpha )}$, where the ratio of the right-hand terms has the same limit as $x\to \infty$ as that of the left-hand terms.
Simplifying the limiting ratio of the right-hand terms yields $\lim_{x\to \infty } \, \frac{\sigma \Gamma (\alpha ) (\beta x)^{1-\alpha } e^{\beta x-\frac{(\mu -\log (x))^2}{2 \sigma ^2}}}{\sqrt{2 \pi } (\log (x)-\mu )}=\infty$, meaning that for $x$ sufficiently large the LND tail area is as large as we like compared to the GD tail area, irrespective of the parameter values. That brings up another problem: we do not always have solutions that are true for all parameter values, so using graphic illustrations alone can be misleading. For example, the gamma distribution's right tail area is greater than the exponential distribution's when $\alpha < 1$, less than the exponential's when $\alpha >1$, and the GD is exactly an exponential distribution when $\alpha =1$.
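The growth of that limiting ratio is easy to observe numerically. A small sketch, evaluating the log of the simplified right-hand-term ratio at the same parameter values as above (valid for $x>1$, where $\log x>\mu$):

```python
import math

MU, SIGMA = 0.0, 0.8
ALPHA, BETA = 0.19128, 0.335421

def log_sf_ratio_asymptotic(x):
    """Log of the limiting ratio S(LND,x)/S(GD,x), using the asymptotic
    forms erfc(u) ~ e^(-u^2)/(sqrt(pi)*u) and Gamma(a,u) ~ e^(-u) u^(a-1).
    The linear beta*x term dominates the quadratic in log(x)."""
    lx = math.log(x)
    return (math.log(SIGMA * math.gamma(ALPHA))
            + (1.0 - ALPHA) * math.log(BETA * x)
            + BETA * x
            - (lx - MU) ** 2 / (2.0 * SIGMA ** 2)
            - math.log(math.sqrt(2.0 * math.pi) * (lx - MU)))

# The log-ratio increases without bound: the LND tail dominates the GD tail.
for x in (10.0, 100.0, 1000.0):
    print(x, log_sf_ratio_asymptotic(x))
```
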

What, then, is the use of taking the logarithm of the ratio of survival functions, since we obviously do not need logarithms to find a limiting ratio? Many distribution functions contain exponential terms that look simpler when the logarithm is taken, and if the ratio goes to infinity as $x$ increases, then its logarithm does so as well. In our case, that would allow us to inspect $\lim_{x\to \infty } \, \left(\log \left(\frac{\sigma \Gamma (\alpha ) (\beta x)^{1-\alpha }}{\sqrt{2 \pi } (\log (x)-\mu )}\right)+\beta x-\frac{(\mu -\log (x))^2}{2 \sigma ^2}\right)=\infty$, which some people find simpler to look at. Lastly, if the ratio of survival functions goes to zero, then the logarithm of that ratio goes to $-\infty$; in all cases, after finding the limit of the logarithm of a ratio we have to take the antilogarithm to understand its relationship to the limiting value of the ordinary ratio of the survival functions.

Edit 2020-02-18: BTW, there is a lot of literature on classifying tail heaviness of functions that, in effect, assumes (incorrectly) that one can compare hazard functions while ignoring the requirement for having an indeterminate form to do so. There does not seem to be much literature in support of the methods of survival function comparison outlined herein, at least that I could find. However, there is a recent publication appendix that may be cite-worthy. Any other references for the methods outlined herein would be greatly appreciated.

Carl
    In this case (and quite often in cases of interest) higher kurtosis corresponds to heavier tail, but as a general proposition this is not the case - counterexamples are easy to construct. – Glen_b Dec 29 '15 at 08:56
    1. I don't know of any general way short of directly comparing the tails. 2. What is it that's more complicated? whuber's answer shows us why there's a problem with looking at anything but the survivor function (for the right tail); he discusses why you can't compare pdfs in detail but similar points carry over to kurtosis. Further, comparing $S(x)=1-F(x)$ is often much less complicated than comparing kurtosis as well. (In the left tail you'd compare $F(x)$ directly but that wasn't an issue for this question.) – Glen_b Dec 29 '15 at 12:35
  • For an example of the sort of problem the kurtosis can present, see the distributions [here](http://stats.stackexchange.com/questions/154951/non-normal-distributions-with-zero-skewness-and-zero-excess-kurtosis/154965#154965). The first example is heavier tailed than the normal, the second example lighter tailed, yet both have the same kurtosis as the normal. – Glen_b Dec 29 '15 at 12:47
    I also note that you say "This has something to do with a moments theorem that says that if (all of?) the moments of two distributions are equal, then the distributions are identical." -- even if *all* moments of two distributions are equal, the distributions are not necessarily identical. Counterexamples are discussed in answers to several questions here on CV. You need more than just all moments equal -- you need the MGF to exist in a neighborhood of 0. – Glen_b Dec 29 '15 at 12:53
  • Comparing survivor functions to measure tail weight is just one of infinitely many ways to measure the tail. This discussion misses the practical point entirely. The reason people are concerned about the tail is that they are concerned about outliers. Outliers are extremely important in statistical inference, but the question of whether survival functions cross as some point near infinity is not important. Kurtosis measures tails via the average of the Z^4 values, which is relevant for all kinds of statistical inference, including power of means tests and precision of variance estimates. – BigBendRegion Nov 20 '17 at 00:48
  • Also, to be clear, Glen_b's comment "In this case (and quite often in cases of interest) higher kurtosis corresponds to heavier tail, but as a general proposition this is not the case" only is valid if you use a particular definition of heaviness of tail. As I noted above in comments, this definition is not generally relevant because there are distributions with finite tails that are "heavy-tailed." Again, the practical concern of tailedness is outliers. You don't need to have a distribution with infinite support to have outliers. – BigBendRegion Nov 20 '17 at 00:51
    @PeterWestfall Semi-infinite support is often assumed, for example, as $0\leq t< \infty$ for drug concentrations in blood plasma. In that case, tail-heaviness would determine whether the mean residence time of drug in the body measures anything (e.g., exponential distribution) or not (e.g., some Pareto distributions). – Carl Nov 20 '17 at 01:21
  • Ok, Carl, fine, but I don't think you understand the point of this discussion. You can have density crossings like my example where the heavier-tailed distribution has infinite support, but has less in the tail than the N(0,1) distribution far out in the tail. Just take my example with uniforms and mix it with a N(0,0.0001), with small mixing p on the Normal, and there you go. The resulting distribution still has heavier tails in the sense that people care (namely, outliers), but the N(0,1) density exceeds it for large x. – BigBendRegion Nov 20 '17 at 02:36
    @PeterWestfall I do get your point, similar to http://nma.berkeley.edu/ark:/28722/bk000471p7j. It is incumbent to recall that every distribution implies different measures for different things. For example, the average extreme value is MVUE for location of a uniform distribution, not the mean, and not the median. Between those extreme values, the tails are heavy, but outside of them, the tails are zip. What that has to do with a higher moment like kurtosis, when the first moment is not MVUE I would not venture to guess. Something, maybe, but what? – Carl Nov 20 '17 at 05:02
    @Carl, I do not agree with the comment "Between those extreme values [of the uniform], the tails are heavy ...". That seems to be a misinterpretation of "heavy tails" since the uniform distribution has light tails. As far as what all this has to do with kurtosis, excess kurtosis is negative for the uniform, as expected for a light-tailed distribution. I also don't think you need to introduce estimation theory (MVUE etc.) here, as kurtosis is a property of the probability distribution, irrespective of any estimate of it. – BigBendRegion Jan 01 '18 at 00:02
    @PeterWestfall Agreed. You may be interested in exploring [Fat tailed distributions](https://en.wikipedia.org/wiki/Fat-tailed_distribution). – Carl Jan 01 '18 at 00:26
  • Thanks, @Carl. I have been looking for such definitions. But that Wikipedia "definition" of fat tails is not viable. By that definition, the .9999U(-1,1) + .0001U(-1000, 1000) is not fat tailed. Anyway, why do we care about "fat tails"? We care because of outlier potential. My mixture of uniforms example clearly has severe outlier potential, but does not satisfy the Wikipedia "definition." That is why we need alternative definitions of "fat tails." – BigBendRegion Jan 01 '18 at 00:36
    @PeterWestfall You are on to something. Why not ask it as a question to nail down a definition that is viable? I only brought up kurtosis because I started talking about it in the history of this answer, and, got clobbered by my usual shower of downvotes that I seem to attract like bees to honey. – Carl Jan 01 '18 at 00:41