inferring heavy-tail distribution from finite sample of histogram data

Question

I have some data in the form of bins and counts. Here is one complete non-truncated example:

bins: 60-100, 100-200, 200-300, 300-500, 500-1000, 1000-2000, 2000-3000, >3000 
counts: 275, 320, 112, 65, 53, 44, 16, 15

Counts is the number of objects whose size falls in the corresponding bin (60-100, 100-200, etc.). If we construct the probability density function (dividing counts by bin width and normalizing by the total count), we get a decreasing, exponential-like dependence. My hypothesis (which the nature of the objects also suggests) is that this may be the right-hand tail of a skewed Gaussian whose center lies outside the measurement range (at sizes around 10 or less). But there is another idea: that this function may come from long-tailed (or heavy-tailed) statistics.

(Update for clarification.) By "heavy-tailed" I mean a stable distribution law (in the Lévy sense) other than the Gaussian, which intuitively involves infinite variance or infinite higher moments and asymptotics of the form x^-n instead of exp(-x^2).

Is it possible, on the basis of these data alone (I have several other such objects), to discriminate between exponential decay and long-tail behaviour, perhaps with a stated confidence level or probability for the hypothesis? Which tests could answer this question, and how can they be implemented, preferably in R and/or Mathematica? (Other languages are fine too, if a solution exists in them.)

In favour of the heavy-tail hypothesis is that the amount of matter in each bin (that is, \int f(x)*x dx over the bin) increases with bin number, which would not seem to be the case for a purely Gaussian tail.
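The density construction and the per-bin "amount of matter" described above can be sketched as follows. This is only an illustration: the open-ended last bin (>3000) is left out because it has no width, and the amount of matter is approximated as count times bin midpoint, which is an assumption since the sizes within each bin are unknown. The function names are hypothetical.

```python
# Binned data from the question; 3000 is also the start of the open-ended last bin.
edges = [60, 100, 200, 300, 500, 1000, 2000, 3000]
counts = [275, 320, 112, 65, 53, 44, 16, 15]

def bin_densities(edges, counts):
    """Density estimate per closed bin: count / (total count * bin width)."""
    total = sum(counts)  # includes the open-ended bin in the normalization
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    # zip stops at the shorter list, so the open-ended bin is skipped
    return [c / (total * w) for c, w in zip(counts, widths)]

def bin_masses(edges, counts):
    """Approximate amount of matter per closed bin as count * bin midpoint
    (midpoint rule; an assumption about the within-bin sizes)."""
    mids = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return [c * m for c, m in zip(counts, mids)]

densities = bin_densities(edges, counts)
masses = bin_masses(edges, counts)
for lo, hi, d, m in zip(edges[:-1], edges[1:], densities, masses):
    print(f"{lo:>5}-{hi:<5} density={d:.3e}  mass~{m:.0f}")
```

Because the bins have very different widths, the midpoint approximation is coarse; it is meant for inspecting the trend, not for a formal test.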

If the sample is too sparse and no conclusion can be drawn with reasonable confidence, how can I estimate the amount of data (number of bins and/or bin sizing) that would be sufficient for this inference?

The skew normal (or skew Gaussian) is one kind of long tailed distribution. Exponential decrease is another kind of long tailed distribution. The data are clearly long tail; do you want to test one long-tail distribution vs. another? If so, which two? — Peter Flom, Aug 20 '17 at 11:25
The first thing is to clearly show the discrimination between long-tailed (~x^-n) and short-tailed exponential (exp(-x^n)) behaviour. The next thing might be to try to infer the class of long tail (n in the x^-n asymptotics). But I wonder whether exponential decrease counts as long-tailed? Perhaps there is a terminology mismatch. I learned that "long-tailed" refers to distributions with infinite moments and x^(-n) asymptotics, as opposed to "short-tailed", to which the Gaussian belongs - see, e.g. http://ufn.ru/en/articles/2003/8/c/ — astrsk, Aug 20 '17 at 11:41
To me, "long tailed" just means ... well ... having a long tail! It needn't have infinite moments. But that terminology difference would explain my confusion. — Peter Flom, Aug 20 '17 at 12:18
@Peter There's an explicit definition for "long tailed" here --- [Heavy-tailed distribution: Definition of long-tailed distribution](https://en.wikipedia.org/wiki/Heavy-tailed_distribution#Definition_of_long-tailed_distribution) (though you can sometimes find other definitions as well as it being used in a more general sense). It's not clear that the OP intends this particular one though -- it should be made explicit in the question — Glen_b, Aug 20 '17 at 12:40
I did not distinguish between "long-tailed" and "heavy-tailed" in the sense of the above article. So maybe the right way is to broaden the wording and speak of "heavy-tailed". For me, "heavy" (or "long"?) has always meant stable distributions (in the Lévy sense) other than the Gaussian. This necessarily implies infinite moments and x^-n asymptotics. I never went into such detail as this "long-tail" subclass — astrsk, Aug 20 '17 at 14:00

Ben · Answer 1 · 2021-09-26T11:13:20.237

There are lots of ways to define "heavy tails" depending on which specific property is important, but a common definition is when the tails are heavy enough for the variance to be infinite. (This meaning is relevant in probability theory because it vitiates application of the central limit theorem.) In any case, it is possible to use the data to get an idea of the likely shape of the tails, though this always requires extrapolation beyond the data range.

Heavy-tailed distributions (infinite variance): Consider a density function $f$ with tails that decay according to the power-law:

$$f(x) \rightarrow c x^{-\omega-1} \quad \quad \text{as } x \rightarrow \infty.$$

From this form it is easy to show that:

$$\int \limits_x^\infty (r-\mu)^2 f(r) dr = \mathcal{O}(x^{2-\omega}),$$

so the upper end of the integral converges to a finite value only if $\omega>2$. Hence, the variance of the distribution is finite so long as the density decays faster than cubic decay (i.e., faster than $x^{-3}$) in both tails. It is important to note that heavy-tailed distributions in this sense only arise when the variable of interest is unbounded. If the variable of interest is bounded to some finite region then the variance must be finite, so no problem arises.
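As a sketch of where the exponent $2-\omega$ in the tail integral comes from, assume the power-law form holds exactly for all $r \geqslant x$ and that $x \gg |\mu|$, so that $(r-\mu)^2 \approx r^2$ (both are simplifying assumptions made here for illustration):

$$\int \limits_x^\infty (r-\mu)^2 f(r) dr \approx c \int \limits_x^\infty r^{1-\omega} dr = \frac{c}{\omega-2} x^{2-\omega} \quad \quad \text{for } \omega>2.$$

For $\omega \leqslant 2$ the integrand $r^{1-\omega}$ decays no faster than $r^{-1}$, so the integral diverges.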

Looking at tail behaviour in the data: We can use the observed sample values in the tail of the distribution to see if this appears to be the case for our data. From the form of a density function with power-law decay, it is easy to show that the log-tail-probability and log-mean deviation are related in the tail by:

$$\ln \mathbb{P}(X>x) \rightarrow \text{const} - \omega \ln(x-\mu) \quad \quad \text{as } x \rightarrow \infty.$$

Letting $x_{(1)} \geqslant \cdots \geqslant x_{(n)}$ be the ordered sample values we can estimate the log-tail-probability by $\ln \hat{\mathbb{P}}(X>x_{(i)}) = \ln(2i-1)- \ln(2n)$. For the values occurring in the tails of the distribution we should therefore expect the values to follow the relationships:

$$\begin{matrix} \text{Right tail } & & & & \quad \quad \quad \ln(2i-1) \approx \text{const} - \omega \ln|x_{(i)}-\bar{x}_n|, \\ \text{Left tail} \quad & & & & \text{ } \ln(2(n-i)+1) \approx \text{const} - \omega \ln|x_{(i)}-\bar{x}_n|. \\ \end{matrix}$$

(Note that we are taking logarithms so we only look at the values on the appropriate side of the mean. This is reasonable since we are only interested in the tail values in each tail.) In order to diagnose whether there is evidence of heavy tails in the distribution we can construct tail plots which show the logarithmic terms in these approximate equations, and we can look at the slope of the relationship to estimate the value $\omega$ for each tail. The values in the tail-plots are given by:

$$\begin{matrix} \text{Vertical axis (Right tail) } & & & & \quad \quad \quad \ln(2i-1) \\ \text{Vertical axis (Left tail)} \quad & & & & \text{ } \ln(2(n-i)+1) \\ \text{Horizontal axis } \quad \quad \quad \quad & & & & \quad \quad \ln|x_{(i)}-\bar{x}_n| \end{matrix}$$

Here is an example of this kind of plot for some data from a distribution with lower-bounded support but no upper bound:

From these tail plots we see that both tails appear to be decaying faster than cubic decay. For the left-tail we already know it is not heavy-tailed (since it is bounded), but it is comforting that this is reflected in the plot. For the right-tail we can see that it is fairly close to cubic decay, but it does appear to be faster. In interpreting these plots we concentrate on the values that are further into the tails, which are the values closer to the right-hand-side of each plot. It is important to note that a larger sample would show more values in the tails which might show a different rate of decay.
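The right-tail plot coordinates described above could be computed along these lines. This is a minimal sketch using only the Python standard library, not the code used for the plot in this answer. The synthetic sample (a symmetric Pareto with a known exponent), the `tail_fraction` cut-off, and the least-squares slope as an estimate of $\omega$ are all assumptions made for illustration.

```python
import math
import random

def right_tail_plot(sample, tail_fraction=0.1):
    """Return (ln|x_(i) - mean|, ln(2i - 1)) pairs for the largest values.

    Values are ranked in descending order, so i = 1 is the sample maximum.
    tail_fraction is a hypothetical tuning choice for how far into the
    tail to look.
    """
    n = len(sample)
    mean = sum(sample) / n
    ordered = sorted(sample, reverse=True)
    k = max(2, int(tail_fraction * n))
    points = []
    for i, x in enumerate(ordered[:k], start=1):
        if x > mean:  # keep only values to the right of the mean
            points.append((math.log(x - mean), math.log(2 * i - 1)))
    return points

def ols_slope(points):
    """Ordinary least-squares slope of the vertical on the horizontal values."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    return sxy / sxx

# Synthetic check: symmetric Pareto with tail exponent omega = 3 and mean 0.
random.seed(1)
omega_true = 3.0
sample = [random.choice((-1, 1)) * random.paretovariate(omega_true)
          for _ in range(20000)]
pts = right_tail_plot(sample)
omega_hat = -ols_slope(pts)  # the slope of the tail plot is roughly -omega
print(f"estimated omega = {omega_hat:.2f} (true value {omega_true})")
```

A fitted slope is only a rough guide: it depends on how many tail points are included, and the points furthest into the tail are the noisiest.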

Not sure how "common" the "infinite variance" definition of heavy-tailedness is, but it is too limited a definition to be of practical use for many situations, just like the "exponential tails" definition is too limited. Heavy-tailed distributions are used to model processes that produce occasional rare, extreme values. Such processes can easily have bounded support; see the example in my answer. — BigBendRegion, Oct 06 '18 at 12:00
I find that the commonness of the definitions depends on the commonness of use of the theorems that depend on those properties. Since finite variance is a requirement of the standard version of the CLT, it is quite commonly applied, and thus, its conditions are quite commonly used. In that context the relevant meaning of "heavy tails" is the one above. I think we agree that in other contexts a different meaning is used. — Ben, Oct 06 '18 at 23:19
Agreed, I am after a notion of "heavy tails" that is useful in practice to model real processes that produce occasional, extreme values. Others will have different notions. My main concern is that people believe what they see on the internet, and therefore buy into narrow definitions provided, eg., on Wikipedia, as if they are definitive and comprehensive. Then they repeat these notions as if they are firmly established and generally accepted. Perhaps we need a different term, like "models for processes that produce occasional extreme values," rather than "heavy-tailed" models. — BigBendRegion, Oct 08 '18 at 13:08
Then presumably they will read my first sentence: "There are lots of ways to define "heavy tails" depending on which specific property is important...". — Ben, Oct 08 '18 at 23:23
Yes, that's a good caveat, my apologies. Unfortunately, the current Wikipedia entry on "Heavy-Tailed Distribution" has no such caveat and is quite dogmatic in its statement of precisely what "heavy-tailed" means (and, by implication, what it does not mean). It is problematic because people (even Ph.D.s who post here) think that the Wikipedia entry gives the precise, accepted definition. — BigBendRegion, Oct 09 '18 at 10:44
That's true, but on the other hand, maybe it would be a good thing if one of the definitions became established as "the" definition of heavy-tailed. Then we could re-name the other meanings and we would then have a clearer nomenclature in the discipline. If statisticians have failed to provide an unambiguous lexicon of their own terminology then others will fill the void! — Ben, Oct 09 '18 at 22:07
Good idea! Whatever you can come up with for "heavy-tailed," please do it. I would then suggest "outlier-prone distributions" as a broader category within which heavy tailed distributions would be a subset. What I envision as the class of "outlier-prone distributions" would include distributions that have finite variance like lognormal and t(2.1), even distributions with finite support like .9999U(-1,1) + .0001U(-1000, 1000). I realize "outlier" is somewhat of a loaded term, but "extreme value distribution" is already taken! Other suggestions for naming this broader class are most welcome. — BigBendRegion, Oct 10 '18 at 23:20

score 1 · Answer 2 · answered Sep 22 '18 at 02:36

There are no agreed-upon definitions for long- and heavy-tailed. The power law definitions are often given, but they are too restrictive for practical purposes. The whole point of using such distributions is to model processes that produce occasional, extreme observations far in excess of what you would expect using the normal distribution. Such processes need not obey power laws or even have infinite support. For example, mix a U(-1,1) with a U(-10000, 10000), with mixing p=.0001 on the latter: This distribution has finite support, no power law, but is extremely heavy tailed. But using the power law definition, it is arguably "light-tailed."

Kurtosis measures (moment- and quantile-based) are useful alternative definitions of heavy-tailedness that do not have the limitations of the power law definitions.
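To make the uniform-mixture example concrete, its moment-based kurtosis can be computed in closed form. The sketch below assumes the standard moments of a uniform distribution on (-c, c), namely E[X^2] = c^2/3 and E[X^4] = c^4/5; the function name and parameter names are hypothetical.

```python
def uniform_mixture_kurtosis(a=1.0, b=10000.0, p=0.0001):
    """Kurtosis E[X^4] / E[X^2]^2 of the mixture (1-p)*U(-a, a) + p*U(-b, b).

    Both components are symmetric about 0, so the mixture mean is 0 and
    the raw moments equal the central moments.
    """
    m2 = (1 - p) * a**2 / 3 + p * b**2 / 3
    m4 = (1 - p) * a**4 / 5 + p * b**4 / 5
    return m4 / m2**2

k = uniform_mixture_kurtosis()
print(f"kurtosis = {k:.0f} (a normal distribution has kurtosis 3)")
```

With the default parameters the kurtosis is on the order of 10^4, even though the support is bounded and no power law is involved.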

Thus, as regards the OP's data analysis question, I would suggest fitting the lognormal distribution using interval-censored maximum likelihood, and then using the kurtosis of that fitted distribution as a measure of its heavy-tailedness. As for the first bin, you might question whether the interval is really 60-100 or 0-100. If 60 is really a hard-and-fast lower bound, then you need to modify the distribution. Otherwise, use 0-100 when fitting the lognormal.
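A minimal sketch of that suggestion follows, using only the Python standard library. It takes the first bin as 0-100 and the last as open-ended. The crude grid-refinement optimizer and its search ranges are assumptions chosen to keep the example self-contained; in practice a proper numerical optimizer would be the more usual choice.

```python
import math
from statistics import NormalDist

# Bin edges from the question, with the first bin taken as 0-100 and the
# last bin (>3000) treated as open-ended.
edges = [0, 100, 200, 300, 500, 1000, 2000, 3000, math.inf]
counts = [275, 320, 112, 65, 53, 44, 16, 15]

def lognormal_cdf(x, mu, sigma):
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return NormalDist(mu, sigma).cdf(math.log(x))

def neg_log_lik(mu, sigma):
    """Interval-censored negative log-likelihood: -sum_j n_j * ln P(bin j)."""
    total = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        p = lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma)
        total -= n * math.log(max(p, 1e-300))  # guard against log(0)
    return total

def fit_lognormal(steps=12, grid=21):
    """Crude maximum likelihood by repeated grid refinement (illustrative only).
    The initial search ranges for mu and sigma are assumptions."""
    mu_lo, mu_hi, sg_lo, sg_hi = 0.0, 12.0, 0.05, 5.0
    best = None
    for _ in range(steps):
        mus = [mu_lo + (mu_hi - mu_lo) * i / (grid - 1) for i in range(grid)]
        sgs = [sg_lo + (sg_hi - sg_lo) * i / (grid - 1) for i in range(grid)]
        best = min((neg_log_lik(m, s), m, s) for m in mus for s in sgs)
        _, m, s = best
        # Halve the search window around the current best point.
        mu_w, sg_w = (mu_hi - mu_lo) / 4, (sg_hi - sg_lo) / 4
        mu_lo, mu_hi = m - mu_w, m + mu_w
        sg_lo, sg_hi = max(1e-3, s - sg_w), s + sg_w
    return best[1], best[2]

def lognormal_kurtosis(sigma):
    """Kurtosis (not excess kurtosis) of a lognormal; it depends on sigma only."""
    s2 = sigma ** 2
    return math.exp(4 * s2) + 2 * math.exp(3 * s2) + 3 * math.exp(2 * s2) - 3

mu_hat, sigma_hat = fit_lognormal()
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}, "
      f"kurtosis = {lognormal_kurtosis(sigma_hat):.1f}")
```

If 60 is a hard lower bound, the bin probabilities would need to be renormalized by the probability of exceeding 60; that variant is not implemented here.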
