I have a number of series that would typically be described as right-skewed (skewed normal or Gamma distributed). For example, say I have a group of customers and have calculated each customer's spend over a fixed length of time. When I plot a histogram of spend, I find an extremely long tail made up of a small group of high spenders. Since I want to identify these high spenders, are there methods to empirically inspect a distribution of values and approximate the point at which it becomes "long-tailed", so that I can cut the data there? I am not looking to find the long tail by eyeballing a histogram; I want a consistent method that cuts the data systematically.
-
You might appreciate [this recent answer](http://stats.stackexchange.com/questions/58220/what-distribution-does-my-data-follow/58241#58241) showing ways to identify modes in a distribution. Also of interest is [this thread](http://stats.stackexchange.com/questions/40454/determine-different-clusters-of-1d-data-from-database) on 1D clustering. Searching our site on "clustering" is also likely to be fruitful. – whuber May 06 '13 at 20:40
-
What do you mean by "...the point at which the distribution becomes 'long-tailed'..."? Are you asking if there is a way to quantify the thickness of the tail of a distribution? If so, extreme value theory may be of help. – rbatt May 07 '13 at 00:58
-
whuber - judging by the first 5 min of browsing the links you provided, I believe you have pointed me in the right direction. Thanks a bunch! – Adam L May 07 '13 at 20:30
-
If you found a solution, maybe you can post an answer telling us what you finally did! – kjetil b halvorsen Jan 27 '19 at 11:57
-
What needs to be done to make sense of the data depends on the data itself. For example, unimodal and bimodal data would be treated differently. – Carl Aug 16 '19 at 00:12
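For anyone wanting to try the extreme value theory suggestion from the comments, here is a minimal sketch of one standard diagnostic from that field, the empirical mean excess function: for each candidate threshold u, average the amount by which values exceed u. It is not something the original poster reported using. The simulated Gamma spend data, the quantile grid, and the helper names `mean_excess` and `quantile` are all assumptions made for illustration. Roughly speaking, the mean excess stays about constant over an exponential-like tail and grows about linearly over a Pareto-like tail, so the threshold where the pattern settles is one candidate for a cut.

```python
import random
from statistics import mean


def mean_excess(values, thresholds):
    """Return (u, number of exceedances, mean excess over u) for each threshold u.

    Thresholds with no values above them are skipped.
    """
    out = []
    for u in thresholds:
        excess = [v - u for v in values if v > u]
        if excess:
            out.append((u, len(excess), mean(excess)))
    return out


def quantile(sorted_values, q):
    """Simple empirical quantile by rank; sorted_values must be sorted ascending."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


# Hypothetical data: simulated Gamma-distributed spend (shape 2, scale 50).
random.seed(0)
spend = sorted(random.gammavariate(2.0, 50.0) for _ in range(5000))

# Candidate thresholds taken at a fixed grid of quantiles (an arbitrary choice).
thresholds = [quantile(spend, q) for q in (0.50, 0.75, 0.90, 0.95, 0.99)]

for u, n, e in mean_excess(spend, thresholds):
    print(f"u={u:8.1f}  n_exceed={n:5d}  mean_excess={e:7.1f}")
```

This only tabulates the diagnostic; turning it into an automatic cut still requires a rule for deciding when the mean excess has "settled", and that rule is a judgment call that depends on the data.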