How do I know whether a distribution is leptokurtic or platykurtic by only having the box plot?

  • By using robust statistics to determine its geometry, a boxplot tries as hard as possible to focus attention on properties of the data that are *not* based on moments! Thus, one appropriate answer would be, *if you are looking at boxplots, you should be asking different questions of the data.* Boxplots were designed to help you *summarize* a dataset succinctly by looking at its *location,* its *spread,* its *skewness,* and highlighting any data points that are not suitably described by those quantities. – whuber Feb 04 '22 at 15:41

2 Answers


I find boxplots highly misleading for assessing tails, so I would not do this. In particular, the “obvious” way to assess kurtosis is to consider how many “outlier” points there are, but it means nothing to have, say, $200$ outliers on the plot. If there are $200$ outliers in a sample of $500$, maybe it’s fair to consider the tails heavy. If there are $200$ outliers in a sample of $5000$, perhaps the tails are not so heavy. However, the boxplot gives no sense of what proportion of points are extreme, just the count.
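To make the count-versus-proportion point concrete, here is a minimal sketch using only the Python standard library. The helper `count_boxplot_outliers`, the seed, and the two sample sizes are illustrative assumptions, and the 1.5 × IQR fence rule is the common boxplot default rather than anything specified in the question.

```python
import random
import statistics

def count_boxplot_outliers(data):
    """Count points beyond the 1.5 * IQR fences (the usual boxplot rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for x in data if x < lower or x > upper)

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = {}
for n in (500, 50_000):
    # Same Normal distribution each time; only the sample size changes.
    sample = [rng.gauss(0, 1) for _ in range(n)]
    counts[n] = count_boxplot_outliers(sample)
    print(f"n={n}: {counts[n]} outliers, {counts[n] / n:.2%} of the sample")
```

For Normal data about 0.7% of points are expected to fall outside these fences, so the number of plotted dots grows with the sample size even though the tails are identical.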

Dave
  • Usually boxplots are accompanied by indications of how many data they portray, because this is important for interpreting them. – whuber Feb 04 '22 at 15:42
  • @whuber That’s an improvement on just giving the plot, but I do find it hard to unsee a bunch of dots way out in the tail(s), even if there’s an “N=a billion” in the caption. – Dave Feb 04 '22 at 15:45
  • I agree. If one wishes to assess kurtosis, then, what is needed is a different visualization method altogether. For instance, fitting a reference distribution (one with zero fourth cumulant, *aka* excess kurtosis, would be ideal) and comparing that to the data *via* a histogram, probability plot, or whatever, would be far more effective. Among such reference distributions, the Normal is the obvious candidate :-). – whuber Feb 04 '22 at 16:18

Higher kurtosis is indeed indicated by outliers in a box plot. However, it is not the proportion of outliers that determines kurtosis. Instead, the leverage exerted by the outliers (as determined by larger $|z|$-scores) precisely determines kurtosis. So you can have fewer outliers, but with more extension, that also results in higher kurtosis. Precise statements of this fact are given here and here.
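As a rough illustration of the leverage point, kurtosis can be computed as the average of $z^4$, so a few points with very large $|z|$ can outweigh many points with moderate $|z|$. The two datasets below are made-up toy values, not data from this thread, and the function name `kurtosis` is just a label for this sketch.

```python
import statistics

def kurtosis(data):
    """Moment kurtosis: the average of z**4, where z is the standardized value."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean([((x - mean) / sd) ** 4 for x in data])

# Two toy datasets of 1000 points each (illustrative values only):
# a central block at -1/+1 plus some far-out points.
many_mild = [-1, 1] * 480 + [-5, 5] * 20      # 40 far-out points, modest extension
few_extreme = [-1, 1] * 495 + [-10, 10] * 5   # 10 far-out points, large extension

print(kurtosis(many_mild), kurtosis(few_extreme))
```

The second dataset has a quarter as many far-out points yet a much larger kurtosis (roughly 25.5 versus 6.8), because each of its extreme points contributes far more to the average of $z^4$.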

But the box plot does not correspond as precisely to kurtosis as does the normal quantile-quantile plot: There is a direct mathematical and visual connection between the normal q-q plot (and its detrended version) and excess kurtosis that is explained here.
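For readers who want to try the q-q comparison, here is a minimal standard-library sketch that computes the points of a normal q-q plot and its detrended version. It is not the construction from the linked explanation; the Laplace sample, the seed, and the plotting positions $(i + 0.5)/n$ are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable
n = 5000
# Laplace draws (difference of two exponentials); excess kurtosis is 3.
sample = sorted(rng.expovariate(1) - rng.expovariate(1) for _ in range(n))

# Standardize the sorted data so it is comparable to a standard Normal.
mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)
z_sorted = [(x - mean) / sd for x in sample]

# Standard Normal quantiles at the plotting positions (i + 0.5) / n.
std_normal = statistics.NormalDist()
theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]

# Detrended q-q values: observed z minus the Normal quantile.
detrended = [z - q for z, q in zip(z_sorted, theoretical)]
print(detrended[0], detrended[-1])
```

Plotting `z_sorted` against `theoretical` gives the q-q plot. With heavier-than-Normal tails the extreme points move away from the reference line, which appears here as a negative detrended value at the low end and a positive one at the high end.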

Those who still promote the incorrect "peakedness" interpretation of kurtosis might suggest that boxplots are inappropriate because they do not show the peak clearly. Here is a counterargument: In a boxplot of 1000 random beta(.5,1) values, shown below, there are no outliers, correctly suggesting low kurtosis (this distribution is less kurtotic than the Gaussian). But this distribution is infinitely peaked, which cannot be discerned from the boxplot.

Peakedness does not determine kurtosis, and vice versa.

[Figure: boxplot of 1000 random beta(.5,1) values]
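The beta(.5,1) example can be reproduced without a plot. The sketch below is an assumption-laden stand-in for the original figure: it uses Python's `random.betavariate`, an arbitrary seed, and the usual 1.5 × IQR fences to count what a boxplot would flag, then computes the sample kurtosis.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
sample = [rng.betavariate(0.5, 1) for _ in range(1000)]

# Boxplot fences under the usual 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1
fence_lo, fence_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
n_outliers = sum(1 for x in sample if x < fence_lo or x > fence_hi)

# Sample kurtosis as the average of z**4.
mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)
kurt = statistics.fmean([((x - mean) / sd) ** 4 for x in sample])

print(n_outliers, kurt)
```

The theoretical kurtosis of beta(.5,1) is about 2.14, below the Gaussian value of 3, and since the data lie in $[0,1]$ while the fences sit well outside that interval, no points are flagged.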

BigBendRegion
  • The question is not about "peakedness" or its (lack of) relationship to kurtosis. The initial discussion, which does look helpful and on point, needs qualification, because it's easy to construct realistic-looking boxplots with outliers where the data have low kurtosis (negative excess kurtosis). Take a mixture of, say, a uniform$(0,1)$ sample of size $500n$ and a uniform$(3/2,2)$ sample of size $n.$ Although the kurtosis is near 2.375, it has a huge chance of at least $n$ boxplot outliers. In `R`: `n – whuber Feb 05 '22 at 16:42
  • Right, it's all about the leverage. I think I phrased it carefully enough not to require an edit – BigBendRegion Feb 05 '22 at 17:15
  • The counterexample makes your initial assertion "Higher kurtosis is indeed indicated by outliers in a box plot" incorrect. If instead you moderated "is indeed" to "can be," it would be more accurate. But one is left wondering whether the following statements are either just tautological or useless, for how are we supposed to *see* "leverage" in the boxplot? – whuber Feb 05 '22 at 21:32
  • I dunno, it seems clear to me. At least the math is correct. Will consider, though. – BigBendRegion Feb 05 '22 at 22:40
  • The problem is that the box hides so much, we cannot tell in many cases whether the outliers have much "leverage" or not. – whuber Feb 05 '22 at 22:54
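The mixture counterexample from the comments can be sketched in Python (the `R` snippet in the comment is cut off, and this is not a reconstruction of it). The value `n = 10`, the seed, and the 1.5 × IQR fence rule are assumptions made for this illustration.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable
n = 10  # hypothetical choice; the comment leaves n unspecified

# 500n draws from uniform(0, 1) plus n draws from uniform(3/2, 2).
sample = ([rng.uniform(0, 1) for _ in range(500 * n)]
          + [rng.uniform(1.5, 2) for _ in range(n)])

# Boxplot fences under the usual 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1
fence_lo, fence_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
n_outliers = sum(1 for x in sample if x < fence_lo or x > fence_hi)

# Sample kurtosis as the average of z**4.
mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)
kurt = statistics.fmean([((x - mean) / sd) ** 4 for x in sample])

print(n_outliers, kurt)
```

The upper fence lands near 1.5, so most or all of the small cluster is drawn as outliers, while the kurtosis stays below the Gaussian value of 3.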