
This is a question about how to interpret Q-Q plot results. I have a set of 54 measurements from archaeological objects. I calculated both kurtosis and skewness with the moments package and plotted the data with ggplot2 to assess whether they follow a normal distribution.

```r
library(ggplot2)
library(moments)

# The 54 volumes as printed in the console
volume <- c(
  39.984, 77.280, 81.252, 98.304, 113.520, 190.190, 440.220, 739.500,
  750.120, 35.100, 57.960, 170.586, 225.262, 574.308, 584.100, 609.280,
  711.360, 746.928, 186.576, 1.500, 1.512, 1.890, 1.950, 5.280,
  7.200, 7.200, 9.200, 9.280, 24.752, 39.528, 49.880, 67.620,
  73.950, 342.000, 401.625, 468.000, 512.818, 565.250, 29.040, 80.344,
  68.370, 88.830, 121.176, 128.800, 133.200, 18.000, 69.312, 89.880,
  180.264, 265.680, 412.720, 638.400, 680.400, 506.000
)
volume <- as.data.frame(volume)

kurtosis(volume)  # volume: 2.280249
skewness(volume)  # volume: 0.8915582

# Normal quantile-quantile plot with a reference line
ggplot(volume) +
  geom_qq(aes(sample = volume)) +
  stat_qq_line(aes(sample = volume)) +
  theme_bw() +
  xlab("Theoretical") +
  ylab("Sample") +
  ggtitle("Quantile-Quantile")
```

[Figure: QQplot, the normal quantile-quantile plot of `volume` produced by the code above]

The shape of the curve seems very odd, and I'm not sure how to interpret it. Does it simply show that these are, in fact, samples from two populations? Or am I attempting a calculation that doesn't make sense? I've looked for similar cases on the internet but couldn't find any.
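For reference, the skewness and kurtosis above are moment-based measures with divisor n, and on this scale a normal distribution has skewness 0 and kurtosis 3 (so 2.28 is not excess kurtosis). A minimal base-R sketch that recomputes both, assuming these are the definitions `moments` uses and that the data are exactly as printed; `central_moment`, `moment_skewness` and `moment_kurtosis` are hypothetical helper names:

```r
# The 54 volumes as printed in the question
volume <- c(
  39.984, 77.280, 81.252, 98.304, 113.520, 190.190, 440.220, 739.500,
  750.120, 35.100, 57.960, 170.586, 225.262, 574.308, 584.100, 609.280,
  711.360, 746.928, 186.576, 1.500, 1.512, 1.890, 1.950, 5.280,
  7.200, 7.200, 9.200, 9.280, 24.752, 39.528, 49.880, 67.620,
  73.950, 342.000, 401.625, 468.000, 512.818, 565.250, 29.040, 80.344,
  68.370, 88.830, 121.176, 128.800, 133.200, 18.000, 69.312, 89.880,
  180.264, 265.680, 412.720, 638.400, 680.400, 506.000
)

# Hypothetical helpers: central moments with divisor n (not n - 1)
central_moment <- function(x, k) mean((x - mean(x))^k)
moment_skewness <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
moment_kurtosis <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

sk <- moment_skewness(volume)
ku <- moment_kurtosis(volume)

# A normal distribution would give skewness 0, kurtosis 3, excess kurtosis 0
print(c(skewness = sk, kurtosis = ku, excess_kurtosis = ku - 3))
```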

Nick Cox
gvanhavre
  • Does this help https://stats.stackexchange.com/questions/111010/interpreting-qqplot-is-there-any-rule-of-thumb-to-decide-for-non-normality?rq=1 ? – Tim May 25 '21 at 11:27
  • I particularly liked the rule of thumb "I am happy to call a distribution non-normal if I think I can offer a different description that is clearly more appropriate". My curve seems like a very extreme version of those presented in the 24 random sets. – gvanhavre May 25 '21 at 12:03
  • You should always think about what your data *are*, & how they arose & were gathered & measured. Given that your variable is called "volume", I wonder if these are [volumes](https://en.wikipedia.org/wiki/Volume). In addition, there is an obvious floor effect, in that the values approach 0 asymptotically. The normal distribution extends to negative infinity, whereas volumes cannot be negative. Thus, if they are volumes, they cannot be normal by definition. Perhaps also of interest is that there may be a ceiling effect. Do you have any thoughts on whether that might be true and why? – gung - Reinstate Monica May 25 '21 at 14:11
  • "Measurements of archeological objects" are almost surely positive -- but *pace* @gung, that does not rule out the use of a Normal distribution to model and even describe them. There's no reason to think that a set of volumes could not be approximately Normally distributed (although, in practice, there are many reasons to suppose their *cube roots* might have a near Normal distribution). The issue here is that, as you suspect, the distribution is clearly bimodal and perhaps ought to be considered a set of measurements from two populations. – whuber May 25 '21 at 15:36
  • @gung-ReinstateMonica Indeed they are maximum cuboids from three morphological measurements. In fact, my question arose because each of these measurements, separately, follows a straighter normal line. They gave way to this shape once combined. But you're right, of course, about the floor and ceiling effects. After all, they are all, ultimately, objects that have to be manipulated by hand... – gvanhavre May 25 '21 at 20:21
  • Hmm, that's interesting @gvanhavre. I wonder if it would make more sense to work with the three individual measurements. Are they just length, width, & height? What is it that you ultimately want to do with / know about these data? Often, people think it's important that they be normal, but that's unnecessary or even irrelevant. The floor effect is common, but the ceiling effect is more subtle (& potentially interesting, depending on what you're doing). Transforming to normality would be much more straightforward if the ceiling effect didn't exist. – gung - Reinstate Monica May 25 '21 at 21:23
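The observation that each of the three measurements looks roughly normal while their product does not is what one would expect: multiplying independent positive quantities produces right skew, and taking the cube root largely removes it. A minimal simulation sketch; the mean (10) and SD (1.5) of each dimension are arbitrary illustrative assumptions, not estimates from these objects, and `moment_skewness` is a hypothetical helper:

```r
set.seed(42)
n <- 1e5

# Three independent, roughly normal dimensions (arbitrary illustrative values)
len <- rnorm(n, mean = 10, sd = 1.5)
wid <- rnorm(n, mean = 10, sd = 1.5)
hgt <- rnorm(n, mean = 10, sd = 1.5)

cuboid <- len * wid * hgt  # volume of the enclosing cuboid

# Moment-based skewness with divisor n
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skews <- c(
  length = moment_skewness(len),
  width = moment_skewness(wid),
  height = moment_skewness(hgt),
  cuboid = moment_skewness(cuboid),
  cube_root = moment_skewness(cuboid^(1/3))
)
print(round(skews, 3))
```

With these settings the individual dimensions have skewness close to 0, the cuboid volume is clearly right-skewed, and its cube root is close to symmetric again.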

1 Answer


Negatives can be much easier than positives in this territory, and here are two:

  1. In principle and in practice, this distribution is not plausibly normal. The numerical skewness and kurtosis results are less emphatic than the clear graphical asymmetry on your plot and the fact that values must be positive, which constraint is biting hard. A variable that is positively skewed will tend to plot as convex down on a normal quantile plot. (FWIW, quantile plot or quantile-quantile plot is a much wider class than normal quantile plot.)

  2. It's a happy accident if data are close to a brand-name or named distribution, but the fit to any such can often be poor. One reason among several why fits can be disappointing is if any kind of heterogeneity is producing a mixture.
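Two parts of point 1 can be checked numerically. A normal distribution fitted to these data by mean and SD assigns appreciable probability to negative volumes, which is one way to see the positivity constraint biting; and for a right-skewed distribution, sample quantiles plotted against normal quantiles trace a curve whose slope keeps increasing from left to right. A minimal base-R sketch; using lognormal quantiles as an idealised right-skewed "sample" is an assumption, chosen only so the result is exact rather than random:

```r
# The 54 volumes from the question
volume <- c(
  39.984, 77.280, 81.252, 98.304, 113.520, 190.190, 440.220, 739.500,
  750.120, 35.100, 57.960, 170.586, 225.262, 574.308, 584.100, 609.280,
  711.360, 746.928, 186.576, 1.500, 1.512, 1.890, 1.950, 5.280,
  7.200, 7.200, 9.200, 9.280, 24.752, 39.528, 49.880, 67.620,
  73.950, 342.000, 401.625, 468.000, 512.818, 565.250, 29.040, 80.344,
  68.370, 88.830, 121.176, 128.800, 133.200, 18.000, 69.312, 89.880,
  180.264, 265.680, 412.720, 638.400, 680.400, 506.000
)

# Probability that a normal with the sample mean and SD puts below zero,
# where real volumes cannot be
p_negative <- pnorm(0, mean = mean(volume), sd = sd(volume))
print(round(p_negative, 3))

# Idealised right-skewed "sample": lognormal quantiles (meanlog 0, sdlog 1)
# at the same plotting positions as the normal quantiles
p <- ppoints(length(volume))
theoretical <- qnorm(p)
skewed <- qlnorm(p)

# Slopes between successive points of the normal quantile plot
slopes <- diff(skewed) / diff(theoretical)
print(all(diff(slopes) > 0))
```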

Something like a volume I would expect to be closer to a gamma or lognormal, but my explorations were not especially encouraging. Just to show some results, I played with how far a square root, cube root or logarithmic scale might help.

[Figure: normal quantile plots of volume on square root, cube root and logarithmic scales]

I see a slight edge to working with a cube root scale.
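The comparison above is graphical. As a rough numerical companion (a swapped-in summary, not what was done for the plots), one can compute the correlation between the sorted values and normal quantiles on each scale, sometimes called a probability plot correlation: the closer to 1, the straighter the normal quantile plot. A minimal sketch, assuming `ppoints()` plotting positions:

```r
# The 54 volumes from the question
volume <- c(
  39.984, 77.280, 81.252, 98.304, 113.520, 190.190, 440.220, 739.500,
  750.120, 35.100, 57.960, 170.586, 225.262, 574.308, 584.100, 609.280,
  711.360, 746.928, 186.576, 1.500, 1.512, 1.890, 1.950, 5.280,
  7.200, 7.200, 9.200, 9.280, 24.752, 39.528, 49.880, 67.620,
  73.950, 342.000, 401.625, 468.000, 512.818, 565.250, 29.040, 80.344,
  68.370, 88.830, 121.176, 128.800, 133.200, 18.000, 69.312, 89.880,
  180.264, 265.680, 412.720, 638.400, 680.400, 506.000
)

# Candidate scales
scales <- list(
  raw = volume,
  square_root = sqrt(volume),
  cube_root = volume^(1/3),
  log = log(volume)
)

# Normal quantiles at the plotting positions
theoretical <- qnorm(ppoints(length(volume)))

# Correlation of each sorted, transformed sample with the normal quantiles
ppcc <- sapply(scales, function(x) cor(theoretical, sort(x)))
print(round(ppcc, 3))
```

A single correlation hides where the misfit is (tails versus middle), so it complements the plots rather than replacing them.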

It's suggestive if not definitive to mention that the cube root of a volume is simple dimensionally, as a length, and that, to a good approximation, the cube root of a gamma distribution is a normal distribution. At the same time, it seems likely that your objects have a minimum volume rather above zero.
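The gamma and cube-root connection is the Wilson-Hilferty approximation: if X has a gamma distribution with shape k and scale 1, then (X/k)^(1/3) is approximately normal with mean 1 - 1/(9k) and variance 1/(9k). A minimal check using exact gamma quantiles rather than random draws; the shape k = 2 is an arbitrary choice and `moment_skewness` is a hypothetical helper:

```r
k <- 2                       # arbitrary shape parameter
p <- ppoints(1000)
x <- qgamma(p, shape = k)    # idealised gamma "sample" (scale 1)
y <- (x / k)^(1/3)           # Wilson-Hilferty cube-root transform

moment_skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Normal approximation to y implied by Wilson-Hilferty
wh_mean <- 1 - 1 / (9 * k)
wh_sd <- sqrt(1 / (9 * k))

print(c(skew_raw = moment_skewness(x), skew_cube_root = moment_skewness(y)))
print(c(mean_y = mean(y), wh_mean = wh_mean, sd_y = sd(y), wh_sd = wh_sd))

# Largest gap between the transformed quantiles and the approximating normal
print(max(abs(y - qnorm(p, mean = wh_mean, sd = wh_sd))))
```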

All that said, it's interesting if data are close to any named distribution, but, contrary to myth, a normal distribution is only rarely an ideal condition (often mis-stated as an assumption) for statistical methods.

Nick Cox
  • +1 for emphasizing that seeking Normality is not a good objective. A Normal probability plot merely uses Normality as a reference point, much as one might use the horizon as a reference point to determine how high or low an object in the field of view might be, without any implication that all objects are *supposed* to be on the horizon. – whuber May 25 '21 at 15:38
  • How nice that the question has a good philosophical side, and thanks for addressing it. @whuber, that's right, and the horizon example is very clear and clever. – gvanhavre May 25 '21 at 20:35
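The reference-point idea can be made concrete by rebuilding the normal quantile plot by hand. A minimal base-R sketch, assuming `ppoints()` plotting positions (as base R's `qqnorm()` uses) and a reference line through the first and third quartiles (as base R's `qqline()` draws it, and as `stat_qq_line()` does with its default `line.p = c(0.25, 0.75)`); the names `sample_q` and `deviation` are hypothetical:

```r
# The 54 volumes from the question
volume <- c(
  39.984, 77.280, 81.252, 98.304, 113.520, 190.190, 440.220, 739.500,
  750.120, 35.100, 57.960, 170.586, 225.262, 574.308, 584.100, 609.280,
  711.360, 746.928, 186.576, 1.500, 1.512, 1.890, 1.950, 5.280,
  7.200, 7.200, 9.200, 9.280, 24.752, 39.528, 49.880, 67.620,
  73.950, 342.000, 401.625, 468.000, 512.818, 565.250, 29.040, 80.344,
  68.370, 88.830, 121.176, 128.800, 133.200, 18.000, 69.312, 89.880,
  180.264, 265.680, 412.720, 638.400, 680.400, 506.000
)

n <- length(volume)
theoretical <- qnorm(ppoints(n))  # x-axis: normal quantiles
sample_q <- sort(volume)          # y-axis: ordered data

# Reference line through the quartile points
y_q <- quantile(volume, c(0.25, 0.75), names = FALSE)
x_q <- qnorm(c(0.25, 0.75))
slope <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

# Vertical distance of each point from the reference line
deviation <- sample_q - (intercept + slope * theoretical)
print(summary(deviation))

plot(theoretical, sample_q, xlab = "Theoretical", ylab = "Sample")
abline(intercept, slope)
```

The line is only a yardstick: systematic runs of positive or negative deviations show where and how the data depart from the normal reference.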