How to use Cullen and Frey graphs for downstream statistical analysis?

Question

I do not see any difference in normalized and raw data Cullen and Frey plots!

Raw data

summary statistics
------
min:  15.79852   max:  23.55614 
median:  18.76461 
mean:  18.99634 
estimated sd:  1.157783 
estimated skewness:  0.7445161 
estimated kurtosis:  3.325692

Normalized data

summary statistics
------
min:  15.78514   max:  23.58933 
median:  18.76017 
mean:  18.99634 
estimated sd:  1.157583 
estimated skewness:  0.7482378 
estimated kurtosis:  3.33266

1. "*I do not see any difference in normalized and raw data Cullen and Frey plots!*" -- Naturally not, since skewness and kurtosis are independent of location and scale. They're the third and fourth moments of standardized variables, so in a plot of kurtosis vs squared skewness any linear rescaling will give the same plot. You should understand the properties of your tools before trying to use them! 2. It's not clear to me what you mean by "downstream". — Glen_b, Mar 19 '19 at 04:26
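The invariance can be checked numerically. The sketch below is a minimal Python/NumPy illustration, not the code behind the question's output. The sample `x` is simulated, and `moment_skew_kurt` is a hypothetical helper that uses the plain moment estimators (dividing by n). These can differ slightly from bias-corrected estimates such as those a tool like R's `fitdistrplus::descdist` may report. For every transformation `a + b*x`, the squared skewness and the kurtosis are unchanged. A negative `b` only flips the sign of the skewness, which is one reason the plot uses its square.

```python
import numpy as np

def moment_skew_kurt(x):
    """Moment-based sample skewness and (non-excess) kurtosis.

    Plain third and fourth standardized moments, dividing by n.
    Bias-corrected versions reported by other tools may differ slightly.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()   # standardize (ddof=0)
    return np.mean(z**3), np.mean(z**4)

rng = np.random.default_rng(0)
x = rng.gamma(shape=4.0, scale=1.0, size=500)  # hypothetical skewed sample

# Identity, two positive rescalings, and one with a negative scale factor
for a, b in [(0.0, 1.0), (10.0, 3.0), (-5.0, 0.2), (2.0, -1.5)]:
    s, k = moment_skew_kurt(a + b * x)
    print(f"a={a:5}, b={b:5}: skew={s:+.6f}, skew^2={s*s:.6f}, kurt={k:.6f}")
```

All four rows print the same `skew^2` and `kurt`. Only the last row's `skew` has the opposite sign.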
Normalization can mean various things, but none of those I can recall result in the minimum, median, maximum, skew, kurtosis being slightly different and the mean and SD identical to 7 sig. fig. So what did you do? — Nick Cox, Mar 21 '19 at 15:55

score 1 · Accepted Answer · answered Mar 23 '19 at 15:30

First, the Cullen & Frey plot shows kurtosis versus $\text{skewness}^2$. Skewness and kurtosis are the third and fourth standardized moments, so standardizing the data (in the usual sense of subtracting the mean and dividing by the standard deviation) makes no difference.
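Written out for the moment-based estimates (dividing by $n$), take $y_i = a + b\,x_i$ with $b \neq 0$:

$$
\bar y = a + b\,\bar x,\qquad s_y = |b|\,s_x,\qquad
z^{(y)}_i = \frac{y_i - \bar y}{s_y} = \frac{b\,(x_i - \bar x)}{|b|\,s_x} = \operatorname{sign}(b)\,z^{(x)}_i
$$

$$
\hat\gamma_1(y) = \frac{1}{n}\sum_{i=1}^n \big(z^{(y)}_i\big)^3 = \operatorname{sign}(b)\,\hat\gamma_1(x),\qquad
\hat\beta_2(y) = \frac{1}{n}\sum_{i=1}^n \big(z^{(y)}_i\big)^4 = \hat\beta_2(x)
$$

So the plotted point $(\hat\gamma_1^2, \hat\beta_2)$ is the same for $x$ and for $y$. The usual bias-corrected estimators are functions of $n$ and these same moment ratios, so they are unchanged as well.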

You didn't tell us much about your modeling goals, nor about your data (sample size? What does it represent? ...), so we cannot say much about "what you should do downstream". But at least you should do as @glen_b says and understand your tools better (and tell us more about your problem).

But your plot: the big blue dot represents your data, and it stays away from the points representing the normal and exponential distributions ..., and also away from the lines representing the gamma and lognormal. Projecting vertically onto those lines, you can see that you have lower kurtosis than a gamma/lognormal of the same skewness. In the usual Cullen and Frey graph, for example the one drawn by R's fitdistrplus::descdist, the kurtosis axis increases downward, so a point above a line has lower kurtosis. So maybe try some extended gamma, or log-skewnormal?
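The same vertical comparison can be made numerically from the reported estimates, without reading it off the plot. The sketch below is a minimal Python version that uses the standard moment formulas. `gamma_kurtosis` and `lognormal_kurtosis` are hypothetical helper names. For a gamma with shape $k$, skewness is $2/\sqrt{k}$ and kurtosis is $3 + 6/k$, so along the gamma line kurtosis $= 3 + 1.5\,\text{skewness}^2$. For a lognormal, with $w = e^{\sigma^2}$, skewness is $(w+2)\sqrt{w-1}$ and kurtosis is $w^4 + 2w^3 + 3w^2 - 3$. The code assumes positive skewness, because the lognormal curve only covers that half.

```python
import math

def gamma_kurtosis(skew):
    # Gamma(shape k): skewness = 2/sqrt(k), kurtosis = 3 + 6/k,
    # hence kurtosis = 3 + 1.5 * skewness**2 along the gamma line.
    return 3.0 + 1.5 * skew**2

def lognormal_kurtosis(skew, tol=1e-12):
    # Lognormal(sigma), w = exp(sigma**2):
    #   skewness = (w + 2) * sqrt(w - 1)   (increasing in w, assumes skew > 0)
    #   kurtosis = w**4 + 2*w**3 + 3*w**2 - 3
    # Solve skewness(w) = skew for w by bisection, then evaluate kurtosis.
    lo, hi = 1.0, 2.0
    while (hi + 2) * math.sqrt(hi - 1) < skew:
        hi *= 2.0  # widen the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (mid + 2) * math.sqrt(mid - 1) < skew:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return w**4 + 2 * w**3 + 3 * w**2 - 3

# Estimates reported for the raw data in the question
skew_hat, kurt_hat = 0.7445161, 3.325692
print(f"observed kurtosis:          {kurt_hat:.3f}")
print(f"gamma at same skewness:     {gamma_kurtosis(skew_hat):.3f}")
print(f"lognormal at same skewness: {lognormal_kurtosis(skew_hat):.3f}")
```

With the question's numbers, the gamma line gives about 3.83 and the lognormal curve about 4.00 at the observed skewness, both above the observed 3.33.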

But your plot lacks some important information for judging this: the uncertainty in the sample estimates of skewness and kurtosis. You can evaluate that by bootstrapping; see my answer here for an example and code.
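In R, `fitdistrplus::descdist` has a `boot` argument that overlays bootstrapped values on the graph. The sketch below is a minimal, language-neutral illustration of the same idea in Python/NumPy: a plain nonparametric bootstrap with percentile intervals. The data are simulated stand-ins, because the question's sample is not available. `skew_kurt` and `bootstrap_skew_kurt` are hypothetical helpers that use the moment estimators (dividing by n).

```python
import numpy as np

def skew_kurt(x):
    # Moment-based skewness and non-excess kurtosis (divide by n).
    z = (x - x.mean()) / x.std()
    return np.mean(z**3), np.mean(z**4)

def bootstrap_skew_kurt(x, n_boot=1000, seed=0):
    """Nonparametric bootstrap of (skewness, kurtosis).

    Returns an array of shape (n_boot, 2). Each row is computed on a
    resample of x drawn with replacement, with the same size as x.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.empty((n_boot, 2))
    for i in range(n_boot):
        resample = rng.choice(x, size=x.size, replace=True)
        out[i] = skew_kurt(resample)
    return out

# Hypothetical data standing in for the question's (unavailable) sample
rng = np.random.default_rng(1)
x = rng.lognormal(mean=2.9, sigma=0.06, size=300)

boot = bootstrap_skew_kurt(x)
skew2_lo, skew2_hi = np.percentile(boot[:, 0] ** 2, [2.5, 97.5])
kurt_lo, kurt_hi = np.percentile(boot[:, 1], [2.5, 97.5])
print(f"skewness^2 95% percentile interval: [{skew2_lo:.3f}, {skew2_hi:.3f}]")
print(f"kurtosis   95% percentile interval: [{kurt_lo:.3f}, {kurt_hi:.3f}]")
```

Plotting the rows of `boot` (squared skewness against kurtosis) as a cloud around the observed point shows how far it can plausibly move relative to the gamma and lognormal lines.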

