My histogram suggests my data is not normally distributed but the stats suggest it is; can I assume normality or not?

Question

I am looking at likelihood to forgive and need to decide which correlation to use to relate this scale to attachment type.

My histogram shows a slight positive skew, but the test of normality (Kolmogorov-Smirnov) and skewness and kurtosis suggest normality. Can I assume normality and use a Pearson correlation or do I use Spearman rank correlation?

Glen_b · Answer 1 · 2021-10-29T23:44:04.077

Failure to reject normality does not imply that you have it, only that the sample size was too small to detect the non-normality you have.

Formal hypothesis tests of assumptions like this are not especially useful; they answer the wrong question. The real question is nearer to one of effect size, not of significance.

Looking at skewness and kurtosis (not testing them) is coming closer to looking at effect size (though they are not always especially informative) but in any case, your assumptions should not normally be based on what you find in the sample you're trying to perform a test on.

Whether you're using a test or a diagnostic, if it's on the same data you're going to use in the original test, looking at the data in this way affects the behavior of the tests you're choosing between.

The assumption for the usual Pearson test is of bivariate normality; you're not assessing that if you're only looking at the marginal distributions (testing the variables individually). However, you can also justify the usual Pearson test in a different way (if you're in a regression-type situation), which would be based on conditional normality of the response -- again you can't assess it by a simple test of normality of the response.
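
As a small illustration of why checking each variable separately says little about bivariate normality, here is a sketch of a classic construction (Python with NumPy is my choice here, and the seed and sample size are arbitrary): `x` is standard normal and `y` is `x` with a random sign flip. Each margin is standard normal, yet the pair cannot be bivariate normal, because `x + y` is exactly zero about half the time.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
n = 10_000

x = rng.standard_normal(n)
sign = rng.choice([-1.0, 1.0], size=n)  # independent random sign
y = sign * x                            # y is also standard normal marginally

# Marginal summaries of y look just like those of a standard normal.
print("mean and sd of y:", y.mean(), y.std(ddof=1))

# For a bivariate normal pair, x + y would be normal (or constant);
# here it has a large point mass at zero instead.
frac_zero = np.mean(x + y == 0.0)
print("fraction of x + y exactly zero:", frac_zero)

# The sample correlation is near zero, although y is a function of x and a coin flip.
r = np.corrcoef(x, y)[0, 1]
print("sample correlation:", r)
```

Any marginal normality check applied to `x` or to `y` alone would see nothing unusual in this example.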

Which is to say, even if tests did help, you'd simply be testing the wrong thing.

While the usual Pearson test does involve some assumption relating to normality, you don't have to do that test.

What is crucial is the assumption of linearity. If you're really interested in linear correlation, the Pearson correlation is the most obvious choice; at that point you can then figure out how to test it (if a test is really what you need). For example, if the null is of zero correlation, you could do a nonparametric test of the Pearson correlation (e.g. via a permutation test, or as Nick Cox mentions, a bootstrap test; in large samples the bootstrap will work just fine even for testing non-zero null correlations, which the permutation test doesn't handle). If you're interested in more general association, you may be better off choosing some other measure.
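
A minimal sketch of both ideas, assuming Python with NumPy (my choice, not something from the question). The function names `permutation_test_r` and `bootstrap_ci_r` are made up for illustration, the data are simulated rather than real, and the bootstrap interval is a plain percentile interval, which is only one of several possible variants.

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

def permutation_test_r(x, y, n_perm=9999, seed=0):
    """Two-sided permutation test of the null of no association (zero correlation)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    r_obs = pearson_r(x, y)
    count = 0
    for _ in range(n_perm):
        # Shuffling y breaks any pairing with x, which mimics the null.
        r_perm = pearson_r(x, rng.permutation(y))
        if abs(r_perm) >= abs(r_obs):
            count += 1
    # Count the observed arrangement itself so the p-value is never zero.
    p_value = (count + 1) / (n_perm + 1)
    return r_obs, p_value

def bootstrap_ci_r(x, y, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for r, resampling (x, y) pairs together."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        rs[b] = pearson_r(x[idx], y[idx])
    lo, hi = np.quantile(rs, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# Simulated, mildly skewed data standing in for the two scale scores.
rng = np.random.default_rng(42)
x = rng.exponential(size=40)
y = 0.5 * x + rng.exponential(size=40)

r_obs, p_value = permutation_test_r(x, y)
ci = bootstrap_ci_r(x, y)
print("r:", r_obs, "permutation p-value:", p_value, "bootstrap 95% interval:", ci)
```

To test a non-zero null value with the bootstrap, one simple (large-sample) approach is to check whether that value lies inside the interval.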

It's also worth noting that the usual test of a Pearson correlation is reasonably level-robust to a variety of departures from bivariate normality; if the bivariate distribution is one where a linear correlation still makes sense it may in many cases work reasonably well -- it can pay to use simulation to investigate its sensitivity to such deviations. For myself (on the rare occasions I'd use a hypothesis test at all), I'd usually tend to go to a nonparametric test, though.
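
One way such a simulation might look, as a sketch rather than a definitive study: draw many samples in which the two variables are independent (so the null of zero correlation holds), apply the usual t-based Pearson test, and see how often it rejects at the 5% level. Python with NumPy and SciPy, the helper name `rejection_rate`, the sample size of 30, and the three distributions are all my assumptions; you would substitute distributions resembling your own data.

```python
import numpy as np
from scipy import stats

def rejection_rate(sampler, n=30, n_sims=5000, alpha=0.05, seed=0):
    """Estimate the type I error rate of the usual Pearson t-test.

    `sampler(rng, size)` draws an array of the given shape; x and y are
    drawn independently, so the true correlation is zero.
    """
    rng = np.random.default_rng(seed)
    x = sampler(rng, (n_sims, n))
    y = sampler(rng, (n_sims, n))
    # Row-wise Pearson correlation, one per simulated sample.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    # Usual test statistic: t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 df.
    t = r * np.sqrt((n - 2) / (1 - r**2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return np.mean(np.abs(t) > t_crit)

samplers = {
    "normal": lambda rng, size: rng.standard_normal(size),
    "exponential (skewed)": lambda rng, size: rng.exponential(size=size),
    "t with 3 df (heavy tails)": lambda rng, size: rng.standard_t(3, size=size),
}

for name, sampler in samplers.items():
    print(name, rejection_rate(sampler))
```

Rates close to 0.05 suggest the test holds its level for that kind of departure at that sample size. Independence is only one way for the null to hold, so this checks one scenario, not all of them.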

Consistent with this, the major use of Pearson correlation, I contend, is to **measure** how far the relationship is linear (with a bonus that the sign $+$ or $-$ tells you the sign of the relationship). Normality is not even essential for **inference** as you could bootstrap; conversely, in situations in which the bootstrap is dubious all other tests are dubious too. -- Nick Cox, Oct 29 '21 at 14:56
