How can I determine if categorical data is normally distributed?

Question

Is it true that a normality check should be used for continuous data only (ratio, interval level of measurement) and not for categorical data (nominal, ordinal)?
Is there any way to check the normality of categorical data?

Glen_b · Answer 1 · 2019-02-21T08:18:21.660

Categorical data are not from a normal distribution.

The normal distribution only makes sense if you're dealing with at least interval data, and the normal distribution is continuous and on the whole real line. If any of those aren't true you don't need to examine the data distribution to conclude that it's not consistent with normality.

[Note that if it's not interval you have bigger issues than those associated with assuming a distribution shape, since even the calculation of a mean implies that you have interval scale. To say that "High" + "Very Low" = "Medium" + "Low" and "Very High" + "Medium" = "High" + "High" (i.e. exactly the sort of thing you need to hold to even begin adding values in the first place), you are forced to assume interval scale at that point.]
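To make the point about means concrete, here is a minimal sketch. The two scorings and both groups of responses are made up for illustration; both scorings respect the category order, yet they disagree about which group has the larger mean.

```python
from statistics import mean

LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]

# Two hypothetical scorings; both preserve the order of the categories.
equal_spacing = dict(zip(LEVELS, [1, 2, 3, 4, 5]))
stretched_top = dict(zip(LEVELS, [1, 2, 3, 4, 10]))

# Made-up responses from two groups.
group_x = ["Medium"] * 4
group_y = ["Very Low"] * 3 + ["Very High"]


def mean_score(responses, scoring):
    """Mean of the numeric scores assigned to a list of category labels."""
    return mean(scoring[r] for r in responses)


for name, scoring in [("equal spacing", equal_spacing),
                      ("stretched top", stretched_top)]:
    print(name, mean_score(group_x, scoring), mean_score(group_y, scoring))
```

Under equal spacing group X has the larger mean; with the top category stretched, group Y does. Unless the spacing is actually known, the comparison of means is an artifact of the scores chosen.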

It would be somewhat rare to have even reasonably approximate normal-looking samples with actual ratio data, since ratio data are generally non-negative and typically somewhat skew.

When your measures are categorical, it's not so much that you can't "check" it as it generally makes no sense to do it - you already know it's not a sample from a normal distribution. Indeed, the idea of even trying makes no sense in the case of nominal data, since the categories don't even have an order! [The only distribution invariant to an arbitrary rearrangement of order would be a discrete uniform.]

If your data are ordered categorical the intervals are arbitrary, and again, we're left with a notion we can't really do much with; even simpler notions like symmetry don't really hold up under arbitrary changes in intervals.
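As a rough illustration of why symmetry doesn't survive a change of intervals, the sketch below computes the moment skewness of the same five-category counts under three order-preserving scorings. The counts and the scorings are invented for the example.

```python
def skewness(scores, counts):
    """Moment coefficient of skewness for scored categories with given counts."""
    n = sum(counts)
    m = sum(s * c for s, c in zip(scores, counts)) / n
    m2 = sum(c * (s - m) ** 2 for s, c in zip(scores, counts)) / n
    m3 = sum(c * (s - m) ** 3 for s, c in zip(scores, counts)) / n
    return m3 / m2 ** 1.5


# Hypothetical counts in five ordered categories.
counts = [10, 20, 40, 20, 10]

# Three hypothetical scorings, all consistent with the category order.
scorings = {
    "equal spacing": [1, 2, 3, 4, 5],
    "stretched top": [1, 2, 3, 4, 10],
    "stretched bottom": [1, 7, 8, 9, 10],
}

for name, scores in scorings.items():
    print(f"{name}: skewness = {skewness(scores, counts):+.3f}")
```

The same counts come out symmetric, right-skewed, or left-skewed depending only on the spacing assigned to the categories, which is exactly the arbitrariness described above.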

To begin to contemplate even approximate normality means we must at least assume our categories are interval / have fixed, known "scores".

But in any case, the question "is it normal?" isn't really a useful question anyway - since when are real data truly sampled from a normal distribution?

[There can be situations in which it could be meaningful to consider whether the ordered categories have an underlying (latent) variable with (say) a normal distribution, but that's quite a different kind of consideration.]

A more useful question is suggested by George Box:

"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful."

(I believe that's in Box and Draper, along with his more well known aphorism.)

If you had discrete data that was at least interval, and had a fair number of categories, it might make sense to check that it wasn't heavily skew, say, but you wouldn't actually believe it to be drawn from a normal population - it can't be.

For some inferential procedures, actual normality may not be especially important, particularly at larger sample sizes.

But how can I check normality for nominal categorical data, which is required for a z-test for proportions? Here it says that it should be a standard normal distribution: https://newonlinecourses.science.psu.edu/stat414/node/268/ — vasili111, Nov 06 '19 at 20:05
Don't confuse the categories with the *counts* of values in those categories. A set of categorical responses like "red, blue, pink, blue..." can't be normal. *However*, counts within categories are a different story. Specifically, a set of *counts* in categories may (given some simple assumptions) be modelled as a multinomial distribution, which, if the expected counts are not too low, can be well approximated by a (degenerate) multivariate normal. With a z-test for proportions (2 outcomes), the count in either outcome (given the assumptions) will be binomial (and so approximately normal with large n). — Glen_b, Nov 06 '19 at 22:33
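
A small sketch of the point about counts, using only the Python standard library. It assumes independent trials with a common success probability; the function names, sample sizes, and the 130-out-of-400 example are illustrative choices, not taken from the linked course. The first function measures how far the binomial CDF is from its normal approximation (with a continuity correction); the second is the usual one-sample z-test for a proportion.

```python
from math import comb, sqrt
from statistics import NormalDist


def max_cdf_error(n, p):
    """Largest gap between the Binomial(n, p) CDF and its normal
    approximation, evaluated with a continuity correction of 0.5."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        worst = max(worst, abs(cdf - approx.cdf(k + 0.5)))
    return worst


def one_sample_z_test(successes, n, p0):
    """Two-sided z-test of H0: true proportion equals p0."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# The approximation to the *count* improves as n grows.
for n in (10, 400):
    print(n, max_cdf_error(n, 0.3))

# Hypothetical data: 130 "successes" out of 400, tested against p0 = 0.3.
print(one_sample_z_test(130, 400, 0.3))
```

Nothing here checks the categories themselves for normality; what is approximately normal is the count (or the sample proportion), and that approximation depends on n and p rather than on any property of the labels.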
