Can there be a meaningful standard deviation for qualitative data?
- If we consider car colors like "red," "blue," and "green" qualitative, then the standard deviation of a car color is meaningless. If you reinterpret the colors as wavelengths (say, the maximum in the spectrum of reflected white light), then sure, you can have a standard deviation of car color. But then it isn't qualitative anymore. So the simple answer is "no": there is no meaningful standard deviation for strictly qualitative data. – Peter Leopold May 27 '19 at 22:14
- For standard deviation to make sense, you'd first need "mean" to make sense. Does that make sense for whatever kind of qualitative data you're talking about? – Glen_b May 28 '19 at 02:01
1 Answer
Comment: No. But there are ways to describe how 'scattered' or 'diverse' the categories are. Perhaps see Wikipedia on 'Diversity index'.
One especially simple method is the Simpson index $\lambda = \sum_{i=1}^R p_i^2,$ where there are $R$ categories, with respective probabilities $p_i,$ for $i = 1,2, \dots,R.$
This amounts to "the probability that two entities taken at random from the dataset of interest represent the same type," under sampling with replacement. (The index attains its minimum $1/R$ when all categories are equally likely.)
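As a minimal sketch of the formula above, the index can be computed from raw category labels by plugging in the observed proportions for the $p_i$. The function name `simpson_index` and the car-color samples are hypothetical, chosen only for illustration.

```python
from collections import Counter


def simpson_index(labels):
    """Sum of squared category proportions (observed proportions stand in for p_i)."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one observation")
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical samples: three equally common colors vs. one dominant color.
uniform = ["red", "blue", "green"] * 4
skewed = ["red"] * 10 + ["blue", "green"]

print(simpson_index(uniform))  # close to the minimum 1/R with R = 3
print(simpson_index(skewed))   # larger, because one category dominates
```

Values near $1/R$ indicate evenly spread categories; values near 1 indicate that almost everything falls in a single category.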
With any of these indexes, it is a good idea to try them out on several datasets of the kind that interest you, both to see whether the results make intuitive sense for your application and to see what the realistic maximum and minimum values are.
Personal example: Some years ago while giving a guest lecture on randomization at a small religious college in Nebraska, I noticed I was the only person in the room of 20 people who did not have blue eyes. Before my arrival, Simpson's index for eye color was $\lambda = 1;$ after, $\lambda \approx 0.91.$
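The arithmetic behind that figure can be written out. The head count is an assumption: if the 20 people include the speaker (19 blue-eyed, 1 not), then

$$\lambda = \left(\tfrac{19}{20}\right)^2 + \left(\tfrac{1}{20}\right)^2 = \tfrac{362}{400} = 0.905,$$

whereas if the speaker joined 20 blue-eyed people (21 in total), then

$$\lambda = \left(\tfrac{20}{21}\right)^2 + \left(\tfrac{1}{21}\right)^2 = \tfrac{401}{441} \approx 0.909.$$

Either reading is consistent with the quoted value of roughly $0.91.$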
