Should I use an average to summarize ordinal data?

Question

I need to work out an "average" (for lack of a better word) of ratings, or perhaps I could call them labels. Basically, I have a list of words that have been rated 1 - 3 for difficulty: 1 being easy, 2 moderate and 3 difficult. The list is marked by 5 individuals. I need to use the words that are "mainly" rated as 2. The individuals did not always agree on the difficulty. If a word had the following "score": 1, 1, 2, 3, 3 then it's safe to say that it's agreeably a 2.

If most reviewers rated a word as 3 then it's obviously not on average a 2. Basically, when taking 5 reviewers' rating for a word, what is the average rating?

Now it makes sense that I want to add it all up and divide it by 5. This will mostly give me decimal values such as 1.6 or so on.

Now, finally, my question: if I do so and get values such as 1.6, what do I make of them? Do I round to the nearest integer and take that as the deciding rating? Is it as simple as that?

Have you considered taking the mode instead? It might make sense here — shadowtalker, May 25 '16 at 17:21
@ssdecontrol Well, I honestly know nothing about stats. But if you reckon that is what I need in this case, I'll use it. Thanks for the comment! If you post it as an answer I will upvote/mark it for reputation points for you. Thanks a bunch! :) — amateurjustin, May 25 '16 at 17:24
I suggested a more explicit title for your question -- feel free to roll back the edit if you don't like it. — shadowtalker, May 25 '16 at 17:36
I don't know if it is safe to say that your example is agreeably a 2. I believe one could argue that the larger number of votes on either end disagrees with a 2. (I could be splitting hairs, though.) — Kitter Catter, May 25 '16 at 17:42
Related threads: http://stats.stackexchange.com/questions/67551/calculate-mean-of-ordinal-variable http://stats.stackexchange.com/questions/74113/when-is-the-median-more-affected-by-sampling-error-than-the-mean — Nick Cox, May 25 '16 at 18:27
Also related: [What are good basic statistics to use for ordinal data?](http://stats.stackexchange.com/q/97/7290) — gung - Reinstate Monica, May 25 '16 at 19:00

gung - Reinstate Monica · Accepted Answer · 2016-05-25T20:15:00.680

This is largely an issue for you to decide based on your theoretical assumptions about the data and what lies behind them. When you calculate an arithmetic average, you are assuming that the intervals are reasonably similar. (That is, you are implicitly stating that $3-2 = 2-1$ and $3-1 = 2\times (3-2)$.) If you believe that is a reasonable assumption, and others in your field (e.g., reviewers) are likely to agree with you, then it's fine. Using means with ordinal data tends to be more defensible when:

- there are a larger number of ordinal levels (a rule of thumb is $\ge 12$);
- the ordinal levels are composed of many components (e.g., ratings for many related questions are aggregated into a composite); and/or
- the raters were instructed / tried to make the ratings equal interval.

It isn't clear to me that those hold in your case, but it is for you to decide.

You also should think hard about what you mean by '"mainly" rated as 2'. Again, that is for you to decide. However, I would not think of the set of ratings $\{1,1,2,3,3\}$ as "mainly" being $2$, despite the fact that the mean is $2$. I would interpret that as being a somewhat polarizing word, with some thinking it's 'easy' and some thinking it's 'hard'. But again, this is a theoretical issue for you to decide.
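As a quick arithmetic check on that example, using only the numbers already given:

$$\bar{x} = \frac{1+1+2+3+3}{5} = \frac{10}{5} = 2, \qquad \frac{\#\{\text{ratings equal to } 2\}}{5} = \frac{1}{5} = 20\%.$$

So the mean lands on $2$ even though only one of the five raters actually chose $2$.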

For what it's worth (almost certainly very little), if it were me, I would think your ratings were not amenable to be described by means. I think I would interpret '"mainly" rated as 2' as the majority of raters gave this word a 2. That is, I would select words that received $>50\%\ \rm ``2\!"$s.
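A minimal sketch of that majority rule in Python, in case it helps. The word names and ratings are made up for illustration, and `mainly_rated` is a hypothetical helper name, not anything from the question.

```python
from collections import Counter


def mainly_rated(ratings, target=2):
    """Return True if strictly more than half of the ratings equal `target`."""
    counts = Counter(ratings)
    return counts[target] > len(ratings) / 2


# Hypothetical words and ratings, for illustration only.
words = {
    "alpha": [1, 1, 2, 3, 3],  # polarized: the mean is 2, but only one rater chose 2
    "beta": [2, 2, 2, 1, 3],   # three of five raters chose 2
    "gamma": [3, 3, 3, 2, 2],  # mostly 3
}

selected = [word for word, ratings in words.items() if mainly_rated(ratings)]
print(selected)  # ['beta']
```

With five raters, "more than 50%" works out to at least three 2s, so the polarized word is excluded even though its mean is exactly 2.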

By contrast, I suspect that you don't only want to select individual words '"mainly" rated as 2', but also want the entire set of selected words to be rated $\approx 2$. To check that aspect, I would feel more comfortable using the mean of all the ratings for all the selected words (or the mean of the words' means). At this point, you are averaging over many more ratings and I think the mean would be more defensible.
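A rough sketch of that set-level check, assuming the words have already been selected by some rule. The words and ratings below are invented for illustration.

```python
from statistics import mean

# Hypothetical ratings for words already selected as "mainly 2".
selected = {
    "beta": [2, 2, 2, 1, 3],
    "delta": [2, 2, 2, 2, 1],
    "eps": [2, 2, 3, 2, 3],
}

# Pooled mean: average over every individual rating of every selected word.
all_ratings = [r for ratings in selected.values() for r in ratings]
pooled_mean = mean(all_ratings)

# Mean of the per-word means. This equals the pooled mean whenever
# every word has the same number of raters, as it does here.
mean_of_means = mean([mean(ratings) for ratings in selected.values()])

print(round(pooled_mean, 3), round(mean_of_means, 3))
```

If the two numbers sit close to 2, the selected set as a whole is rated about "moderate", which is the property being checked.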

I am in complete agreement with your latter two paragraphs. One needs to be careful about having a bunch of 1's and 3's that balance out to a two. — Kitter Catter, May 25 '16 at 19:49
@KitterCatter, yes, I noticed after posting that you had made the same point already at the end of your answer (+1). — gung - Reinstate Monica, May 25 '16 at 20:09
Thank you, both! @KitterCatter and @gung! I am opting for the > 50%. I think your advice about 1's and 3's balanced as a 2 not being quite right is good advice. I must say, I learnt quite a lot through all the answers and comments! — amateurjustin, May 26 '16 at 14:10

score 3 · Answer 2 · answered May 25 '16 at 17:22

It is often the case with discrete data that the mean is non-discrete. That doesn't mean that it isn't a worthwhile statistic to report. A value of 1.6 could be interpreted as easy for most, but intermediate for some - pretty much what you see in your table.

If you want integers, you can calculate the median, which is the middle value once the ratings are sorted: at least half of the observations are at or below it and at least half are at or above it. For example, for kanarie, the median of (1,1,2,2,2) is 2, since the third of the five sorted values is 2.

An alternative is the mode, which is just the most common value. It is often useful for getting a feel for the overall population's choice.
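For reference, here is how the three summaries could be computed with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). This is only a sketch: the first list is the kanarie example from above, and the second is an extra made-up list included to show a tie.

```python
from statistics import mean, median, multimode

ratings = [1, 1, 2, 2, 2]  # the kanarie ratings quoted above

print(mean(ratings))       # 1.6
print(median(ratings))     # 2
print(multimode(ratings))  # [2]

# A made-up list with a tie: multimode returns every most-common value.
tied = [1, 1, 2, 2, 3]
print(multimode(tied))     # [1, 2]
```

`multimode` is used rather than `mode` so that ties are visible instead of silently resolved.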

Forgottenscience · answered May 25 '16 at 17:22

The median could still be a non-integer if there are an even number of data points and the middle falls between boundary points. – Peter K. May 25 '16 at 17:29
(+1) It's worth considering, though, whether you want to treat *easy*-*moderate*-*difficult* as an interval scale in the first place. Taking means implies you're happy to say that the difference between *easy* & *moderate* is equal to the difference between *moderate* & *difficult*. – Scortchi - Reinstate Monica May 25 '16 at 17:31
@PeterK. you could round the median in the same way that the OP suggested rounding the mean -- my gut tells me that would be "less wrong" but I don't have a good justification for it – shadowtalker May 25 '16 at 17:37
Now I'm wondering if the mode is really going to help. Given the values (1,1,2,2,3), the mode turns out to be 1. It doesn't seem right to go with that, seeing as 3 out of 5 people say that it's moderate-difficult, not easy. – amateurjustin May 25 '16 at 17:37
http://stats.stackexchange.com/questions/31598/is-amazons-average-rating-misleading?rq=1 seems to get at the same problem – amateurjustin May 25 '16 at 17:54
No; with 1,1,2,2,3 there are two modes, 1 and 2. This is either a limitation of the mode that it's often not unique or just a feature of data like this. Or both. Positively put, the mode is most useful when it's well defined. See also @Martin's answer. – Nick Cox May 25 '16 at 18:24
Thinking about it, is median then not the only reasonable answer? – amateurjustin May 25 '16 at 18:46
I'd suggest that there are other answers that can be considered reasonable depending on what exactly you are trying to do. Using median may lead to some polarized results being represented as in the middle which could be bad for your use. – Kitter Catter May 25 '16 at 20:30

score 2 · Answer 3 · edited May 25 '16 at 18:26

You state that you "need to use the words that are 'mainly' marked as 2."

I'd argue that you need to define what 'mainly' actually means to you.

One way is to take the mean, as you indicate intuitively makes sense to you. If you do, then a mean in the range 1.5 ≤ mean < 2.5 might be interpreted as 'mainly' 2.

An alternative is to use the mode, as @ssdecontrol suggested. You would, however, need to decide on how to handle a situation where there are two modes: e.g. (1,1,2,2,3) where the modes are 1 and 2. Would you consider this to pass your criteria? Or your example of (1,1,2,3,3) where the modes are 1 and 3. The mode is not 2 in this case, but the mean is exactly 2. Does it pass your criteria?

A third option is to use the mean of the mode, where the pass criterion could be: 1.5 ≤ mean of mode < 2.5.
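To compare the three options side by side, here is a small sketch. The function names are hypothetical, the thresholds are the 1.5 and 2.5 bounds mentioned above, and "mean of the mode" is read here as the average of all tied modal values, which is an assumption about what is intended.

```python
from statistics import mean, multimode


def passes_mean(ratings, lo=1.5, hi=2.5):
    """Option 1: the mean falls in [lo, hi)."""
    return lo <= mean(ratings) < hi


def passes_mode(ratings, target=2):
    """Option 2: the mode is unique and equals `target`."""
    return multimode(ratings) == [target]


def passes_mean_of_modes(ratings, lo=1.5, hi=2.5):
    """Option 3: the average of all modal values falls in [lo, hi)."""
    return lo <= mean(multimode(ratings)) < hi


examples = [
    [1, 1, 2, 2, 3],  # modes 1 and 2, mean 1.8
    [1, 1, 2, 3, 3],  # modes 1 and 3, mean 2
    [3, 3, 3, 2, 2],  # mode 3, mean 2.6
]
for r in examples:
    print(r, passes_mean(r), passes_mode(r), passes_mean_of_modes(r))
```

Note that under this reading the polarized (1,1,2,3,3) passes both the mean and the mean-of-modes criteria, so only the strict mode rule screens it out.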

I think that might be the best option yet! The mean of the mode! — amateurjustin, May 25 '16 at 18:09

Kitter Catter · Answer 4 · 2016-05-25T18:56:36.670

I think you have a few options depending on what you care about. You can go with the mean and see if it falls within some range of 2. This would be like averaging your votes. The advantage here is that you can check on how balanced people's votes are.

If you care more about the point that splits the votes, with at least half of the voters at or below it and at least half at or above it, you would want to go with the median. The advantage of the median is that it is based more on the vote distribution.

You might be interested in where the most voters voted. This would be the mode. You used the word "mainly", so you might be interested in this metric. The advantage is that it shows where the most votes landed; the disadvantage is that, as you get more options, the mode can become less meaningful.

One thing to keep in mind is also that you may have some controversial words where most voters are 1 and 3. In this case you should be careful, since most would not say 2, but some methods would give you 2. I would thus suggest you use the mode; in the case of two modes, you would either reject the word or use another method such as the median.
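A possible sketch of that rule in Python, assuming a non-empty list of ratings. `consensus_rating` and its `fallback` argument are names made up for this example.

```python
from statistics import median, multimode


def consensus_rating(ratings, fallback="reject"):
    """Return the unique mode; on a tie, reject (None) or fall back to the median."""
    modes = multimode(ratings)
    if len(modes) == 1:
        return modes[0]
    if fallback == "median":
        return median(ratings)
    return None  # tie between modes: treat the word as having no consensus


print(consensus_rating([1, 2, 2, 2, 3]))                     # 2
print(consensus_rating([1, 1, 2, 3, 3]))                     # None
print(consensus_rating([1, 1, 2, 3, 3], fallback="median"))  # 2
```

The last call shows the trade-off: falling back to the median puts the controversial word back at 2, so rejecting on ties is the stricter choice.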

Edit: You could also look at the percentage of votes that are 2. If it is above 50%, then people mainly believe the word is a 2.

Edit: If you really want to add complexity, you could also try to address the issue of voters disagreeing and having a bias. You would then need to apply some transformation to the votes before taking a mean/median.

"[mean] would be like averaging your votes." Sorry for the basic interjection, but in all my classes mentioning stats, pretty much the first thing they drummed into us was that the mean is just one type under the umbrella category of averaging methods, and that using the latter when we meant the former was a rookie mistake from colloquial use. Is this a regional difference? — underscore_d, May 25 '16 at 22:53
I wrote this for a lay audience so I used the colloquial terminology. — Kitter Catter, May 26 '16 at 00:33
