How to measure agreement between a set of raters?

Question

I am running an experiment where I have a finite set of raters and a finite set of items, and raters have to provide their subjective judgment about each item. The goal is to measure the importance of those items for the raters.

For every item, each rater uses a Likert-like scale (1 = Unimportant, 2 = Of Little Importance, 3 = Moderately Important, 4 = Important, 5 = Very Important).

Knowing that the judgments are subjective, I want to measure how much the raters agree in their ratings, and eventually look for patterns in their judgments.

The question is: which statistical method or tool is most appropriate for such an analysis?

Of possible interest: [Inter-rater reliability for ordinal or interval data](http://stats.stackexchange.com/q/3539/930). — chl, Aug 17 '12 at 10:59

score 2 · Accepted Answer · answered Jul 15 '12 at 19:29

Cohen's kappa is the most common statistic for measuring interrater agreement between two raters. There is another version of it due to Fleiss, which handles more than two raters. A weighted kappa and the intraclass correlation are also used.
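As a rough illustration of the multi-rater case, here is a minimal pure-Python sketch of Fleiss' kappa. The function name `fleiss_kappa`, the input layout (one list of ratings per item, same number of raters for every item), and the example ratings are all assumptions made for this illustration, not data from the question.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the labels given to item i.
    categories: all labels that can occur on the rating scale.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]  # label counts per item

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    # Undefined (division by zero) if every rating falls in one category.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: 4 items, 3 raters, 5-point importance scale.
example = [[5, 5, 5], [4, 4, 5], [2, 3, 3], [1, 1, 1]]
print(fleiss_kappa(example, categories=[1, 2, 3, 4, 5]))
```

For this made-up data the value is about 0.56. Note that this statistic treats the five scale points as unordered labels, so a 4-versus-5 disagreement counts the same as a 1-versus-5 disagreement.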

Michael R. Chernick


Thanks @Michael for your suggestion. After googling for it, I found the Kappa method. But it seems that it is used when you evaluate raters' judgments against a standard. In my case there is no standard, as every judgment is completely subjective. – Jul 15 '12 at 20:33
Kappa measures interrater agreement. There is a rating system assumed, like your Likert scale. That is all that is meant by comparison to a standard. You need to have a score to know if there is complete agreement or some degree of disagreement. I have used it many times to do exactly what you want. – Michael R. Chernick Jul 15 '12 at 20:46
According to the documentation I found on the subject, it seems that there is the Kappa method, used for qualitative nominal data (with no logical order between the scale categories), and the Intraclass Correlation Coefficient (ICC) method, used for qualitative ordinal data (with a logical order between the scale categories, e.g. grading). I wonder if the ICC method isn't more appropriate for my situation, as I have a logical order to the categories ... – Jul 15 '12 at 21:13
In the two-category case there is no distinction. If you have 3 or more categories and want to incorporate the ordering, kappa will not do that. – Michael R. Chernick Jul 15 '12 at 21:25
I'll try the Kappa method, based on a nice example I found on wiki. Maybe it would be interesting to see if I obtain similar results with ICC. Thanks @Michael for your help. – Jul 15 '12 at 21:51
Would ICC work given the small sample? – Cesare Camestre Jul 19 '13 at 13:29
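The point raised in the comments above, that plain kappa ignores the ordering of the categories while a weighted kappa can take it into account, can be sketched for two raters as follows. This is only an illustrative pure-Python version; the function name `cohen_kappa`, the `weights` argument, and the ratings are assumptions for the example.

```python
def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa for two raters; weights is None, "linear" or "quadratic".

    categories must be listed in scale order and contain at least two labels.
    """
    k = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the two raters' labels.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def penalty(i, j):
        # Disagreement weight: 0 on the diagonal, growing with distance
        # between categories when a weighting scheme is requested.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(
        penalty(i, j) * counts[i][j] / n for i in range(k) for j in range(k)
    )
    expected = sum(
        penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    return 1.0 - observed / expected


# Hypothetical ratings: rater B is usually one step above rater A.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 5]
scale = [1, 2, 3, 4, 5]
print(cohen_kappa(a, b, scale))               # unweighted
print(cohen_kappa(a, b, scale, "linear"))     # linear weights
print(cohen_kappa(a, b, scale, "quadratic"))  # quadratic weights
```

With these made-up ratings the unweighted kappa is about 0, because the raters almost never pick exactly the same category, while the linear and quadratic weighted versions come out at about 0.5 and 0.8, because the disagreements are all between adjacent categories.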

score 1 · Answer 2 · answered Nov 14 '12 at 14:22

Consider using Krippendorff's Alpha.
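For readers who want to see what the statistic computes, here is a minimal pure-Python sketch of Krippendorff's alpha, defined as 1 minus the ratio of observed to expected disagreement. It assumes the interval distance (squared difference) by default; for Likert data an ordinal distance may be more appropriate, and that is not implemented here. The function name, the `delta` parameter, and the data are assumptions for illustration.

```python
from itertools import permutations


def krippendorff_alpha(units, delta=lambda a, b: (a - b) ** 2):
    """Krippendorff's alpha: 1 - observed / expected disagreement.

    units: list of lists; units[i] holds the ratings given to item i,
           with None marking a missing rating.
    delta: distance between two ratings (default: interval metric).
    """
    # Drop missing ratings, then drop items with fewer than two ratings,
    # since a single rating cannot be paired with anything.
    units = [[v for v in u if v is not None] for u in units]
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)

    # Observed disagreement: ordered pairs of ratings within each item.
    d_observed = sum(
        sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: ordered pairs across all pooled ratings.
    # This is zero (alpha undefined) if every rating is identical.
    d_expected = sum(
        delta(a, b) for a, b in permutations(values, 2)
    ) / (n * (n - 1))

    return 1.0 - d_observed / d_expected


# Hypothetical data: 4 items, up to 3 raters, some ratings missing.
example = [[1, 1, None], [2, 2], [3, 4], [5, None]]
print(krippendorff_alpha(example))

# Nominal version: any mismatch counts as a full disagreement.
print(krippendorff_alpha(example, delta=lambda a, b: 0 if a == b else 1))
```

Unlike the kappa statistics, alpha accepts any number of raters and tolerates missing ratings, and the choice of `delta` determines whether the scale is treated as nominal, interval, or something else.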

Dr. Jochen L. Leidner


Welcome to the site, Dr. Leidner. We appreciate your offering your expertise to help w/ answering questions here. One of CV's goals is to create a permanent, self-contained repository of statistical information. In that light, we prefer that answers don't need supplementary info to be complete. Would you mind editing your answer to say a little about what Krippendorff's alpha is, & why it is the appropriate tool here? (You needn't say too much.) These issues are discussed in our [FAQ](http://stats.stackexchange.com/faq), which you may want to read. – gung - Reinstate Monica Nov 14 '12 at 14:49
