I have pairs of pictures of participants making facial expressions (240 pairs per participant, 11 participants). Two raters agreed on a criterion and rated each pair on a scale from 1 to 6 according to how similar the two expressions within the pair were.
When I calculate the intraclass correlation coefficient for each participant (consistency, not absolute agreement), I get values of around .5. The points more or less cluster around the diagonal, indicating reasonable agreement, but it's clearly not great.
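For reference, this is how I'm computing the coefficient: a minimal NumPy sketch of ICC(C,1) (two-way model, consistency definition, single rater) on simulated data, not my actual ratings; the variable names are just placeholders.

```python
import numpy as np

def icc_consistency(ratings):
    """ICC(C,1): two-way model, consistency definition, single rater.

    `ratings` is an (n_targets, n_raters) array: one row per picture
    pair, one column per rater."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between targets (pairs)
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Simulated example: 240 rating pairs for one participant,
# columns are rater A and rater B (6-point scale).
rng = np.random.default_rng(0)
true_similarity = rng.integers(1, 7, size=240)
rater_a = np.clip(true_similarity + rng.integers(-1, 2, size=240), 1, 6)
rater_b = np.clip(true_similarity + rng.integers(-1, 2, size=240), 1, 6)
ratings = np.column_stack([rater_a, rater_b])
print(round(icc_consistency(ratings), 3))
```

(If I'm not mistaken, this matches the coefficient reported as ICC3 by pingouin's intraclass_corr.)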
Below is an example plot of the ratings for one participant, with jitter on the x axis to show density of points.
Normally, very high inter-rater reliability is necessary to show that the raters are rating something meaningful. However, getting high agreement on a 6-point scale is proving hard.
What is an acceptable ICC for a 6-point Likert scale? Is .5 really low, or about what one should expect?
Note: for reasons of experimental design, I can't change the rating scale.