
In my system I ask $n$ people to rate $m$ objects (e.g. films) on a scale of 1 to 5. I'd like to get a measure of inter-subject concordance, i.e. how much the users agreed overall, not just on a particular object but across all of the objects.

I know that Kendall's W is a well-established measure of concordance for rankings, but in my case people are not ordering the objects; they are assigning each one a score from 1 to 5. From Kendall's perspective this produces a great many ties, which makes Kendall's W problematic.
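To make the ties issue concrete, here is a sketch of the tie-corrected Kendall's W (standard formula; the data are made up). Even with the correction, a 1-5 scale applied to many films leaves large tie groups in every rater's row:

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """Tie-corrected Kendall's W for an (n_raters, m_objects) score matrix."""
    n, m = ratings.shape
    # Convert each rater's scores to ranks; ties get average ranks.
    ranks = np.apply_along_axis(rankdata, 1, ratings)
    R = ranks.sum(axis=0)  # rank sum per object
    # Tie correction: sum of (t^3 - t) over every tie group of every rater.
    T = 0
    for row in ratings:
        _, counts = np.unique(row, return_counts=True)
        T += (counts**3 - counts).sum()
    num = 12 * (R**2).sum() - 3 * n**2 * m * (m + 1)**2
    den = n**2 * m * (m**2 - 1) - n * T
    return num / den  # undefined if every rater ties all objects

# Four raters scoring five films on a 1-5 scale: note the many ties.
scores = np.array([[5, 3, 3, 4, 1],
                   [4, 3, 2, 4, 1],
                   [5, 2, 3, 5, 2],
                   [4, 4, 3, 4, 1]])
print(kendalls_w(scores))
```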

What is a suitable measure of inter-subject concordance for ratings rather than for rankings?

Mulone

2 Answers


In this case the intraclass correlation coefficient (ICC) is usually used. It is a measure of interrater agreement that takes both profile and elevation into account. There are several ICCs; in your case I would recommend using ICC(2,k). This is the appropriate index when all raters rate all films and the raters are regarded as a random sample.
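A minimal sketch of computing ICC(2,k) in Python, assuming the third-party pingouin package (its intraclass_corr function reports the six Shrout–Fleiss ICCs); the data below are made up:

```python
import numpy as np
import pandas as pd
import pingouin as pg  # pip install pingouin

# Made-up data: 4 raters each score 10 films on a 1-5 scale.
rng = np.random.default_rng(42)
scores = rng.integers(1, 6, size=(4, 10))

# pingouin expects long format: one row per (film, rater, score).
n_raters, m_films = scores.shape
long = pd.DataFrame({
    "film":  np.tile(np.arange(m_films), n_raters),
    "rater": np.repeat(np.arange(n_raters), m_films),
    "score": scores.ravel(),
})

icc = pg.intraclass_corr(data=long, targets="film",
                         raters="rater", ratings="score")
# "ICC2k" = two-way random effects, reliability of the mean of k raters.
print(icc.loc[icc["Type"] == "ICC2k", ["Type", "ICC", "CI95%"]])
```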

Felix S

One way of doing it is to compute the standard deviation of the ratings for each film (as e.g. in https://stats.stackexchange.com/a/23254/6552) and average it over films (perhaps weighting each film by its number of voters).
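As a sketch with made-up data (NaN marks films a user skipped); a lower average SD means more agreement:

```python
import numpy as np

# Rows = users, columns = films; np.nan = film not rated by that user.
ratings = np.array([
    [5.0, 3.0, np.nan, 4.0],
    [4.0, 3.0, 2.0,    5.0],
    [5.0, 2.0, 1.0,    4.0],
])

film_sd  = np.nanstd(ratings, axis=0, ddof=1)   # spread per film
n_voters = (~np.isnan(ratings)).sum(axis=0)     # votes per film
avg_sd   = np.average(film_sd, weights=n_voters)
print(avg_sd)
```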

Another approach is to measure the average correlation (Pearson or Spearman) between the users' ratings. An advantage (or disadvantage, depending on what you want to do) of this approach is that it ignores simple shifts: if one person likes the same films as another but consistently scores each film one point lower, the two still count as agreeing.
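A sketch of the pairwise version spelled out in the comments below, with made-up complete ratings (Spearman would be a drop-in replacement for Pearson here):

```python
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr

# Rows = users, columns = films (assumes everyone rated everything).
ratings = np.array([
    [5, 3, 1, 4, 2],
    [4, 3, 2, 5, 1],
    [5, 2, 1, 4, 3],
], dtype=float)

# One correlation per pair of users: n*(n-1)/2 values, then average.
pair_r = [pearsonr(ratings[i], ratings[j])[0]
          for i, j in combinations(range(len(ratings)), 2)]
print(np.mean(pair_r))
```

Averaging Fisher z-transformed correlations instead of raw coefficients is a common refinement.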

Piotr Migdal
  • Thanks! I understand how to use Pearson between 2 users, but what about the whole set of users? I need a global measure of concordance, not just between 2 users. – Mulone Feb 21 '12 at 22:48
  • Use it between all pairs of users (giving $n(n-1)/2$ correlation coefficients), then average them. – Piotr Migdal Feb 21 '12 at 23:42