In my field, a dated gold standard dataset is used to track progress in algorithm development. Now that state-of-the-art algorithms obtain a higher correlation with the gold standard than the dataset's inter-rater agreement, there is concern about whether the dataset can still be used. What is your opinion?

Some more details:

Let there be a gold standard dataset D1 created by averaging the ratings of 13 annotators. The ratings are on a scale from 0 to 10. The interannotator agreement is 0.6, computed as the average of the pairwise Spearman correlations between the ratings of all raters.

Say the state-of-the-art algorithm obtains a Spearman correlation of 0.8 with the gold standard. Since this exceeds the interannotator agreement of 0.6, the algorithm is better than the humans. Can the gold standard still be used to track progress in algorithm development in the given field, or is a new dataset with higher interannotator agreement needed?
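To make the two numbers concrete, here is a minimal sketch of how I understand them to be computed. The toy ratings and the `algo_scores` list are made-up illustration data (3 annotators and 5 items instead of the real 13 annotators), and the helper names are my own; Spearman is implemented as Pearson correlation on tie-averaged ranks.

```python
from itertools import combinations
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def mean_pairwise_spearman(ratings):
    """Interannotator agreement: average Spearman over all annotator pairs."""
    return mean(spearman(a, b) for a, b in combinations(ratings, 2))

def gold_standard(ratings):
    """Gold standard: per-item average of all annotators' ratings."""
    return [mean(item_scores) for item_scores in zip(*ratings)]

# Hypothetical toy data: one list of item ratings (scale 0 to 10) per annotator.
toy_ratings = [
    [1, 3, 5, 7, 9],
    [2, 3, 6, 6, 10],
    [9, 7, 5, 3, 1],
]
algo_scores = [0, 2, 4, 6, 8]  # hypothetical algorithm output for the 5 items

iaa = mean_pairwise_spearman(toy_ratings)
algo_corr = spearman(algo_scores, gold_standard(toy_ratings))
print(f"interannotator agreement: {iaa:.3f}")
print(f"algorithm vs. gold standard: {algo_corr:.3f}")
```

Note that the two numbers are computed against different references: the first compares individual annotators with each other, while the second compares a single set of scores with the average over all annotators.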

I would especially appreciate any references to literature dealing with this topic.

chl
tomas
  • Just out of curiosity: How does one obtain different annotations from a single algorithm to compute the pairwise correlations? – ziggystar Aug 07 '15 at 09:41
  • Did you get any updates on this, tomas? What are your current thoughts? – michal Sep 20 '17 at 09:37
  • Unfortunately no. I have not made any progress myself on this. – tomas Sep 23 '17 at 17:59
  • @tomas Can you give me some good citations for the idea that, once an algorithm reaches IAA on a dataset, the dataset cannot be used any more to track development, and a new dataset with high IAA is needed? – Russell Richie Feb 17 '22 at 19:05
  • Also, to answer your question, you seem to accept the possibility that an algorithm can beat IAA on a dataset, so why can't another algorithm beat IAA on that dataset *even more*? Unless you think that beating IAA is noise or an artifact. If that's the case, check out Boguslav and Cohen (2017), who argue that it's perfectly reasonable for algos to beat IAA. – Russell Richie Feb 17 '22 at 19:08

0 Answers