I found a related question here, but it doesn't really address what I want to know.
I found a couple of papers, from 2006 and 2010, that use the kappa statistic, but afterwards I found other authors disagreeing with it, like here and here. I also noticed other tests being listed and suggested, such as the ICC, both in the Cross Validated question I pointed to above and on the wiki.
To narrow down the context: I have a set of n participants who were each asked to answer 'yes/no/maybe' to the same set of questions. I've also seen authors relabel yes/maybe as 'evidence', converting the data to yes/no by simply summing the counts of yes and maybe.
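To make that recoding concrete, here is a minimal sketch of what I understand those authors to be doing. It is in Python purely for illustration; the counts are made up and the names (`counts`, `collapse`, `evidence`, `no_evidence`) are my own, not taken from any of the papers.

```python
# Hypothetical data: one entry per question, holding how many of the
# n = 10 participants answered yes / maybe / no (made-up numbers).
counts = [
    {"yes": 6, "maybe": 3, "no": 1},
    {"yes": 2, "maybe": 1, "no": 7},
    {"yes": 4, "maybe": 4, "no": 2},
]


def collapse(row):
    """Merge 'yes' and 'maybe' into one 'evidence' category by summing them."""
    return {"evidence": row["yes"] + row["maybe"], "no_evidence": row["no"]}


# Three-category counts become two-category counts, question by question.
collapsed = [collapse(row) for row in counts]

for row in collapsed:
    print(row)
```

The per-question totals are unchanged by this step; only the number of categories drops from three to two.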
My questions are thus:
- What is the most recent, most accepted, or most referenced paper in this discussion?
- Given my dataset constraints, which test would be more appropriate? I was considering Fleiss' kappa, but I got confused when I saw that it is a generalization of Scott's pi statistic and that Cohen's kappa came out of criticism of pi. If you could suggest an R function that computes it, I would also appreciate it.
Thank you.