I am collecting data from a survey that uses a 5-point Likert scale. Each rater answers 18 questions (probably different questions for different raters), and each question will be answered by 10 different raters. How can I calculate the interrater reliability so that I can eliminate any rater who was giving random answers?
A rough example of the data, where x is a rater's answer.
One possible approach: at the question level, calculate the difference between each rater's answer and the average of the other 9 raters' answers to the same question, then eliminate raters whose differences exceed some threshold. Alternatively, I could use the intraclass correlation (ICC(1,k)), but how would I know which raters to eliminate? Are there any other possible approaches?
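A minimal Python sketch of both ideas, assuming (hypothetically) that the answers are stored as a nested dict `ratings[question_id][rater_id] = score`. The data layout, the function names, and the threshold value are illustrative assumptions, not an established method. `leave_one_out_deviation` implements the question-level idea above: each answer is compared with the mean of the other raters on that question, and the absolute differences are averaged per rater. `icc_1k` computes ICC(1,k) from a one-way ANOVA with questions as targets, which corresponds to the Shrout & Fleiss case where each target is rated by a different set of k raters.

```python
from collections import defaultdict
from statistics import mean

# Assumed (hypothetical) layout: ratings[question_id][rater_id] = score on 1..5.
Ratings = dict[str, dict[str, int]]


def leave_one_out_deviation(ratings: Ratings) -> dict[str, float]:
    """For every answer, take |score - mean of the other raters on that
    question|, then average those differences per rater."""
    per_rater: dict[str, list[float]] = defaultdict(list)
    for answers in ratings.values():
        total = sum(answers.values())
        n = len(answers)
        for rater, score in answers.items():
            others_mean = (total - score) / (n - 1)  # excludes this rater
            per_rater[rater].append(abs(score - others_mean))
    return {rater: mean(devs) for rater, devs in per_rater.items()}


def flag_raters(ratings: Ratings, threshold: float) -> list[str]:
    """Raters whose mean leave-one-out deviation exceeds `threshold`
    (the threshold is a judgment call, not an established cutoff)."""
    deviations = leave_one_out_deviation(ratings)
    return sorted(r for r, d in deviations.items() if d > threshold)


def icc_1k(ratings: Ratings) -> float:
    """ICC(1,k) = (BMS - WMS) / BMS from a one-way ANOVA with questions
    as targets; requires the same number k of ratings per question."""
    groups = [list(answers.values()) for answers in ratings.values()]
    n, k = len(groups), len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("every question needs the same number of ratings")
    grand = mean(x for g in groups for x in g)
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    bms = ss_between / (n - 1)        # between-questions mean square
    wms = ss_within / (n * (k - 1))   # within-question mean square
    if bms == 0:
        raise ValueError("all question means are equal; ICC is undefined")
    return (bms - wms) / bms


if __name__ == "__main__":
    # Made-up toy data: rater "c" disagrees with "a" and "b" on every question.
    example = {
        "q1": {"a": 4, "b": 4, "c": 1},
        "q2": {"a": 2, "b": 2, "c": 5},
        "q3": {"a": 5, "b": 5, "c": 3},
    }
    print(leave_one_out_deviation(example))
    print(flag_raters(example, threshold=2.0))  # ['c']
    print(icc_1k(example))
```

The ICC is a single number for the whole panel, so on its own it does not say which rater to drop; a per-rater statistic such as the leave-one-out deviation is what singles out individuals. With 10 raters per question, as in the survey described above, k would be 10.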
Any help would be great, thanks.