I'm trying to analyze the intra-rater reliability of an occupational therapy assessment.
The assessment consists of 57 items; some items yield ratio data and some ordinal data. In my study, 31 raters rated 4 subjects (on video) twice (time 1 and time 2) with a six-week interval. So I have a score on each of the 57 items, for each subject, at each time point, from each rater.
Rater1Item1, Rater1Item2, ..., Rater1Item57, ..., Rater31Item1, ..., Rater31Item57
Subject1 Time1
Subject2 Time1
....
Subject3 Time2
Subject4 Time2
My first question is about the set-up of the database; I'm using SPSS. As shown above, is this the right set-up, or do I need to change it and put the raters in the rows and the subjects in the columns?
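To make the layout concrete, here is a small pandas sketch of the wide format I described (one row per subject-time combination, one column per rater-item combination); the scores are random placeholders, and all names follow my own labeling above:

```python
import numpy as np
import pandas as pd

n_raters, n_items, n_subjects, n_times = 31, 57, 4, 2
rng = np.random.default_rng(1)

# Columns: Rater1Item1 ... Rater31Item57 (items nested within raters)
cols = [f"Rater{r}Item{i}" for r in range(1, n_raters + 1)
                           for i in range(1, n_items + 1)]
# Rows: Subject1 Time1 ... Subject4 Time2 (subjects nested within times)
rows = [f"Subject{s} Time{t}" for t in range(1, n_times + 1)
                              for s in range(1, n_subjects + 1)]
df = pd.DataFrame(rng.normal(size=(len(rows), len(cols))),
                  index=rows, columns=cols)
print(df.shape)  # (8, 1767): 8 subject-time rows, 31 * 57 columns
```

This is just the structure I currently have in SPSS, expressed in code so the alternative (raters in rows) is easy to compare against.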
My second question: I found that I can use the ICC and Bland-Altman analysis for intra-rater reliability. Are these two suitable, or are there other statistical methods? The kappa statistic does not seem very suitable, since I have 31 raters each rating 4 subjects. Or is there a way to apply the kappa statistic to such a large dataset?
My third question: which ICC can I use for the ratio data to calculate the intra-rater reliability? I have read different suggestions in the literature (ICC(1,1) or ICC(3,1)).
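To show how I understand the two candidates, here is a numpy sketch of the Shrout-Fleiss ICC(1,1) and ICC(3,1) formulas, treating time 1 and time 2 as the two repeated measurements for one rater (so each row is one subject-item score pair); the function names are my own:

```python
import numpy as np

def icc_1_1(x):
    """ICC(1,1), one-way random effects: x is (n_targets, k_measurements)."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    # One-way ANOVA: between-targets and within-target mean squares
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    msw = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def icc_3_1(x):
    """ICC(3,1), two-way mixed, consistency: x is (n_targets, k_measurements)."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    # Residual MS after removing target and measurement-occasion effects
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (msb - mse) / (msb + (k - 1) * mse)

# A constant shift between time 1 and time 2 lowers ICC(1,1)
# but leaves ICC(3,1) at 1, since ICC(3,1) measures consistency only.
x = np.array([[1., 2.], [2., 3.], [3., 4.]])
print(icc_1_1(x), icc_3_1(x))  # 0.6  1.0
```

If my reading of the formulas is right, the choice hinges on whether a systematic shift between time 1 and time 2 should count against reliability (ICC(1,1) yes, ICC(3,1) no).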
My fourth question: if I use Bland-Altman, the first step is calculating the mean of time 1 and time 2. Do I need to do this for each item, each subject, and each rater separately, or can I calculate the mean for each item on each subject across all raters together? Or would pooling the raters not make sense, because the means at both times can be the same without there being agreement (for example, rater 1 scores lower at time 2 and rater 2 scores higher at time 2, keeping the mean the same at time 1 and time 2)?
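Here is a toy numpy example of the cancellation I have in mind (the scores are invented for illustration): two raters shift in opposite directions between the time points, so the pooled time means are identical even though neither rater agrees with themselves.

```python
import numpy as np

# Hypothetical scores for one item on 4 subjects
rater1_t1 = np.array([10., 12., 14., 16.])
rater1_t2 = rater1_t1 - 2   # rater 1 scores lower at time 2
rater2_t1 = np.array([10., 12., 14., 16.])
rater2_t2 = rater2_t1 + 2   # rater 2 scores higher at time 2

# Pooled over raters, the time means are identical ...
pooled_t1 = np.mean([rater1_t1, rater2_t1])
pooled_t2 = np.mean([rater1_t2, rater2_t2])
print(pooled_t1, pooled_t2)        # 13.0 13.0

# ... but the per-rater time1 - time2 differences are far from zero:
diff1 = rater1_t1 - rater1_t2      # all +2
diff2 = rater2_t1 - rater2_t2      # all -2
```

This is why I suspect the means and differences have to be computed per rater (and per subject and item) before pooling anything.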
My last question: when I draw the Bland-Altman plot, can I do this for each item with all 4 subjects and all 31 raters, giving 124 dots in the plot?
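What I have in mind looks like this numpy sketch (simulated scores, one plot per item): each of the 31 raters contributes 4 subject points, so the scatter has 124 dots, with the bias and limits of agreement computed over all 124 differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n_raters, n_subjects = 31, 4
t1 = rng.normal(50, 5, size=(n_raters, n_subjects))  # item score, time 1
t2 = t1 + rng.normal(0, 2, size=t1.shape)            # time 2 = time 1 + noise

means = ((t1 + t2) / 2).ravel()   # x-axis: 124 (rater, subject) means
diffs = (t1 - t2).ravel()         # y-axis: 124 (rater, subject) differences
bias = diffs.mean()
loa = 1.96 * diffs.std(ddof=1)    # half-width of the limits of agreement
print(means.size)                 # 124

# The plot itself would be scatter(means, diffs) plus horizontal lines
# at bias, bias + loa, and bias - loa.
```

I am unsure whether mixing raters and subjects in one plot like this is legitimate, which is exactly what the question asks.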