I'm trying to analyze the intra-rater reliability of an occupational therapy assessment.
The assessment consists of 57 items; some items yield ratio data and some ordinal data. In my study, 31 raters rated 4 subjects (on video) twice (time 1 and time 2) with a six-week interval. So I have a score on each of the 57 items, for each subject, at each time point, from each rater.
Rater1Item1, Rater1Item2, ..., Rater1Item57, ..., Rater31Item1, ..., Rater31Item57
Subject1 Time1
Subject2 Time1
....
Subject3 Time2
Subject4 Time2
My first question is about the set-up of the database; I'm using SPSS. As shown above, is this the right set-up, or do I need to change it and put the raters in the rows and the subjects in the columns?
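To make the layout concrete, here is a small pandas sketch of the wide format I described (one row per subject-time combination, one column per rater-item combination); the scores are random placeholders, and all names follow my own labeling above:

```python
import numpy as np
import pandas as pd

n_raters, n_items, n_subjects, n_times = 31, 57, 4, 2
rng = np.random.default_rng(1)

# Columns: Rater1Item1 ... Rater31Item57 (items nested within raters)
cols = [f"Rater{r}Item{i}" for r in range(1, n_raters + 1)
                           for i in range(1, n_items + 1)]
# Rows: Subject1 Time1 ... Subject4 Time2 (subjects nested within times)
rows = [f"Subject{s} Time{t}" for t in range(1, n_times + 1)
                              for s in range(1, n_subjects + 1)]
df = pd.DataFrame(rng.normal(size=(len(rows), len(cols))),
                  index=rows, columns=cols)
print(df.shape)  # (8, 1767): 8 subject-time rows, 31 * 57 columns
```

This is just the structure I currently have in SPSS, expressed in code so the alternative (raters in rows) is easy to compare against.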
My second question: I found that I can use the ICC and Bland-Altman analysis for intra-rater reliability. Are these two suitable, or are there other statistical methods? The kappa statistic does not seem very suitable, since I have 31 raters each rating 4 subjects. Or is there a way to apply the kappa statistic to such a large dataset?
My third question: which ICC can I use for the ratio data to calculate the intra-rater reliability? I have read different suggestions in the literature (ICC(1,1) or ICC(3,1)).
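To show how I understand the two candidates, here is a numpy sketch of the Shrout-Fleiss ICC(1,1) and ICC(3,1) formulas, treating time 1 and time 2 as the two repeated measurements for one rater (so each row is one subject-item score pair); the function names are my own:

```python
import numpy as np

def icc_1_1(x):
    """ICC(1,1), one-way random effects: x is (n_targets, k_measurements)."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    # One-way ANOVA: between-targets and within-target mean squares
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    msw = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def icc_3_1(x):
    """ICC(3,1), two-way mixed, consistency: x is (n_targets, k_measurements)."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    # Residual MS after removing target and measurement-occasion effects
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (msb - mse) / (msb + (k - 1) * mse)

# A constant shift between time 1 and time 2 lowers ICC(1,1)
# but leaves ICC(3,1) at 1, since ICC(3,1) measures consistency only.
x = np.array([[1., 2.], [2., 3.], [3., 4.]])
print(icc_1_1(x), icc_3_1(x))  # 0.6  1.0
```

If my reading of the formulas is right, the choice hinges on whether a systematic shift between time 1 and time 2 should count against reliability (ICC(1,1) yes, ICC(3,1) no).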
My fourth question: if I use Bland-Altman, the first step is calculating the mean of time 1 and time 2. Do I need to do this for each item, each subject, and each rater separately, or can I calculate the mean for each item on each subject across all raters together? Or would pooling the raters not make sense, because the means at both times can be the same without there being agreement (for example, rater 1 scores lower at time 2 and rater 2 scores higher at time 2, keeping the mean the same at time 1 and time 2)?
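Here is a toy numpy example of the cancellation I have in mind (the scores are invented for illustration): two raters shift in opposite directions between the time points, so the pooled time means are identical even though neither rater agrees with themselves.

```python
import numpy as np

# Hypothetical scores for one item on 4 subjects
rater1_t1 = np.array([10., 12., 14., 16.])
rater1_t2 = rater1_t1 - 2   # rater 1 scores lower at time 2
rater2_t1 = np.array([10., 12., 14., 16.])
rater2_t2 = rater2_t1 + 2   # rater 2 scores higher at time 2

# Pooled over raters, the time means are identical ...
pooled_t1 = np.mean([rater1_t1, rater2_t1])
pooled_t2 = np.mean([rater1_t2, rater2_t2])
print(pooled_t1, pooled_t2)        # 13.0 13.0

# ... but the per-rater time1 - time2 differences are far from zero:
diff1 = rater1_t1 - rater1_t2      # all +2
diff2 = rater2_t1 - rater2_t2      # all -2
```

This is why I suspect the means and differences have to be computed per rater (and per subject and item) before pooling anything.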
My last question: when I draw the Bland-Altman plot, can I do this for each item with all 4 subjects and all 31 raters, giving 124 dots in the plot?
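What I have in mind looks like this numpy sketch (simulated scores, one plot per item): each of the 31 raters contributes 4 subject points, so the scatter has 124 dots, with the bias and limits of agreement computed over all 124 differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n_raters, n_subjects = 31, 4
t1 = rng.normal(50, 5, size=(n_raters, n_subjects))  # item score, time 1
t2 = t1 + rng.normal(0, 2, size=t1.shape)            # time 2 = time 1 + noise

means = ((t1 + t2) / 2).ravel()   # x-axis: 124 (rater, subject) means
diffs = (t1 - t2).ravel()         # y-axis: 124 (rater, subject) differences
bias = diffs.mean()
loa = 1.96 * diffs.std(ddof=1)    # half-width of the limits of agreement
print(means.size)                 # 124

# The plot itself would be scatter(means, diffs) plus horizontal lines
# at bias, bias + loa, and bias - loa.
```

I am unsure whether mixing raters and subjects in one plot like this is legitimate, which is exactly what the question asks.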