Participants were rated twice, with the 2 ratings separated by 3 years. For most participants the ratings were done by different raters, but for some (< 10%) the same rater performed both ratings. There were 8 raters altogether, with 2 doing ratings at both time points.
Since the ratings were of an aspect of ability with a hypothetical "correct" value, absolute agreement between raters is of interest rather than consistency. However, because the two ratings were taken 3 years apart, there might have been (and probably was) some real change in the ability itself.
- What would be the best test of reliability in this case?
- I'm leaning towards an intra-class correlation, but is ICC1 the best I can do with these data?
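For reference, the one-way random-effects ICC(1) I have in mind can be computed directly from the one-way ANOVA mean squares. A minimal sketch below, with made-up scores for illustration (the `icc1` helper and the data are hypothetical, not from my study):

```python
import numpy as np

def icc1(ratings):
    """One-way random-effects ICC(1) for an n-subjects x k-ratings matrix."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    # Between-subject and within-subject mean squares from one-way ANOVA
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))
    # ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: 6 participants, each rated at two time points
scores = np.array([
    [9, 2], [1, 1], [8, 4], [2, 1], [10, 5], [3, 2],
])
print(round(icc1(scores), 3))  # → 0.305
```

Because ICC(1) treats the two ratings per participant as interchangeable, any systematic change over the 3 years ends up in the within-subject error term, which is exactly why I'm unsure it's the right choice here.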