I am writing with questions about a study I am helping with:
Two raters will subjectively score athletes from 0 to 3 (an ordinal scale, 3 > 2 > 1 > 0) on each of 10 skill performances. The 10 scores will be summed to give a total out of 30, and the total will be bracketed into roughly three categories for comparison with knee injury incidence. Reliability/agreement between the two raters' scores must be determined.
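For concreteness, here is a minimal sketch of that scoring structure (in Python); the cut-points used for the three brackets are placeholders, not the study's actual categories:

    # Sketch of the scoring structure; the bracket cut-points are hypothetical.
    def total_score(skill_scores):
        """Sum the 10 per-skill scores (each 0-3) into a 0-30 total."""
        assert len(skill_scores) == 10 and all(s in (0, 1, 2, 3) for s in skill_scores)
        return sum(skill_scores)

    def bracket(total, cuts=(10, 20)):
        """Map a 0-30 total into one of three categories.
        The cut-points (10, 20) are placeholders, not the study's actual brackets."""
        if total <= cuts[0]:
            return "low"
        elif total <= cuts[1]:
            return "medium"
        return "high"

    example = [3, 2, 2, 1, 3, 0, 2, 1, 3, 2]   # one rater's 10 skill scores
    print(total_score(example), bracket(total_score(example)))  # 19 medium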
How should the inter-rater reliability testing be set up? Should we use kappa, Kendall's coefficient, or both? Should the same data and analysis be used for the individual skill scores and for the total score? And how much data is needed for this inter-rater reliability testing?
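To make the question concrete, here is a minimal sketch of the two statistics being asked about, computed on made-up illustrative scores (not study data); it is only meant to show the kind of analysis in question, not to presuppose which is correct:

    # Sketch of the candidate statistics; the numbers are illustrative only.
    import numpy as np
    from scipy.stats import kendalltau
    from sklearn.metrics import cohen_kappa_score

    # One skill, scored 0-3 by each rater for ten athletes.
    r1_skill = np.array([3, 2, 2, 1, 0, 3, 2, 1, 3, 2])
    r2_skill = np.array([3, 2, 1, 1, 0, 3, 2, 2, 3, 2])

    # Weighted kappa treats the 0-3 scale as ordered, so a 2-vs-3 disagreement
    # is penalised less than a 0-vs-3 disagreement.
    kappa_w = cohen_kappa_score(r1_skill, r2_skill, weights="quadratic")

    # The same athletes' 0-30 totals from each rater.
    r1_total = np.array([24, 18, 15, 9, 27, 21, 12, 6, 25, 19])
    r2_total = np.array([23, 19, 13, 10, 26, 22, 14, 7, 24, 18])

    # Kendall's tau measures rank agreement between the two sets of totals.
    tau, p_value = kendalltau(r1_total, r2_total)

    print(f"weighted kappa (one skill) = {kappa_w:.2f}")
    print(f"Kendall tau (totals) = {tau:.2f}, p = {p_value:.3f}")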
The lead investigator anticipates highly consistent scores between the raters and, for the main study, plans to divide the testing of subjects between them accordingly.