
I am working on the following problem and would really appreciate your help:

The data set at hand contains medical ratings (on a 1-5 scale, assumed to be interval) given by 5 independent raters, on 7 occasions, for 60 patients. In general, it looks like this:

Patient  Rater  Time  Rating
1        1      1     2
1        2      1     3
1        3      1     2
1        4      1     3
1        5      1     2
1        1      2     2
...
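
For reproducibility, a minimal simulated stand-in with the same structure can be built like this (the ratings here are made up; only the column names and dimensions match the real data described above):

set.seed(1)
df <- expand.grid(Patient = factor(1:60),
                  Rater   = factor(1:5),
                  Time    = 1:7)
df$Rating <- pmin(5, pmax(1, round(3 + rnorm(nrow(df)))))  # ratings on a 1-5 scale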

Now I learned that I can calculate the ICC as a reliability estimate for the raters. I did this with lme4 like this:

library(lme4)
model <- lmer(Rating ~ (1|Patient) + (1|Rater) + Time, data = df)

This yielded the following variance estimates:

Random effects:
 Groups          Name        Variance Std.Dev.
 Patient         (Intercept) 0.50755  0.7124  
 Rater           (Intercept) 0.01535  0.1239  
 Residual                    0.55580  0.7455  

Thus, the ICC for Rater should be 0.015 / (0.508 + 0.015 + 0.556) = .014. This also replicates when I use the rptR::rpt function, so I did not make an immediate calculation mistake. However, I am wondering: why is the reliability so extremely low? As a check, I then transformed the data to wide format, calculated the correlations between the individual raters, and also estimated reliability using alpha and omega. There I obtain very high coefficients (.90+), which is more along the lines of what I expected...
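
For reference, the same numbers can be pulled straight out of the fitted model, and the wide-format check can be reproduced along these lines (a sketch only; it assumes the `model` and `df` objects from above, and uses `tidyr::pivot_wider` as one possible way to reshape):

vc <- as.data.frame(VarCorr(model))   # columns: grp, var1, var2, vcov, sdcor
v  <- setNames(vc$vcov, vc$grp)       # Patient, Rater, Residual variances

v["Rater"]   / sum(v)   # = 0.015 / (0.508 + 0.015 + 0.556), the .014 above
v["Patient"] / sum(v)   # = 0.508 / total, the share of variance due to patients

# Wide format: one column per rater, then pairwise correlations between raters
wide <- tidyr::pivot_wider(df, names_from = Rater, values_from = Rating,
                           names_prefix = "rater_")
cor(wide[, paste0("rater_", 1:5)])
# psych::alpha() / psych::omega() on these columns correspond to the alpha/omega check mentioned above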

I suspect that there is an issue with how I specified the model... I am interested in the agreement between raters for any given patient at any given timepoint. Can anyone point out my mistake?

Many thanks

  • If I'm not mistaken, the value $0.015/(0.508+0.015+0.556)=0.014$ is the estimated correlation of two measurements made by the same rater on two different patients. This seems intuitive to me because I'd expect the measurements of different patients to be independent. The value $0.508/(0.508+0.015+0.556)=0.471$ is the estimated correlation of two measurements on the same patient performed by different raters. I followed the calculations shown [here](https://stats.stackexchange.com/a/81455/21054). Final note: you can't estimate intra-rater reliability because you fitted Time as a fixed effect. – COOLSerdash Feb 03 '21 at 12:14
  • Thank you for your quick response. And wow, such an obvious mistake by me ;) So to make sure: if I were to add Time as a random effect instead of a fixed one, could I then also estimate inter-rater reliability as `(Var(Patient) + Var(Time)) / Var(Total)`? Or even add the interaction Patient*Time as well? (A sketch of this follows below the comments.) – Bjarne Schmalbach Feb 03 '21 at 19:26
  • That would be the estimated correlation of two measurements by two different raters at the same time and on the same person, I'd say. – COOLSerdash Feb 03 '21 at 20:40
  • Great! Thanks again. Now the estimates are more along the lines of what I would have expected. – Bjarne Schmalbach Feb 03 '21 at 20:48
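
Following the comment thread, a hedged sketch of that follow-up model (same variable names as above; `model2` is just an illustrative name, and note that with only 7 occasions the Time variance is estimated from very few levels):

model2 <- lmer(Rating ~ (1 | Patient) + (1 | Rater) + (1 | Time), data = df)

vc2 <- as.data.frame(VarCorr(model2))
v2  <- setNames(vc2$vcov, vc2$grp)

# Correlation of two ratings of the same patient at the same time point,
# made by two different raters
(v2["Patient"] + v2["Time"]) / sum(v2)

# If a (1 | Patient:Time) term is added to the model, its variance component
# (grp "Patient:Time") belongs in the numerator as well.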

0 Answers