Test-Retest reliability for mixed models

Question

We have an experimental setup as follows:

A set of subjects each repeat a single task multiple times in a single session, yielding a set of measurements. The same subjects then repeat the same task multiple times in a second session a week later. They are unaware of the results each time and there is unlikely to be any learning effect contributing to the results.

We have previously analysed the results from a single session, accounting for the within-subject repetition, by fitting linear mixed models with differing fixed effects and comparing them using likelihood ratio tests (as described in the excellent answers here). We now wish to investigate the reliability of the results. [Side-note]

The best reference I have found for this is [1], which gives a good description and is supported by the CorrMixed R package. They investigate reliability in terms of the measurement time, whilst treating the session (or 'cycle' in their terms) as a fixed effect. We are less interested in the measurement time (for repeated measurements within a session) and more interested in the reliability between sessions.

What is the correct measure for reliability in our case? (ideally with reference to the nlme or CorrMixed R packages)

[1]: Estimating the reliability of repeatedly measured endpoints based on linear mixed-effects models. A tutorial, Wim Van der Elst, Geert Molenberghs, Ralf-Dieter Hilgers, Geert Verbeke, and Nicole Heussen, https://www.ideal.rwth-aachen.de/wp-content/uploads/2014/02/10.1002_pst.1787.pdf

[Side-note]: This has been referred to as the 'Intra-Rater Reliability' in the trial notes; however, I believe this is a confusing title, as variability in the measurements comes from the subjects themselves rather than from anyone rating them. To me, 'Test-Retest Reliability' seems a more accurate title for what we wish to measure. Intra-Rater Reliability often seems to be treated in the literature as a subset of Test-Retest Reliability, but I would be interested to know which is the correct term for what we are looking at (or whether it just comes down to semantics).

Example:
Session 1 Measurements
Subject A: 2, 3, 2
Subject B: -1, -1, -2
Subject C: 4, -2, 2
Subject D: 4, 4, 5

Session 2 Measurements
Subject A: 4, 3, 3
Subject B: -1, -3, -1
Subject C: 0, -1, 3
Subject D: 3, 3, 3
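
To make concrete what a between-session reliability figure could look like for the example data, here is a minimal sketch. It makes a simplifying assumption that is not part of the question: each subject's attempts are averaged within a session, and a one-way random-effects intraclass correlation, ICC(1,1), is computed on those session means. This discards the within-session attempt variability that a mixed model would account for, so it is only an illustration, not necessarily the correct measure being asked about. The function name `icc_oneway` is hypothetical.

```python
from statistics import mean

# Example data from the question: subject -> attempts, per session.
session1 = {"A": [2, 3, 2], "B": [-1, -1, -2], "C": [4, -2, 2], "D": [4, 4, 5]}
session2 = {"A": [4, 3, 3], "B": [-1, -3, -1], "C": [0, -1, 3], "D": [3, 3, 3]}


def icc_oneway(rows):
    """One-way random-effects ICC(1,1); rows is one list of scores per subject."""
    n = len(rows)      # number of subjects
    k = len(rows[0])   # number of sessions per subject
    grand = mean(x for row in rows for x in row)
    subject_means = [mean(row) for row in rows]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(rows, subject_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Assumption: collapse the repeated attempts to one mean per subject per session.
rows = [[mean(session1[s]), mean(session2[s])] for s in sorted(session1)]
icc = icc_oneway(rows)
print(f"ICC(1,1) on session means: {icc:.3f}")
```

A value near 1 would indicate that most of the variance in the session means lies between subjects rather than between sessions within a subject. With only four subjects, the estimate is very imprecise.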

I think "test-retest" is a poor choice here. It refers to the reliability of measurement, rather than of statistical analysis. (And welcome to Cross Validated). — Peter Flom, Jul 22 '18 at 11:16
It is reliability of measurement that we are interested in, perhaps my question was not clear enough on that? — welf, Jul 22 '18 at 11:25
If it is reliability of measurement that you are concerned with, then none of the material on multilevel models is needed. I'm confused. — Peter Flom, Jul 22 '18 at 12:46
'repeat a single task multiple times'. Is that just random repetition? That is, is there any systematic difference between time1 and time2 or not? Are "times" a fixed factor or are they just attempts? — ttnphns, Jul 22 '18 at 12:51
Repetition is multiple attempts at the same task, therefore (within-subject) attempts are treated as a random effect in our mixed model. This is why we have been using a mixed model - is this still necessary to look at reliability? — welf, Jul 23 '18 at 22:10
