We have an experimental setup as follows:
A set of subjects each repeat a single task several times within one session, yielding a set of measurements per subject. A week later, the same subjects repeat the same task several times in a second session. Subjects are never told their results, and we do not expect a learning effect to contribute to the results.
We have previously analysed the results from a single session, accounting for the within-subject repetition, by fitting linear mixed models with differing fixed effects and comparing them using likelihood ratio tests (as described in the excellent answers here). We now wish to investigate the reliability of the results. [Side-note]
The best reference I have found for this is [1], which gives a good description and is supported by the CorrMixed R package. They investigate reliability in terms of the measurement time, whilst treating the session (or 'cycle' in their terms) as a fixed effect. We are less interested in the measurement time (for repeated measurements within a session) and more interested in the reliability between sessions.
What is the correct measure of reliability in our case, ideally with reference to the nlme or CorrMixed R packages?
[1]: Estimating the reliability of repeatedly measured endpoints based on linear mixed-effects models. A tutorial, Wim Van der Elst, Geert Molenberghs, Ralf-Dieter Hilgers, Geert Verbeke, and Nicole Heussen, https://www.ideal.rwth-aachen.de/wp-content/uploads/2014/02/10.1002_pst.1787.pdf
[Side-note]: This has been referred to as 'Intra-Rater Reliability' in the trial notes. However, I believe this is a confusing title, as the variability in the measurements comes from the subjects themselves rather than from anyone rating them. To me, 'Test-Retest Reliability' seems a more accurate title for what we wish to measure. In the literature, Intra-Rater Reliability often seems to be treated as a subset of Test-Retest Reliability, but I would be interested to know which is the correct term for what we are looking at (or whether it just comes down to semantics).
Example:
Session 1 Measurements
Subject A: 2, 3, 2
Subject B: -1, -1, -2
Subject C: 4, -2, 2
Subject D: 4, 4, 5
Session 2 Measurements
Subject A: 4, 3, 3
Subject B: -1, -3, -1
Subject C: 0, -1, 3
Subject D: 3, 3, 3
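A minimal sketch of one candidate between-session measure on the example data, assuming a balanced design in which replicates are nested in sessions and sessions are nested in subjects, all treated as random effects (in nlme this structure would be written along the lines of `lme(y ~ 1, random = ~ 1 | subject/session)`). Instead of fitting that model by REML, the sketch uses the classical ANOVA (method-of-moments) estimates of the subject, session-within-subject and residual variances, and then forms two intraclass correlations: one for single measurements taken in different sessions, and one for the two session means. This is an illustration only, not the CorrMixed procedure and not necessarily the "correct" measure asked about; the function name `nested_variance_components` is hypothetical.

```python
# Example data from above: data[session][subject] -> replicate measurements
data = {
    1: {"A": [2, 3, 2], "B": [-1, -1, -2], "C": [4, -2, 2], "D": [4, 4, 5]},
    2: {"A": [4, 3, 3], "B": [-1, -3, -1], "C": [0, -1, 3], "D": [3, 3, 3]},
}


def mean(xs):
    return sum(xs) / len(xs)


def nested_variance_components(data):
    """Hypothetical helper: ANOVA (method-of-moments) variance components for a
    balanced design with replicates nested in sessions nested in subjects."""
    sessions = sorted(data)
    subjects = sorted(data[sessions[0]])
    a, b = len(subjects), len(sessions)          # subjects, sessions per subject
    n = len(data[sessions[0]][subjects[0]])      # replicates per session

    cell_mean = {(i, j): mean(data[j][i]) for i in subjects for j in sessions}
    subj_mean = {i: mean([cell_mean[i, j] for j in sessions]) for i in subjects}
    grand = mean(list(subj_mean.values()))

    # Sums of squares for each level of the nested design
    ss_subj = b * n * sum((subj_mean[i] - grand) ** 2 for i in subjects)
    ss_sess = n * sum((cell_mean[i, j] - subj_mean[i]) ** 2
                      for i in subjects for j in sessions)
    ss_err = sum((y - cell_mean[i, j]) ** 2
                 for i in subjects for j in sessions for y in data[j][i])

    ms_subj = ss_subj / (a - 1)
    ms_sess = ss_sess / (a * (b - 1))
    ms_err = ss_err / (a * b * (n - 1))

    # Solve the expected-mean-square equations; negative estimates are set to 0
    var_err = ms_err
    var_sess = max((ms_sess - ms_err) / n, 0.0)
    var_subj = max((ms_subj - ms_sess) / (b * n), 0.0)
    return var_subj, var_sess, var_err, n


var_subj, var_sess, var_err, n = nested_variance_components(data)
# Correlation between two single measurements of one subject in different sessions
icc_single = var_subj / (var_subj + var_sess + var_err)
# Correlation between a subject's two session means (each averaging n replicates)
icc_mean = var_subj / (var_subj + var_sess + var_err / n)
print(f"subject={var_subj:.3f} session={var_sess:.3f} residual={var_err:.3f}")
print(f"single-measurement ICC={icc_single:.3f}, session-mean ICC={icc_mean:.3f}")
```

On the example data this gives a subject variance of about 5.032, a session-within-subject variance truncated to 0 (its raw moment estimate is negative), a residual variance of about 2.042, and ICCs of about 0.711 (single measurement) and 0.881 (session mean). With only four subjects these estimates are very imprecise, and a REML fit could differ, particularly when a variance component sits at zero.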