Although eRDS had good test-retest reliability, learning occurred in the first sessions. The large improvement between T1 and T2, together with the proportion of participants found stereoblind at T1 and yet showing measurable stereopsis in the following sessions (or when using ASTEROID and VV at T1), suggests that the initial practice session was not sufficient for participants to fully understand and accomplish the task at T1. Large and rapid improvements occurring at the start of a task are usually explained by learning of the task and material (task familiarization), while slow and gradual improvement reflects perceptual learning [54–57]. In particular, the observed improvement could be due to learning how to see depth in the stereoscope, e.g., how to cope with the accommodation-vergence conflict. We minimized the conflict as much as possible by equalizing the accommodation and vergence distances of the screen, but a small conflict remains when stimuli are presented in front of or behind the screen. The stereoscope has been shown to reduce the initial precision of depth perception, a reduction that can be overcome with training [58,59]. Therefore, we believe that the improvement from T1 to T2 was due to task familiarization (e.g., learning how to use the stereoscope). Residual learning was also observed between T2 and T3, although the reasons for this later improvement are less clear. Each eRDS session comprised 340 trials, for a total of 1020 by the end of T3 (counting trials with 2000 ms and 200 ms presentations). If perceptual learning were at play, improvement would also have been expected between all subsequent sessions. Yet our post-hoc analyses revealed no improvement between those last sessions, although this result might have been limited by our relatively small sample size. It is possible that the early fast phase of perceptual learning accounts for the progression between T1 and T3, with T4 to T6 comprising too few trials for the slower phase of perceptual learning to be expressed. Alternatively, additional task familiarization may have occurred between T2 and T3.
Learning effects after multiple testing have been observed frequently in the literature. Gardiner and colleagues [60], in a paradigm testing the visual field of patients with early glaucoma once a year, found a learning effect, with most of the improvement occurring over the first testing sessions. In another study, McCaslin et al. [2] reported a small learning effect on a third testing session of the ASTEROID test, although they found good test-retest reliability between the first two sessions. However, it should be noted that their first two sessions were performed on the same day, whereas the third session took place 14 days later. In our study, we also observed some learning between T1 and T6 on the ASTEROID test. This result is difficult to interpret, as we observed this improvement in both of our groups, eRDS-repeat and VV-repeat. We therefore cannot exclude the possibility of test-retest learning on ASTEROID. Studies of other computer-based stereotests reported no learning on their retest session [9–11,61]. On closer inspection, those studies repeated their testing sessions on the same day, which might explain this difference, as sleep can act to consolidate learning [62,63]. However, Tittes and collaborators [12] observed learning in subjects with poor stereopsis, although they repeated their testing on the same day. The literature addressing this issue of multiple testing is very sparse, and studies exploring the reliability of new tests often take two measures on the same day. This does not reflect clinical situations, where patients are tested on different days or months, and underlines the importance of better understanding the reliability of these tests under repeated testing.
VV showed no learning between sessions. However, single sessions had poor/poor-to-moderate test-retest reliability, with a test-retest correlation that was only marginally significant between T1 and T2, and a large test-retest LOA (between 0.62 and 1.08) compared to eRDS (between 0.40 and 0.78) and ASTEROID (0.46). This relatively high variability across sessions can be expected given that each VV test was much shorter than the eRDS test, with just 28 to 176 trials. Pooling three sessions together reduced this variability, producing a good test-retest correlation and an improved LOA (0.52). This result must be interpreted with caution because we pooled sessions over different days. That said, given that no learning occurred across the six sessions, the assumption of stability seems valid.
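The effect of pooling on the LOA can be illustrated with a minimal simulation. The sketch below is not the analysis used in this study: it assumes that the LOA is the half-width of the 95% Bland-Altman limits of agreement (1.96 times the standard deviation of the paired test-retest differences), that measurement noise is independent across sessions, and that the first three sessions are pooled against the last three. The function names, the sample size, and the noise level are all hypothetical. Under these assumptions, averaging three sessions should shrink the LOA by a factor of roughly the square root of 3.

```python
import random
import statistics


def loa_half_width(test, retest):
    """Half-width of the 95% Bland-Altman limits of agreement.

    Assumed definition: 1.96 * sample SD of the paired differences.
    """
    diffs = [a - b for a, b in zip(test, retest)]
    return 1.96 * statistics.stdev(diffs)


def simulate_session(true_thresholds, noise_sd, rng):
    """One session: each true threshold plus independent Gaussian noise."""
    return [t + rng.gauss(0.0, noise_sd) for t in true_thresholds]


def pool(sessions):
    """Per-participant mean threshold across the given sessions."""
    return [statistics.mean(values) for values in zip(*sessions)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical values, chosen only for illustration (not study data).
n_participants = 200
noise_sd = 0.3
true_thresholds = [rng.gauss(2.0, 0.4) for _ in range(n_participants)]

# Six sessions with stable true thresholds (no learning).
sessions = [simulate_session(true_thresholds, noise_sd, rng) for _ in range(6)]

single_loa = loa_half_width(sessions[0], sessions[1])
pooled_loa = loa_half_width(pool(sessions[:3]), pool(sessions[3:]))

print(f"single-session LOA: {single_loa:.2f}")
print(f"pooled (3 vs 3) LOA: {pooled_loa:.2f}")
```

The simulation only holds when thresholds are stable across sessions, which is the same stability assumption discussed above; any learning between sessions would add a systematic bias that pooling does not remove.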