In our experiments, the MAE was reduced by 15% to 18% with S-ZEST in reliable simulated patients. This is remarkably close to the 20% benchmark estimated to be necessary to provide clinical benefit in the detection of pointwise deterioration.
30 This is important for facilitating the tracking of localized progression in clinical practice, especially for locations near fixation. S-ZEST also appeared to reduce the effect of FP or FN responses. This is evident from the mean signed error of S-ZEST (Table 2), which was closer to zero than that of ZEST in both the high FP and high FN simulations. It should be noted that a much smaller improvement was obtained for MS (Tables 3, 4; Fig. 4), and this is likely also the case for other global metrics, such as the MD, because correlated errors produce global shifts in MS and dominate its variability (see Appendix).
26,27,31 This result is important for understanding what improvements should be expected in global metrics when new strategies are deployed in clinical practice. Naturally, our results depend on the specific choice of the variability model used to simulate the data. We adopted the exponential model for the SD of the Gaussian psychometric function proposed by Henson et al.
3 capped at 6 dB. This choice was made mainly for consistency with previous work and does not have a strong justification. For example, in previous work, we found that a cap of 8.17 dB would better describe variability in a test–retest dataset.
32 Gardiner et al.
33 recently proposed alternative models based on other psychometric data. They showed, for example, that a segmented linear model for the SD might better describe variability for extremely low threshold values. To show how our results are affected by the choice of the model, we performed additional simulations with reliable observers using the coefficients provided by Gardiner et al.
33 Implementing a segmented linear model is not straightforward in the current OPI simulation framework. However, the exponential model fitted to the same data and capped at 10 dB offers a very close approximation for thresholds above 0 dB,
33 the lower bound for our data, and was therefore chosen for these additional experiments. These results are largely similar to our main ones, although they further favor S-ZEST, showing a significant effect on the MS-AE, which was not statistically significant in our main set of simulations (see Table 4). These additional results are reported as Supplementary Material.
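
To make the role of the cap concrete, the capped exponential variability model discussed above can be sketched as follows. This is an illustrative sketch only, not the simulation code used in our experiments. The function names are hypothetical. The default coefficients `a` and `b` are assumptions (values commonly quoted for the exponential model of Henson et al.) and are not stated in this text; the coefficients of Gardiner et al. would need to be substituted for the additional simulations. The way FP and FN rates are mixed into the response probability is likewise an assumed, commonly used formulation.

```python
import math


def capped_exponential_sd(threshold_db, a=-0.081, b=3.27, cap=6.0):
    """SD (dB) of the Gaussian psychometric function.

    Exponential model: SD = exp(a * threshold + b), limited to `cap`.
    NOTE: the defaults for `a` and `b` are assumptions, not values
    given in the text. `cap` is 6 dB in the main simulations.
    """
    return min(cap, math.exp(a * threshold_db + b))


def prob_seen(stimulus_db, threshold_db, fp=0.0, fn=0.0, cap=6.0):
    """Probability that a simulated observer responds 'seen'.

    Higher dB means a dimmer stimulus, so the probability of seeing
    decreases as stimulus_db rises above threshold_db. The FP/FN
    mixing below is an assumed formulation.
    """
    sd = capped_exponential_sd(threshold_db, cap=cap)
    z = (stimulus_db - threshold_db) / sd
    # Standard normal CDF evaluated at z, computed via the error function.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    p_detect = 1.0 - cdf
    return fp + (1.0 - fp - fn) * p_detect


if __name__ == "__main__":
    # Compare the caps mentioned in the text: 6 dB, 8.17 dB, and 10 dB.
    for cap in (6.0, 8.17, 10.0):
        sds = [round(capped_exponential_sd(t, cap=cap), 2) for t in (0, 10, 20, 30)]
        print(f"cap={cap}: SD at 0, 10, 20, 30 dB -> {sds}")
```

Under these assumed coefficients, changing the cap affects only locations with low thresholds, where the uncapped exponential would exceed the cap; the modeled variability at higher thresholds is unchanged.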