Monitoring functional change in glaucoma is challenging given the inherent variability in VF measurements.³ This variability can reach levels that violate the assumptions about measurement error underlying classical least squares regression, thereby yielding inaccurate evaluation of longitudinal VF data.¹⁹,²⁵ By weighting each data point²⁷ and by using centroids (the average of available measurements),²⁸ estimates from robust regression and the DSF model, respectively, are less influenced by measurement variability. This property was leveraged in the current study by using each model to postprocess existing VF data. When the false-positive rates derived from patients with stable glaucoma were used to correct the positive rate obtained in the Rotterdam dataset, the sensitivity of the DSF-predicted dataset was higher than that of both the observed and MRR-predicted datasets (Fig. 3). The sensitivity of the DSF-predicted dataset was, however, similar to that obtained with the observed dataset when longer follow-up series were used as a surrogate reference standard for progression (Fig. 4). These results suggest that the analysis of DSF postprocessed data could be useful in screening glaucoma patients to identify those likely to progress and to estimate future progression status in individual patients. This could enhance our ability to detect glaucoma progression early and to plan clinical interventions ahead of time.
Although fewer than one-fifth of glaucoma patients under care progress rapidly, undetected slow progression can be detrimental to a younger patient over time.³ A screening tool that helps identify patients at risk of rapid progression without missing slow progression would be useful in glaucoma management. Despite having classification accuracy (based on pAUC) similar to that of the observed dataset (Fig. 4), the DSF-predicted dataset flagged relatively more eyes as progressing at a fixed specificity (Fig. 3), an ability that can be harnessed to develop a screening tool for glaucoma progression. Although the higher sensitivity of this approach raises the possibility of overcalling progression, a false alert that a patient may be at greater risk of progression would only result in closer monitoring, which would not be as detrimental as misdiagnosing a non-glaucomatous eye for treatment. The greater sensitivity associated with the DSF-predicted dataset could reflect an inflation of the false-positive rate inherent in the data prediction process. However, this was ruled out by the evaluation of the test–retest dataset, in which the false-positive estimates for each predicted dataset equaled the proportion of eyes expected to be flagged as progressing due to chance (Supplementary Fig. S1). With a median coefficient of determination (R²) of 0.70 (IQR, 0.58) for the DSF-predicted dataset, 0.12 (IQR, 0.30) for the observed dataset, and 0.43 (IQR, 0.69) for the MRR-predicted dataset, factors such as reduced variability within the DSF-predicted dataset potentially account for its greater sensitivity.
In the absence of a standardized reference for glaucoma progression,³² in one approach we used progression outcomes based on observed data available at the 12th and 15th visits as a surrogate gold standard. Among the limitations of this surrogate reference is the possibility that a series of 12 or 15 VFs may be insufficient follow-up to confirm or rule out the progression flagged by predicted datasets in some eyes. For example, in Figure 2E, progression was flagged by both the DSF-predicted (slope = –0.09 dB/y; P < 0.01) and the MRR-predicted measurements (slope = –0.41 dB/y; P = 0.01) but was not identified in the observed data available at the 15th visit (slope = –0.02 dB/y; P = 0.33). These disagreements in classification between predicted and observed datasets may or may not hold when data from a longer period are available. Given the follow-up length in the study, we observed that both predicted datasets tended to flag more stable eyes as progressing compared to the observed dataset, as presented in Supplementary Figure S2A. This ultimately can affect the sensitivity and pAUC estimated for the predicted datasets. For example, the MRR-predicted dataset was the least sensitive approach (Fig. 4) despite having about 30% less within-series variability than the observed dataset. Given that data prediction did not compromise specificity (Supplementary Fig. S1), patients flagged as progressing on the evaluation of predicted measurements may be at greater risk of progression and could benefit from close monitoring, especially when longer follow-up data are not available.
Clinicians currently rely on repeated testing and longer follow-up to obtain reliable data for accurate assessment of progression.⁶ However, this increases the burden on patients and clinic resources and may further delay the detection of progression. In contrast, postprocessing existing VF data by predicting future measurements with statistical models offers a practical and inexpensive way to reduce measurement variability without requiring additional testing. We observed that the DSF model predicted MD values more accurately than the MRR model, with MAE differences of 0.12 dB and 0.43 dB for predictions at the 12th and 15th visits, respectively. We further observed that the evaluation of DSF-predicted measurements yielded rates of progression comparable to those obtained with longer observed VF series (Fig. 5). These findings suggest that evaluation of postprocessed VF data derived with the DSF model could provide a comparatively reliable estimate of the rate of progression early in some patients.
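To make the accuracy comparison concrete, the sketch below computes the mean absolute error (MAE) of two sets of predictions against the same observed MD values and then takes the difference between them. It is a minimal illustration only: the function name, the variable names, and every MD value are hypothetical and are not data from this study.

```python
from statistics import fmean


def mean_absolute_error(observed, predicted):
    """Average absolute difference (in dB) between observed and predicted MD."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(abs(o - p) for o, p in zip(observed, predicted))


# Hypothetical MD values (dB) at one follow-up visit for three eyes.
observed_md = [-4.0, -6.5, -2.0]
model_a_md = [-4.5, -6.0, -2.5]   # predictions from a first model
model_b_md = [-5.0, -5.0, -3.0]   # predictions from a second model

mae_a = mean_absolute_error(observed_md, model_a_md)
mae_b = mean_absolute_error(observed_md, model_b_md)

# A positive difference means the first model was closer to the observed values.
mae_difference = mae_b - mae_a
```

A smaller MAE indicates predictions that sit closer, on average, to the values actually measured at that visit.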
In contrast to approaches that are based on either population estimates, such as glaucoma progression analysis,³² or censoring unreliable sensitivities with predetermined cut-offs,¹⁵,¹⁶ analysis of postprocessed data may permit individualized progression assessment, as the predicted measurements are generated from each patient's own existing data. A commonly used model for individualized progression assessment, permutation of pointwise linear regression (PoPLR),³⁶ estimates the overall significance of deterioration across all VF locations in reference to 5000 permutations of a patient's own data. The PoPLR model has been shown to be more sensitive to functional progression than trend analysis of a single global index such as MD. We performed PoPLR analysis on series of raw VF sensitivities from V1 to V9 extracted from the Rotterdam dataset and compared the proportion of eyes it flagged as progressing to that obtained with the analysis of predicted and observed MD values (Fig. 6). Consistent with the findings of O'Leary et al.,³⁶ the PoPLR model flagged a greater proportion of eyes as progressing than conventional trend analysis of observed MD values. Analyses of DSF-predicted and MRR-predicted global MD measurements had greater positive rates than the PoPLR model. Generating and analyzing pointwise postprocessed VF measurements is an avenue of future work needed to determine whether doing so could further improve sensitivity to functional progression.
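The permutation idea behind this kind of individualized analysis can be sketched in a few lines. The example below is not PoPLR itself: the published method combines pointwise P values, whereas this sketch swaps in a simpler combined statistic (the sum of negative pointwise slopes) so that it runs without external libraries. All function names and sensitivity values are hypothetical. What it does share with the description above is the reference distribution, which is built by reordering the patient's own visits.

```python
import random


def ols_slope(times, values):
    """Ordinary least squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def combined_statistic(times, series_by_location):
    """Sum of negative slopes across locations (larger = more deterioration).

    Assumption: a stand-in for the combined P-value statistic of the
    published method, chosen only to keep the sketch dependency-free.
    """
    return sum(max(0.0, -ols_slope(times, s)) for s in series_by_location)


def permutation_p_value(times, series_by_location, n_permutations=5000, seed=0):
    """Share of visit reorderings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = combined_statistic(times, series_by_location)
    order = list(range(len(times)))
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(order)
        # Whole visits are reordered together, so the relationship between
        # locations within a single test is left intact.
        shuffled = [[s[i] for i in order] for s in series_by_location]
        if combined_statistic(times, shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical sensitivities (dB) at two locations over nine visits.
visit_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]  # years
sensitivities = [
    [30.0, 29.5, 29.2, 28.4, 28.1, 27.5, 27.0, 26.6, 26.0],
    [28.0, 28.2, 27.5, 27.1, 26.9, 26.0, 25.8, 25.1, 24.9],
]
p_value = permutation_p_value(visit_times, sensitivities)
```

Because the reference distribution comes from the same eye, a noisy but stable series yields many reorderings that look as "progressive" as the observed order, and the resulting P value is correspondingly large.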
As test–retest variability increases with disease severity, the magnitude of prediction error is expected to become larger. The centroid of the DSF model is an average estimate of the level of damage, and its relationship with prediction error could be useful for modeling the expected variability in measurements as the disease progresses. For each DSF prediction, we determined the associated prediction error as the absolute difference between the observed value and the predicted value. A simple linear regression of the absolute differences on the corresponding centroids revealed that only 2% to 15% of the variability in prediction error was accounted for by the centroid. This suggests that the centroid may not be useful for modeling expected variability. Nonetheless, because the centroid is less sensitive to variability in observed measurements, it allows for the derivation of less variable predicted measurements, which could be clinically useful even in advanced disease.
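The calculation described in this paragraph can be illustrated as follows: absolute prediction errors are regressed on centroids, and the proportion of variance explained is the coefficient of determination. This is a sketch under stated assumptions. The function names are hypothetical, the values are invented for illustration (they are not study data and happen to show a much stronger relationship than the 2% to 15% reported), and each centroid is assumed to be supplied already computed.

```python
def absolute_errors(observed, predicted):
    """Absolute difference between each observed and predicted value (dB)."""
    return [abs(o - p) for o, p in zip(observed, predicted)]


def r_squared(x, y):
    """Proportion of variance in y explained by a simple linear regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For one predictor, R^2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


# Hypothetical values (dB): one centroid, observed MD, and predicted MD per prediction.
centroids = [-1.0, -3.0, -5.0, -9.0]
observed_md = [-1.5, -2.0, -6.5, -7.0]
predicted_md = [-1.0, -3.0, -5.5, -9.5]

errors = absolute_errors(observed_md, predicted_md)
variance_explained = r_squared(centroids, errors)
```

A value near zero would indicate, as the study found, that the centroid says little about how large the prediction error will be.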
In this study, the DSF model consistently included the first three data points (MD values from V1–V3) in making all predictions. This could potentially result in conservative predictions and shallower rates of progression in cases where extreme values occur earlier in the series. To examine the impact of this on our findings, we explored a “moving window” alternative, where predicted measurements were obtained by applying the DSF model to only the set of three preceding data points. Thus, we used MD values from V1 to V3 to predict V4, V2 to V4 to predict V5, V3 to V5 to predict V6, and so on. At a specificity of 95%, MD values predicted using either all available data or based on a “moving window” flagged 37 eyes, 34 of which were flagged by both methods, yielding a kappa agreement of 0.88 (substantial agreement). This finding suggests that including the first data points in all DSF predictions did not significantly alter the ability to identify change in this study. Predictions based on a “moving window” may be appropriate when extreme values occur early in the observed series, or in the clinical setting where the most recent values are most relevant.
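The windowing scheme and the agreement statistic in this paragraph can be sketched as below. The first function only lists which visits feed each prediction; it does not implement the DSF model. The second computes Cohen's kappa from two sets of progression flags. Function names and the example flags are hypothetical and are not the study's data.

```python
def moving_windows(n_visits, width=3):
    """Pair each target visit with the `width` visits immediately before it.

    Visits are numbered from 1, so (1, 2, 3) -> 4 means V1 to V3 predict V4.
    """
    return [
        (tuple(range(target - width, target)), target)
        for target in range(width + 1, n_visits + 1)
    ]


def cohen_kappa(flags_a, flags_b):
    """Chance-corrected agreement between two lists of True/False flags."""
    if len(flags_a) != len(flags_b) or not flags_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(flags_a)
    observed_agreement = sum(a == b for a, b in zip(flags_a, flags_b)) / n
    rate_a = sum(flags_a) / n
    rate_b = sum(flags_b) / n
    # Agreement expected if the two methods flagged eyes independently.
    expected_agreement = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected_agreement == 1:
        raise ValueError("kappa is undefined when both methods give one constant label")
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)


windows = moving_windows(n_visits=9)

# Hypothetical flags for ten eyes: True = flagged as progressing.
all_data_flags = [True, True, True, False, False, False, False, False, False, False]
window_flags = [True, True, False, False, False, False, False, False, False, False]
kappa = cohen_kappa(all_data_flags, window_flags)
```

With nine visits and a width of three, the first window uses V1 to V3 to predict V4 and the last uses V6 to V8 to predict V9, matching the sequence described above.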
The current study has limitations. First, the length of the MD series used to assess progression at the ninth visit differed between the observed and predicted datasets. Our rationale for analyzing all nine observed MD values from V1 to V9 versus six predicted MD values for V4 to V9 was that observed data for V1–V3 were used in generating all predicted measurements. Although longer series provide a more accurate estimate of progression,³⁷ the DSF-predicted dataset was more sensitive than the observed dataset at the ninth visit. Even fewer eyes were flagged as progressing when shorter series of observed data (V1–V6 or V4–V9) were analyzed. Another limitation is that the patients enrolled in the Rotterdam Eye Study were under standard clinical care, and those who showed signs of deterioration likely received modifications to their treatment to slow the rate of progression. Given that the observed and predicted datasets obtained within the same period were analyzed and compared, we do not believe that variability in the rate of progression over time introduced systematic bias in favor of either of the methods used in this study.
In conclusion, we have demonstrated that assessing progression with postprocessed VF measurements generated with the DSF model resulted in similar or better sensitivity compared to observed data and yielded rates of progression comparable to those from longer observed VF series, without compromising specificity. These findings may be due to the reduced variability within the DSF-predicted series of measurements. In the absence of a widely accepted method to identify glaucoma progression, the evaluation of postprocessed measurements may be useful for identifying patients at greater risk of progression. These patients could be monitored more closely to determine whether changing or intensifying therapy is warranted to prevent or slow glaucomatous vision loss.