The Table shows the demographic, VF, and OCT characteristics of the patients included in this study. A total of 1541 eyes were included. The median follow-up period for included eyes was 4.74 years (range, 3–11 years). The training and testing sets consisted of 924 and 617 eyes, respectively. Age, gender, race, glaucoma severity, and baseline VF and OCT measurements were similar between the training and testing sets. Eyes included in our study were predominantly from female and Caucasian patients, and the majority had suspect or mild glaucoma. Only a small percentage (approximately 4%) of eyes were identified as worsening by the MD slope and GPA-like definitions of progression.
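The trend-based (MD slope) labeling used as one of the ground truths can be sketched as a linear regression of MD on time. The slope and significance cutoffs below (−0.5 dB/year, P &lt; 0.05) are illustrative assumptions only, not the study's exact definition:

```python
import numpy as np
from scipy import stats

def md_slope_progression(years, md_values, slope_cut=-0.5, p_cut=0.05):
    """Label an eye as progressing by MD slope (trend-based analysis).

    Hypothetical criterion: a statistically significant negative linear
    trend in mean deviation (MD) over follow-up. The -0.5 dB/year cutoff
    and p < 0.05 threshold are illustrative, not the study's definition.
    """
    res = stats.linregress(years, md_values)
    return bool(res.slope < slope_cut and res.pvalue < p_cut)

# A steadily worsening series (about -1 dB/year) versus a stable one.
years = np.arange(5, dtype=float)
noise = np.array([0.05, -0.03, 0.02, -0.04, 0.01])
worsening = -1.0 * years + noise
stable = noise
print(md_slope_progression(years, worsening))
print(md_slope_progression(years, stable))
```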
Support vector classifiers were the best-performing machine learning models for trend- and event-based analysis.
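A support vector classifier producing probability-like scores suitable for ROC analysis can be sketched as below. The simulated two-dimensional feature set and the rare-positive labeling rule are purely illustrative stand-ins for the study's actual structural inputs and pipeline:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
# Hypothetical features, e.g., a baseline thickness and a per-year
# rate of change; not the study's actual feature construction.
X = rng.normal(size=(n, 2))
# Rare "progressor" label driven by the second feature plus noise.
y = (X[:, 1] + 0.5 * rng.normal(size=n) < -0.8).astype(int)

clf = make_pipeline(
    StandardScaler(),
    # class_weight="balanced" compensates for the class imbalance
    # (only ~4% of eyes progress); probability=True enables
    # predict_proba for ROC/AUC analysis.
    SVC(kernel="rbf", class_weight="balanced", probability=True),
)
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1]
auc = roc_auc_score(y, scores)
```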
Figure 1 shows the AUCs of the statistical and machine learning models for detecting glaucoma progression, with ground truth defined by the MD slope or GPA-like criteria.
For MD slope progression, the logistic regression AUCs were 0.72 (95% confidence interval [CI], 0.60–0.84), 0.69 (95% CI, 0.56–0.81), and 0.73 (95% CI, 0.61–0.85) for the cp-RNFL, GC-IPL, and combined predictions, respectively. The support vector classifier AUCs were 0.78 (95% CI, 0.69–0.86), 0.75 (95% CI, 0.64–0.86), and 0.81 (95% CI, 0.73–0.89) for the cp-RNFL, GC-IPL, and combined predictions, respectively. The support vector classifier had similar AUCs for cp-RNFL (P = 0.48), GC-IPL (P = 0.45), and the combined predictions (P = 0.24) when compared with the logistic regression. Although combining cp-RNFL and GC-IPL predictions resulted in a slightly higher AUC than either alone, these differences were not statistically significant for either the logistic regression (combined vs. cp-RNFL, P = 0.85; combined vs. GC-IPL, P = 0.27) or the support vector classifier (combined vs. cp-RNFL, P = 0.21; combined vs. GC-IPL, P = 0.13).
For GPA-like progression, the logistic regression AUCs were 0.62 (95% CI, 0.50–0.75), 0.69 (95% CI, 0.59–0.79), and 0.66 (95% CI, 0.55–0.76) for the cp-RNFL, GC-IPL, and combined predictions, respectively. The support vector classifier AUCs were 0.63 (95% CI, 0.53–0.73), 0.68 (95% CI, 0.58–0.78), and 0.69 (95% CI, 0.61–0.77), respectively. Compared with the logistic regression, the support vector classifier had similar AUCs for cp-RNFL (P = 0.97), GC-IPL (P = 0.90), and the combined predictions (P = 0.65). For both models, the AUCs of the cp-RNFL predictions were similar to those of the GC-IPL predictions, and combining the cp-RNFL and GC-IPL predictions did not result in a higher AUC than either alone.
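The AUC point estimates and 95% CIs reported above can be illustrated with a percentile bootstrap; this is only one common approach, and the study's actual CI and comparison methods (e.g., DeLong's test) are not reproduced here. All data below are simulated:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap 95% CI for the AUC (illustrative only)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, n)
        # A resample must contain both classes for the AUC to be defined.
        if y_true[idx].min() == y_true[idx].max():
            continue
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, scores), (lo, hi)

# Simulated imbalanced data: 150 stable eyes, 50 progressors whose
# scores are shifted upward by one standard deviation.
rng = np.random.default_rng(1)
y = np.array([0] * 150 + [1] * 50)
s = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(1.0, 1.0, 50)])
auc, (lo, hi) = bootstrap_auc_ci(y, s, n_boot=500)
```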
Figure 2 shows the performance of the logistic regression and support vector classifiers stratified by disease severity. For MD slope progression, cp-RNFL predictions from the support vector classifier and logistic regression resulted in the highest AUCs for eyes with suspect glaucoma, and the AUCs decreased with increasing disease severity. In contrast, GC-IPL predictions from the support vector classifier yielded the highest AUCs for eyes with moderate or advanced glaucoma and the lowest AUCs for suspect or mild disease. However, these trends across disease severity were not statistically significant. The AUCs for the statistical and machine learning models were statistically similar regardless of disease stage and the respective prediction types.
For GPA-like progression, Figure 2 demonstrates that cp-RNFL, GC-IPL, and the combined predictions resulted in the highest AUCs in eyes with suspect glaucoma, and the AUCs generally decreased with increasing disease severity. For the logistic regression, the AUCs were significantly different between suspect and moderate or advanced disease for cp-RNFL (P < 0.001), GC-IPL (P = 0.04), and the combined predictions (P = 0.01). These differences between suspect and moderate or advanced disease were also significant for the support vector classifier when using GC-IPL and the combined predictions, but not when using cp-RNFL predictions. As with the MD slope progressors, the AUCs of the logistic regression and support vector classifiers were statistically similar across each disease stage and respective prediction type.
Figure 3 illustrates a series of five VF tests in an eye classified as a progressor by the cp-RNFL, GC-IPL, and combined predictions when using MD slope as the ground truth, along with the corresponding MD and the paired cp-RNFL and GC-IPL measurements. As the VF worsens, corresponding structural thinning is detected in the cp-RNFL and GC-IPL thickness measurements. Meanwhile, Figure 4 illustrates a series of five VF tests in an eye classified as not progressing by the cp-RNFL, GC-IPL, and combined predictions when using MD slope as the ground truth; here, too, the corresponding MD, cp-RNFL, and GC-IPL measurements demonstrate concordance between the functional and structural parameters.
Our sensitivity analysis additionally included eligible fellow eyes, and the results are shown in Figure 5. The training and testing sets consisted of 1387 and 966 eyes, respectively, and the best-performing statistical model was the mixed effects model. Figure 5 presents trends similar to those of our original analysis shown in Figure 1. The mixed effects model and support vector classifier performed similarly for the cp-RNFL, GC-IPL, and combined predictions when using MD slope to define progression. However, when using GPA-like definitions of progression, the support vector classifier performed better than the mixed effects model for the cp-RNFL (P = 0.02) and combined (P = 0.01) predictions, but not for the GC-IPL predictions (P = 0.15).