September 2024 | Volume 13, Issue 9 | Open Access | Artificial Intelligence
Prediction of Post-Treatment Visual Acuity in Age-Related Macular Degeneration Patients With an Interpretable Machine Learning Method
Author Affiliations & Notes
  • Najung Kim
    Department of Ophthalmology, Konkuk University School of Medicine, Konkuk University Medical Center, Seoul, Republic of Korea
  • Minsub Lee
    Department of Ophthalmology, Konkuk University School of Medicine, Konkuk University Medical Center, Seoul, Republic of Korea
  • Hyewon Chung
    Department of Ophthalmology, Konkuk University School of Medicine, Konkuk University Medical Center, Seoul, Republic of Korea
  • Hyung Chan Kim
    Kong Eye Hospital, Seoul, Republic of Korea
  • Hyungwoo Lee
    Department of Ophthalmology, Konkuk University School of Medicine, Konkuk University Medical Center, Seoul, Republic of Korea
  • Correspondence: Hyungwoo Lee, Department of Ophthalmology, Konkuk University School of Medicine, Konkuk University Medical Center, Seoul 05030, Republic of Korea. e-mail: hwlee@kuh.ac.kr 
Translational Vision Science & Technology September 2024, Vol.13, 3. doi:https://doi.org/10.1167/tvst.13.9.3
Citation: Najung Kim, Minsub Lee, Hyewon Chung, Hyung Chan Kim, Hyungwoo Lee; Prediction of Post-Treatment Visual Acuity in Age-Related Macular Degeneration Patients With an Interpretable Machine Learning Method. Trans. Vis. Sci. Tech. 2024;13(9):3. https://doi.org/10.1167/tvst.13.9.3.

Abstract

Purpose: We evaluated the features predicting visual acuity (VA) after 1 year in patients with neovascular age-related macular degeneration (nAMD).

Methods: A total of 527 eyes of 506 patients were included. Machine learning (ML) models were trained to predict VA deterioration beyond a logarithm of the minimum angle of resolution of 1.0 after 1 year based on the sequential addition of multimodal data. BaseM models used clinical data (age, sex, treatment regimen, and VA), SegM models included fluid volumes from optical coherence tomography (OCT) images, and RawM models used probabilities of visual deterioration (hereafter probability) from deep learning classifiers trained on baseline OCT (OCT0) and OCT after three loading doses (OCT3), fluorescein angiography, and indocyanine green angiography. We applied SHapley Additive exPlanations (SHAP) for machine learning model interpretation.

Results: The RawM model based on the probability of OCT0 outperformed the SegM model (area under the receiver operating characteristic curve of 0.95 vs. 0.91). Adding probabilities from OCT3, fluorescein angiography, and indocyanine green angiography to RawM showed minimal performance improvement, highlighting the practicality of using raw OCT0 data for predicting visual outcomes. SHAP analysis identified VA after 3 months and the OCT3 probability value as more influential features than the quantified fluid volumes.

Conclusions: Integrating multimodal data to create a visual predictive model yielded accurate, interpretable predictions. This approach allowed the identification of crucial factors for predicting VA in patients with nAMD.

Translational Relevance: Interpreting a predictive model for 1-year VA in patients with nAMD from multimodal data allowed us to identify crucial factors for predicting VA.

Introduction
Patients with neovascular age-related macular degeneration (nAMD) experience central vision loss owing to damage to macular photoreceptors caused by macular neovascularization, accompanied by fluid leakage and subretinal hemorrhage.1 The primary treatment is intravitreal injections of anti-vascular endothelial growth factor (anti-VEGF) agents, but patient responses vary, complicating visual outcome predictions.2–4 Traditional regression models struggle with nonlinear relationships between visual outcomes and clinical factors.5 Artificial intelligence has enhanced visual acuity (VA) predictions by modeling nonlinear relationships using clinical and imaging data. Schmidt-Erfurth et al.6 developed a machine learning (ML) model for predicting VA outcomes at 12 months based on the initial VA and measurements of the fluid compartments. The model demonstrated a predictive power of R² = 0.70 and a root mean square error of 8.6 Early Treatment Diabetic Retinopathy Study letters.6 In 2018, Rohm et al.7 developed a similar model for predicting best-corrected VA at 3 and 12 months after the start of treatment with three anti-VEGF agent injections based on clinical and quantified optical coherence tomography (OCT) data. In 2020, Kawczynski et al.8 developed deep learning (DL) models using OCT images to predict VA, achieving an area under the receiver operating characteristic curve (AUC) of 0.84. 
Prior studies quantified imaging data features, such as fluid compartment volumes and retinal layer thickness, because conventional ML models cannot directly process images; however, these measurements had a limited impact on model predictions.6,7,9 Mehta et al.10 developed an ML model for glaucoma detection using multimodal data and raw OCT images, improving predictive power by adding probability values from DL models trained on each imaging modality. This work suggests that using original OCT images could enhance VA prediction accuracy for patients with nAMD. To accomplish this goal, we aimed to use the probability of poor vision calculated by a DL model, given the usefulness of this measure for image processing.11 
We used the eXtreme Gradient Boosting (XGBoost) model, a reliable and efficient algorithm in health care research.12–18 Despite their accuracy, developers and physicians often struggle to comprehend the rationale behind these models' decisions because of the large number of complex parameters they contain. This lack of understanding can be problematic in clinical settings where clinicians rely on artificial intelligence results.19 Thus, researchers have explored explainable artificial intelligence, in which explanations of a model's actions are derived to make it more comprehensible to human users. SHapley Additive exPlanations (SHAP), which are based on the calculation of the Shapley value from game theory, are increasingly being used to interpret medical ML models.20–23 The Shapley value represents the average change in predictive power according to the presence or absence of a single feature over all possible combinations of features, estimating the average contribution of each feature to the model's decisions.20 This approach offers a reliable method for assessing which features are important for model decision-making. Thus, in this study, we used SHAP to interpret the significant features of the trained ML models. 
This study compares the predictive power of ML models for classifying poor VA after 1 year in treatment-naïve patients with nAMD. We added multimodal data sequentially, including clinical data (age, sex, treatment regimen, and VA at baseline and after three loading doses of anti-VEGF injections [VA3]) and fluid compartment measurements from OCT images at baseline and after three loading doses. We trained separate DL classifiers for different imaging modalities, including OCT, fluorescein angiography (FA), and indocyanine green angiography (ICGA), and used the resulting probabilities of poor vision as additional ML model inputs. We evaluated feature importance using SHAP. 
Methods
Cohort Selection and Dataset
This study analyzed data from treatment-naïve patients with nAMD treated with anti-VEGF agents, including aflibercept, ranibizumab, brolucizumab, and bevacizumab, at Konkuk University Medical Center between January 2005 and July 2023. The study received approval from the Institutional Review Board of Konkuk University Medical Center (approval number: 2024-01-028) in accordance with the Declaration of Helsinki. Patients received either a treat and extend regimen or a pro re nata (PRN) regimen. Treatment began with an initial loading phase of three anti-VEGF intravitreal injections administered at 4-week intervals. We collected data on age, sex, treatment regimen, baseline VA, and VA after three loading doses of anti-VEGF injections from medical records. Imaging data for each eye included baseline OCT volume scans, FA and ICGA images at 5 minutes, and OCT volume scans after the three loading doses. The study cohort comprised 527 eyes of 506 patients, excluding those who missed the initial loading phase; lacked VA data or OCT, FA, or ICGA images at baseline; had less than 10 months of follow-up; or had insufficient image quality owing to shadow artifacts, motion artifacts, or strong noise. OCT images with poor image quality, as defined by a signal-to-noise ratio of less than 25 dB and 40 dB, were excluded from the analysis. 
BaseM1, BaseM2, and BaseM3: Models Trained on Clinical Data
An XGBoost model was built to predict the VA prognosis of patients with nAMD after 1 year of anti-VEGF treatment. Patients' VA at 12 months was classified as poor (logarithm of the minimum angle of resolution [logMAR] VA ≥ 1.0) or good (logMAR VA < 1.0). This logMAR VA 1.0 threshold corresponds with the standard of severe visual impairment as defined by the Early Treatment Diabetic Retinopathy Study.28 The 527 eyes of 506 patients were divided randomly into three sets: 426 eyes from 405 patients in the training set, 48 eyes from 48 patients in the validation set, and 53 eyes from 53 patients in the test set. To avoid overfitting, the model's performance was monitored on the validation set during training, and the training procedure was stopped once the validation performance did not improve after a fixed number of training iterations. 
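For illustration, a minimal Python sketch of this early-stopping setup is shown below. The file names, column names, and hyperparameters other than the use of early stopping are assumptions for the example, not the authors' exact configuration.

```python
# A minimal sketch of XGBoost training with early stopping on a validation set.
# File names, feature/label columns, and hyperparameters are illustrative assumptions.
import pandas as pd
import xgboost as xgb
from sklearn.metrics import roc_auc_score

train = pd.read_csv("train_eyes.csv")   # 426 eyes (hypothetical file)
valid = pd.read_csv("valid_eyes.csv")   # 48 eyes
test = pd.read_csv("test_eyes.csv")     # 53 eyes

features = ["age", "sex", "baseline_logmar_va"]   # BaseM1-style inputs (assumed names)
label = "poor_va_12m"                             # 1 if logMAR VA >= 1.0 at 12 months

clf = xgb.XGBClassifier(
    objective="binary:logistic",
    n_estimators=1000,        # upper bound; early stopping selects the actual number
    learning_rate=0.05,       # assumed value
)
# Monitor the validation set and stop when its AUC stops improving.
# (Newer xgboost versions take eval_metric/early_stopping_rounds in the constructor.)
clf.fit(
    train[features], train[label],
    eval_set=[(valid[features], valid[label])],
    eval_metric="auc",
    early_stopping_rounds=20,
    verbose=False,
)

test_auc = roc_auc_score(test[label], clf.predict_proba(test[features])[:, 1])
print(f"Held-out test AUC: {test_auc:.2f}")
```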
We built three baseline ML models (BaseM1, BaseM2, and BaseM3) based on different groups of demographic characteristics: BaseM1 included sex, age, and baseline VA; BaseM2 added VA after three loading doses of anti-VEGF injections; and BaseM3 included all previous features plus treatment regimen (Table 1). 
Table 1. Input Data for XGBoost Models
SegM1 and SegM2: Models Combining Clinical Data With Fluid Compartment Quantification
Previous studies have shown that fluid compartments are associated with VA prognosis in patients with AMD.24–27 In line with previous literature, we segmented four retinal compartments—intraretinal fluid (IRF), subretinal fluid, subretinal hyperreflective material, and pigment epithelial detachment—at 250-µm intervals using our established U-Net–based DL segmentation algorithm. We used 25 OCT scans per eye at baseline and after three loading doses of anti-VEGF injections and converted the segmented areas into volumes. After applying the DL segmentation algorithm, clinicians (H.L. and N.K.) manually rechecked the segmented retinal compartments to ensure the accuracy of the input dataset. Supplementary Figure S1 displays examples of OCT scans segmented by the DL algorithm. 
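For illustration, the conversion from per-B-scan segmentation masks to compartment volumes can be sketched as follows, assuming 25 B-scans spaced 250 µm apart and known in-plane pixel dimensions; the label map and pixel sizes are assumptions, not the authors' exact values.

```python
# A minimal sketch of converting U-Net segmentation masks into fluid volumes.
# The label map and pixel spacings below are illustrative assumptions.
import numpy as np

COMPARTMENTS = {1: "IRF", 2: "SRF", 3: "SHRM", 4: "PED"}  # assumed label encoding

def compartment_volumes_mm3(masks, pixel_w_mm, pixel_h_mm, scan_spacing_mm=0.25):
    """masks: integer label array of shape (25, H, W), one slice per B-scan."""
    voxel_mm3 = pixel_w_mm * pixel_h_mm * scan_spacing_mm
    volumes = {}
    for label, name in COMPARTMENTS.items():
        n_voxels = int((masks == label).sum())   # count labeled pixels across all B-scans
        volumes[name] = n_voxels * voxel_mm3     # volume in cubic millimeters
    return volumes

# Example with a random mask stack standing in for U-Net output.
rng = np.random.default_rng(0)
dummy_masks = rng.integers(0, 5, size=(25, 496, 512))
print(compartment_volumes_mm3(dummy_masks, pixel_w_mm=0.0117, pixel_h_mm=0.0039))
```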
These volumes were combined with BaseM3 features for SegM1 and SegM2 models. SegM1 included baseline compartment volumes, and SegM2 added volumes after three loading doses of anti-VEGF injections (Table 1). 
RawM1, RawM2, and RawM3: Models Combining Clinical Data With Prognostic Probability From Imaging
Prognostic value was extracted directly from each imaging modality to avoid losing potential information about visual prognosis that may result from arbitrarily selecting fluid compartments. XGBoost cannot process raw image data as input; therefore, we needed to convert the image data into numerical data. We adopted an approach from prior research, which trained a DL model to predict the probability of glaucoma progression based on fundus photography data.10 We constructed DL models using convolutional neural networks (CNNs) to classify VA after 1 year of treatment as good or poor: the poor group comprised patients with a logMAR VA of 1.0 or worse at 1 year, and the good group comprised patients with a logMAR VA better than 1.0. These separate DL models were trained using the training set of imaging data from OCT0, OCT3, FA, and ICGA. The imaging data of the test set were then input into the trained DL models to predict the probability of a poor visual outcome as a value between 0 and 1, and this probability was used as an input feature for the XGBoost model. TensorFlow was used with an initial learning rate of 10⁻⁵, and each model was trained for 10 epochs on a single graphics processing unit. To improve generalization, we augmented the data by applying horizontal flipping, random brightness-contrast adjustment, and random size cropping to the input images. The AUC of each DL model exceeded 0.7. 
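A minimal TensorFlow sketch of one such per-modality classifier is given below. The paper specifies the learning rate (10⁻⁵), 10 epochs, and the types of augmentation; the backbone, input size, and augmentation magnitudes here are assumptions.

```python
# A minimal sketch of a per-modality CNN that outputs the probability of a poor
# 1-year outcome. Backbone, image size, and augmentation strengths are assumptions.
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed input resolution fed to the backbone

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomContrast(0.2),      # stands in for brightness/contrast jitter
    tf.keras.layers.RandomCrop(*IMG_SIZE),    # random size cropping
])

def build_classifier():
    inputs = tf.keras.Input(shape=(256, 256, 3))
    x = augment(inputs)                        # active only during training
    # Any ImageNet-style backbone could be used; ResNet50 is an assumption.
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", pooling="avg")
    x = backbone(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(poor VA at 1 year)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return model

# model = build_classifier()
# model.fit(train_ds, validation_data=val_ds, epochs=10)
# oct0_prob = model.predict(test_images)[:, 0]   # later fed to XGBoost as a feature
```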
The probability values from these CNN models were integrated with the BaseM3 features for the RawM1, RawM2, and RawM3 models. RawM1 included the baseline OCT probability; RawM2 added the probability value from the OCT images obtained after administration of three loading doses of anti-VEGF injections; and RawM3 added the FA and ICGA probabilities (Table 1). 
Furthermore, the gradient-weighted class activation mapping (Grad-CAM) technique was used to improve CNN model transparency by highlighting the image regions most important for each prediction.28 
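As an illustration, a generic Grad-CAM computation for a Keras CNN can be sketched as follows; the layer name and model structure are assumptions rather than the authors' implementation.

```python
# A minimal Grad-CAM sketch for a Keras CNN with a named convolutional layer.
# "last_conv" and the model itself are illustrative placeholders.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name="last_conv"):
    """Return a coarse heatmap (h, w) of regions driving the poor-VA probability."""
    conv_layer = model.get_layer(conv_layer_name)
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])

    with tf.GradientTape() as tape:
        conv_out, prob = grad_model(image[None, ...])   # add a batch dimension
        score = prob[:, 0]                              # P(poor VA at 1 year)
    grads = tape.gradient(score, conv_out)              # d score / d feature maps
    weights = tf.reduce_mean(grads, axis=(1, 2))        # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)[0]
    cam = tf.nn.relu(cam)                               # keep only positive influence
    cam = cam / (tf.reduce_max(cam) + 1e-8)             # normalize to [0, 1]
    return cam.numpy()                                  # upsample to image size for overlay
```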
Model Interpretation Using SHAP
SHAP was used to interpret the XGBoost models, providing insight into feature importance and interactions. The TreeExplainer method from the SHAP package was used to calculate the SHAP values. All programming was conducted using Python 3.7.9. To assess the importance of all features in the binary classification, we merged all clinical data and image-based results to produce FinalM and applied SHAP to it. Figure 1 presents a diagram of all the ML models. 
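A minimal sketch of this workflow is shown below, assuming a trained XGBoost classifier `clf` and a held-out feature table `X_test`; the variable names are placeholders.

```python
# A minimal sketch of SHAP-based interpretation of a trained XGBoost classifier.
# `clf` and `X_test` are placeholders for the trained model and its feature table.
import shap

explainer = shap.TreeExplainer(clf)          # exact Shapley values for tree ensembles
shap_values = explainer.shap_values(X_test)  # one value per feature per eye

# Global importance (mean |SHAP|) and per-eye effects, as in Figures 4 and 5.
shap.summary_plot(shap_values, X_test, plot_type="bar")
shap.summary_plot(shap_values, X_test)
```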
Figure 1. Flowchart of XGBoost model development. BaseM1 included sex, age, and baseline VA. BaseM2 included sex, age, baseline VA, and VA after three loading doses of anti-VEGF injections. BaseM3 included sex, age, baseline VA, VA after three loading doses of anti-VEGF injections, and treatment regimen. SegM1 was trained on a combination of the feature set of BaseM3 and the segmented retinal compartment volumes from the baseline OCT images (Seg0). SegM2 was trained on a combination of the feature set of SegM1 and the segmented retinal compartment volumes from the OCT images obtained after administration of three loading doses of anti-VEGF injections (Seg3). RawM1 was trained using the probability of a poor prognosis (logMAR VA of 1.0 or above, hereafter referred to as “probability”) derived from the DL model based on baseline OCT images (OCT0 prob) as well as all features from BaseM3. RawM2 was trained on a combination of the feature set of RawM1 and the probability value derived from the DL model based on the OCT images obtained after three loading treatments (OCT3 prob). RawM3 was trained on a combination of the feature set of RawM2 and the probability value derived from the DL model based on the baseline FA images (FA prob) and the baseline ICGA images (ICGA prob). Finally, we merged all clinical data and image-based results to produce the FinalM. OCT, optical coherence tomography.
Statistical Analysis
Model performance was evaluated based on the AUC, a widely used performance metric for evaluating binary classifiers,29 in which higher values indicate better predictive performance.30 Differences between outcome groups were analyzed using the Mann–Whitney test for continuous variables, including age, VA, segmented retinal compartment volumes, and probability values. Fisher's exact test was used to compare categorical variables such as sex and treatment regimen. 
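For illustration, the AUC calculation and the group comparisons can be sketched as follows with placeholder data; the arrays and counts are purely illustrative.

```python
# A minimal sketch of the evaluation and group comparisons described above,
# using small placeholder arrays; all numbers are illustrative only.
import numpy as np
from scipy.stats import mannwhitneyu, fisher_exact
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=53)                       # 1 = poor outcome at 1 year
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, 53), 0, 1)
auc = roc_auc_score(y_true, y_score)                       # model discrimination

# Continuous variable (e.g., age) compared between outcome groups.
age = rng.normal(74, 9, size=53)
_, p_age = mannwhitneyu(age[y_true == 0], age[y_true == 1], alternative="two-sided")

# Categorical variable (e.g., sex) compared with Fisher's exact test on a 2x2 table.
table = [[20, 12],    # male: good, poor (illustrative counts)
         [15, 6]]     # female: good, poor
_, p_sex = fisher_exact(table)

print(f"AUC={auc:.2f}  Mann-Whitney p={p_age:.3f}  Fisher p={p_sex:.3f}")
```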
Results
Study Cohort
The study cohort included 527 eyes from 506 patients (Table 2). The mean patient age was 73.74 ± 8.83 years, the mean baseline VA was 0.57 ± 0.55 logMAR, and 60.34% of patients were male. These findings are consistent with the epidemiology of AMD.31 Patients received three monthly loading doses of anti-VEGF treatment (aflibercept, ranibizumab, brolucizumab, or bevacizumab) based on the clinical judgment of H.L., H.C., and H.C.K. Logistic regression showed no statistically significant differences in outcomes among the anti-VEGF drugs (95% confidence interval). Additional statistical analyses of the retinal compartments revealed significant associations between the volume of IRF at baseline, the volume of subretinal hyperreflective material after three loading doses, and VA at 1 year (Supplementary Table S2). 
Table 2. Comparison of the Clinical and Demographic Characteristics of the Patients Included
Performance of the BaseM1, BaseM2, and BaseM3 Models
We built several ML models to predict whether patients’ VA would be above or below logMAR 1.0 after 1 year of treatment. BaseM1, using age, sex, and baseline VA, achieved an AUC of 0.86. BaseM2, which added VA after three loading doses, improved accuracy to an AUC of 0.92. BaseM3, which included the treatment regimen, further improved performance with an AUC of 0.93 (Fig. 2). 
Figure 2. Results of the VA prediction models. The area under the receiver operating characteristic (ROC) curve (AUC) for each model is shown. The models were trained by sequentially adding input according to the clinical sequence.
Performance of the SegM1 and SegM2 Models
Ensemble models were developed to compare the effects of segmented retinal compartment volumes and raw OCT images on the prediction of VA 1 year after treatment. SegM1 combined the BaseM3 features and baseline OCT compartment volumes, achieving an AUC of 0.91. Adding retinal compartment volumes from OCT images obtained after the administration of three loading doses in SegM2 did not change performance (AUC = 0.91) (Fig. 2). 
Reliability of Probability Values Through Heatmaps
Separate DL models calculated the probability of poor VA for the four sets of raw images. Grad-CAM provided insight into what the DL models prioritized when predicting VA outcomes. Macular OCT heatmaps typically highlighted exudative changes relevant to nAMD diagnosis.1 The FA model's heatmaps showed areas of leakage with a 91% increase in hyperfluorescence in active macular neovascularization.32 The ICGA model's heatmaps focused on macular neovascularization areas with an 85% increase in fluorescent signal. Figure 3 shows examples of Grad-CAM heatmaps. Intergrader reliability for focusing on expected areas in the OCT, FA, and ICGA images was 0.93, indicating that the DL models performed effectively and produced reliable probability values, notably in lesion areas. 
Figure 3. Saliency maps for macular OCT, FA, and ICGA. Each row displays a raw image followed by the same image overlaid with its corresponding saliency map. The saliency map for the baseline OCT is shown in (A), and (B) displays the saliency map for the OCT obtained after administration of three loading doses of anti-VEGF injections. The baseline FA and ICGA saliency maps are shown in (C) and (D), respectively. The red-marked area on the OCT saliency map indicates the areas the model deemed most important. OCT, optical coherence tomography.
Performance of the RawM1, RawM2, and RawM3 Models
RawM1, using baseline OCT-derived probabilities and the BaseM3 features, outperformed the previous models with an AUC of 0.95. RawM2, which added the probabilities derived from the OCT images obtained after the three loading treatments, maintained the same AUC (0.95). RawM3, which added the FA- and ICGA-derived probabilities, achieved the highest performance with an AUC of 0.96 and the following supplementary metrics: accuracy of 0.93, specificity of 0.98, sensitivity of 0.78, and F1 score of 0.84 (Supplementary Table S1). FinalM, combining all clinical data, fluid volumes, and image-based probabilities, also achieved an AUC of 0.96 (Fig. 2). 
Model Interpretation Using SHAP
SHAP analysis on FinalM identified the top five features influencing 1-year VA outcomes: VA after three loading doses and the probability values from the DL models (OCT after three loading doses, FA, baseline OCT, and ICGA). Figures 4 and 5 display the features contributing to the predicted value in order of importance as assessed by SHAP. The SHAP summary plot (Fig. 5) allows a more detailed analysis of the features than the SHAP bar plots (Fig. 4). In Figure 5, patients on the PRN regimen (coded as 0, blue dots) are concentrated on the positive side of the SHAP value axis, indicating that the PRN regimen pushes the prediction toward a poor outcome (higher 1-year logMAR), implying worse visual outcomes. These points show a narrow distribution, indicating that the PRN regimen has a consistent effect on 1-year VA. In contrast, patients on the treat and extend regimen (coded as 1, red dots) are distributed over a relatively wide range on the negative side of the SHAP value axis, indicating a better visual outcome. 
Figure 4. SHAP bar plots. The plots display the absolute Shapley values for each feature across all the data, indicating their importance. The features are listed in order of importance, with those having higher mean absolute SHAP values being more influential. The absolute SHAP value is accounted for when ranking the features, regardless of whether the feature affects the prediction positively or negatively. SHAP, SHapley Additive exPlanations.
Figure 5. Summary plot displaying feature importance. Each dot represents the SHAP value for an individual eye. A higher absolute SHAP value on the x axis indicates a greater impact on the prediction. The left y axis displays the input variables, ranked from top to bottom according to their mean absolute SHAP values for the entire dataset. The points are distributed horizontally along the x axis according to their SHAP value. The color of each dot indicates the actual value of the features for each eye, ranging from low (blue) to high (red). The distribution of SHAP values per feature can be inferred from the y axis shift for overlapping points. SHAP, SHapley Additive exPlanations.
Dependence plots (Fig. 6) indicated a complex relationship between VA after three loading doses (3mlogMVA) and the probabilities derived from the OCT images obtained after three loading doses (OCT3 prob). OCT3 prob was selected for Figure 6 because it exhibits the strongest interaction effects with 3mlogMVA. The vertical dispersion in SHAP values observed at a fixed value of a feature is due to interaction effects with other features. This finding indicates that the SHAP value of a specific eye for a given feature does not depend solely on the value of that feature but is also influenced by the values of the eye's other features. For instance, at a 3mlogMVA value of 2.0, the SHAP values range from approximately 1.3 to 2.3. Examining the color distribution, it is evident that higher OCT3 prob values are associated with lower SHAP values in this region. 
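A dependence plot of this kind can be generated from the SHAP values computed earlier, as sketched below; the column names ("3mlogMVA", "OCT3 prob") are assumptions about the labels of the feature table.

```python
# A minimal sketch of a SHAP dependence plot like Figure 6, reusing the
# `shap_values` and `X_test` placeholders from the interpretation step.
# The feature names passed here are assumed column labels.
import shap

# Plot the SHAP value of 3mlogMVA against its value, coloring each eye by OCT3 prob
# to visualize the interaction between the two features.
shap.dependence_plot("3mlogMVA", shap_values, X_test, interaction_index="OCT3 prob")
```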
Figure 6. SHAP dependence contribution plot. The plot displays the interaction between logMAR VA after three loading doses of anti-VEGF injections (3mlogMVA) and the probability derived from the DL model based on the OCT images obtained after administering three loading doses of anti-VEGF injections (OCT3 prob). Each dot represents an individual eye, colored by its OCT3 prob value as shown by the color bar on the right y axis. Vertical dispersion along the left y axis (the SHAP value of 3mlogMVA for each eye) is observed at fixed 3mlogMVA values owing to interaction effects with OCT3 prob. This implies that the SHAP value of 3mlogMVA for a specific eye is not solely dependent on the value of 3mlogMVA but is also influenced by the eye's OCT3 prob value. SHAP, SHapley Additive exPlanations.
To identify predictors of better visual outcomes, the VA criterion was redefined as 20/40. FinalM, trained with the 20/40 threshold and analyzed with SHAP, showed that the probability value of the baseline OCT image and VA after three loading doses of anti-VEGF injections were substantially more influential than the other factors, which made comparatively minor contributions with minimal differences among them (Supplementary Figs. S2 and S3). However, FinalM trained with the 20/40 threshold achieved an AUC of only 0.87, markedly inferior to the AUC of 0.96 obtained with the VA criterion set at 20/200. 
Discussion
Automating long-term VA prognosis prediction in patients with nAMD using imaging and clinical data can significantly enhance individualized treatment. This study developed an interpretable ML model that combined clinical information with multiple sets of retinal images to predict VA after 1 year of treatment. We sequentially compared several models, noting a marked improvement when VA after three loading doses was included (AUC of 0.86 for BaseM1 vs. 0.92 for BaseM2). This result suggests that VA after three loading doses has a significant impact on VA after 1 year. However, the predictive accuracy decreased after adding the segmented retinal compartment volumes (AUC of 0.93 for BaseM3 vs. 0.91 for SegM1). The reason for this negative effect is unclear; the segmented retinal compartments may have introduced noise or diluted the learning signal as the number of features increased. 
RawM1, which used clinical and demographic data plus the baseline OCT probability, seems the most practical model, providing an AUC of 0.95. Adding other imaging modalities (RawM2: probability from OCT after three doses; RawM3: probabilities from baseline FA and ICGA) resulted in little increase in AUC. However, this finding does not imply that the additional input data in RawM2 and RawM3 are inconsequential. Comparing models based on different input data has limitations in determining each feature's exact contribution, because it does not consider interactions between features, which were added in an arbitrary, artificial order. Furthermore, the SHAP analysis reveals that 3mlogMVA and OCT3 prob are the most relevant factors (Fig. 6), which likely explains why the AUC changed only subtly when OCT3 prob was added to RawM1 despite its significant contribution. 
Interpretation of the FinalM model with SHAP revealed that the two most important features were VA after three loading doses of anti-VEGF injections and the probability of poor VA from the DL model based on OCT images after three loading doses of anti-VEGF injections (Fig. 4). This result aligns with previous studies emphasizing the significance of VA and OCT imaging information after three loading doses of anti-VEGF.33 
The probabilities of poor VA had greater predictive power than the segmented retinal compartment volumes derived from macular OCT (Figs. 4 and 5). This finding highlights the importance of considering the complete OCT image for visual prognosis, because it provides more valuable clinical information than individual retinal compartments. Although Fu et al.9 found that retinal compartment measurements helped predict short-term VA after three loading doses in an ML model for patients with nAMD, adding retinal compartment volumes for long-term (12-month) prediction reduced predictive power (AUC = 0.87) compared with a model based solely on VA information. The limited contribution of quantified OCT data in previous studies may be due to focusing only on total retinal thickness or predefined retinal compartments, missing other potentially valuable retinal information.34 
Our study revealed a significant association between FA and ICGA imaging and VA prognosis. The SHAP values of FA and ICGA were ranked highly (Figs. 4 and 5). Previous studies have struggled to quantify specific biomarkers from FA and ICGA imaging for visual prognosis.35,36 Using the probability of poor vision calculated by DL models, we confirmed that FA and ICGA imaging are strongly associated with long-term VA prognosis. The FA and ICGA images contain characteristics of nAMD subtypes that reflect visual prognosis.32 This finding may explain the correlation between baseline FA and ICGA imaging features and 1-year post-treatment VA. 
Furthermore, the SHAP summary plot (Fig. 5) allowed an in-depth analysis of the treatment regimen. The blue dots (PRN regimen) are concentrated on the positive side of the SHAP value axis (x axis), while the red dots (treat and extend regimen) are distributed on the negative side of the SHAP value axis, indicating that the PRN regimen is associated with poorer visual outcomes compared with the treat and extend regimen. Additionally, the narrow spread of dots along the x axis for the PRN regimen suggests it has a more predictable and consistent effect on VA outcomes compared with the treat and extend regimen. 
When comparing SHAP values among various retinal compartment segmentations, IRF and subretinal hyperreflective material showed a significant association with VA after 1 year, aligning with the statistical results in Supplementary Table S2 and previous studies.37 However, the low contribution of IRF after three loading doses of anti-VEGF to 1-year VA should be interpreted cautiously. Only 33 of 527 eyes (6.26%) were analyzed, making it challenging to determine its accurate contribution. A more detailed analysis of retinal compartments would require a larger patient cohort for comparison. 
The SHAP values in this article show that baseline VA contributed relatively weakly to the final prediction. In contrast, Abbas et al.38 revealed that baseline VA was the most important feature for ML models predicting 1-year posttreatment VA, but their study only included baseline data, not data after the initial three loading doses of anti-VEGF injections. Our study included both baseline data and data after three loading doses. Importantly, these results are relative and do not imply that baseline VA is unimportant. It is more accurate to state that VA at 3 months has a greater effect on VA at 12 months than baseline VA, aligning with prior research showing that the last measured best-corrected VA during the initiation phase is the most important predictor for functional outcomes.6 
Evaluating features with a redefined VA cutoff of 20/40 at 1 year showed that the baseline OCT probability and VA after three loading doses of anti-VEGF injections were the most influential factors (Supplementary Figs. S2 and S3). The AUC was 0.87, notably lower than the AUC of 0.96 obtained with the VA criterion set at 20/200. Model performance is crucial for interpreting SHAP results; higher model performance typically yields more accurate SHAP values. In this study, the high AUC of 0.96 with a VA criterion of 20/200 after 1 year is therefore considered to provide the more reliable results. 
Our study has certain limitations. We excluded patients with no follow-up data at 1 year and those with inadequate image quality for segmentation, potentially introducing selection bias. Because the ML model relies on quantitative metrics from OCT image segmentation, patients with images unsuitable for segmentation were excluded. This exclusion is likely to have affected patients with poor visual outcomes disproportionately, potentially owing to difficulties in attending appointments or obtaining images of adequate quality. Additionally, our sample size of 527 is relatively small for training an ML model; better performance may depend on obtaining larger national datasets. 
This study combined information from multiple sources, including two different retinal imaging modalities: the probability of poor vision derived from raw images and the measurements of the volumes of the fluid compartments, as well as demographic data, to construct a model for predicting VA 1 year after treatment. This approach enabled accurate prognostic prediction and allowed the identification and interpretation of critical variables, providing clinical insight into the pathogenesis of nAMD. 
Acknowledgments
Supported by the Konkuk University Medical Center Research Grant 2023. The sponsor or funding organization played no role in the design or conduct of this research. None of the authors has any conflicts of interest to disclose. 
Authors’ Contributions: H Lee has full access to all the data in the study and takes full responsibility for the integrity of the data and the accuracy of the data analyses. Concept and design: NJ Kim, H Lee; data acquisition, analysis, or interpretation: all authors; manuscript drafting: NJ Kim, H Lee; critical manuscript revision: all authors; statistical analysis: NJ Kim. 
Disclosure: N. Kim, None; M. Lee, None; H. Chung, None; H.C. Kim, None; H. Lee, None 
References
Simader C, Ritter M, Bolz M, et al. Morphologic parameters relevant for visual outcome during anti-angiogenic therapy of neovascular age-related macular degeneration. Ophthalmology. 2014; 121(6): 1237–1245. [CrossRef] [PubMed]
Slakter JS, Yannuzzi LA, Guyer DR, Sorenson JA, Orlock DA. Indocyanine-green angiography. Curr Opin Ophthalmol. 1995; 6(3): 25–32. [CrossRef] [PubMed]
Klein ML, Ferris FL, 3rd, Armstrong J, et al. Retinal precursors and the development of geographic atrophy in age-related macular degeneration. Ophthalmology. 2008; 115(6): 1026–1031. [CrossRef] [PubMed]
Nawash B, Ong J, Driban M, et al. Prognostic optical coherence tomography biomarkers in neovascular age-related macular degeneration. J Clin Med. 2023; 12(9): 3049. [CrossRef] [PubMed]
Mulyukov Z, Weber S, Pigeolet E, Clemens A, Lehr T, Racine A. Neovascular age-related macular degeneration: a visual acuity model of natural disease progression and ranibizumab treatment effect. CPT Pharmacometrics Syst Pharmacol. 2018; 7(10): 660–669. [CrossRef]
Schmidt-Erfurth U, Bogunovic H, Sadeghipour A, et al. Machine learning to analyze the prognostic value of current imaging biomarkers in neovascular age-related macular degeneration. Ophthalmol Retina. 2018; 2(1): 24–30. [CrossRef] [PubMed]
Rohm M, Tresp V, Muller M, et al. Predicting visual acuity by using machine learning in patients treated for neovascular age-related macular degeneration. Ophthalmology. 2018; 125(7): 1028–1036. [CrossRef] [PubMed]
Kawczynski MG, Bengtsson T, Dai J, Hopkins JJ, Gao SS, Willis JR. Development of deep learning models to predict best-corrected visual acuity from optical coherence tomography. Transl Vis Sci Technol. 2020; 9(2): 51. [CrossRef] [PubMed]
Fu DJ, Faes L, Wagner SK, et al. Predicting incremental and future visual change in neovascular age-related macular degeneration using deep learning. Ophthalmol Retina. 2021; 5(11): 1074–1084. [CrossRef] [PubMed]
Mehta P, Peterson CA, Wen JC, et al. Automated detection of glaucoma with interpretable machine learning using clinical data and multimodal retinal images. Am J Ophthalmol. 2021; 231: 154–169. [PubMed]
Ren XD, Guo HN, Li SH, Wang SL, Li JH. A novel image classification method with CNN-Xgboost model. Lect Notes Comput Sc. 2017; 10431: 378–390. [CrossRef]
Yang C, Chen M, Yuan Q. The application of Xgboost and SHAP to examining the factors in freight truck-related crashes: an exploratory analysis. Accid Anal Prev. 2021; 158: 106153. [CrossRef] [PubMed]
Park J, Kim J, Ryu D, Choi HY. Factors related to steroid treatment responsiveness in thyroid eye disease patients and application of SHAP for feature analysis with Xgboost. Front Endocrinol (Lausanne). 2023; 14: 1079628. [CrossRef] [PubMed]
Hu M, Zhang H, Wu B, Li G, Zhou L. Interpretable predictive model for shield attitude control performance based on Xgboost and SHAP. Sci Rep. 2022; 12(1): 18226. [CrossRef] [PubMed]
Shi Y, Zou Y, Liu J, et al. Ultrasound-based radiomics Xgboost model to assess the risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: individual application of SHAP. Front Oncol. 2022; 12: 897596. [CrossRef] [PubMed]
Ahn JM, Kim J, Kim K. Ensemble machine learning of gradient boosting (Xgboost, Lightgbm, Catboost) and attention-based CNN-LSTM for harmful algal blooms forecasting. Toxins (Basel). 2023; 15(10): 37888638. [CrossRef]
Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot. 2013; 7: 21. [CrossRef] [PubMed]
Konstantinov AV, Utkin LV. Interpretable machine learning with an ensemble of gradient boosting machines. Knowledge-Based Systems. 2021; 222: 106993. [CrossRef]
Gunning D, Stefik M, Choi J, Miller T, Stumpf S, Yang GZ. XAI-explainable artificial intelligence. Sci Robot. 2019; 4(37): eaay7120. [CrossRef] [PubMed]
Yi F, Yang H, Chen D, et al. Xgboost-SHAP-based interpretable diagnostic framework for Alzheimer's disease. BMC Med Inform Decis Mak. 2023; 23(1): 137. [CrossRef] [PubMed]
Cakiroglu C, Aydin Y, Bekdas G, Geem ZW. Interpretable predictive modelling of basalt fiber reinforced concrete splitting tensile strength using ensemble machine learning methods and SHAP approach. Materials (Basel). 2023; 16(13): 4578. [CrossRef] [PubMed]
Tarabanis C, Kalampokis E, Khalil M, Alviar CL, Chinitz LA, Jankelson L. Explainable SHAP-Xgboost models for in-hospital mortality after myocardial infarction. Cardiovasc Digit Health J. 2023; 4(4): 126–132. [CrossRef] [PubMed]
Debjit K, Islam MS, Rahman MA, et al. An improved machine-learning approach for Covid-19 prediction using Harris Hawks optimization and feature analysis using SHAP. Diagnostics (Basel). 2022; 12(5): 1023. [CrossRef] [PubMed]
Regillo CD, Busbee BG, Ho AC, Ding BY, Haskova Z. Baseline predictors of 12-month treatment response to ranibizumab in patients with wet age-related macular degeneration. Am J Ophthalmol. 2015; 160(5): 1014–1023. [CrossRef] [PubMed]
Waldstein SM, Simader C, Staurenghi G, et al. Morphology and visual acuity in aflibercept and ranibizumab therapy for neovascular age-related macular degeneration in the View trials. Ophthalmology. 2016; 123(7): 1521–1529. [CrossRef] [PubMed]
Waldstein SM, Wright J, Warburton J, Margaron P, Simader C, Schmidt-Erfurth U. Predictive value of retinal morphology for visual acuity outcomes of different ranibizumab treatment regimens for neovascular AMD. Ophthalmology. 2016; 123(1): 60–69. [CrossRef] [PubMed]
Okeagu CU, Agron E, Vitale S, et al. Principal cause of poor visual acuity after neovascular age-related macular degeneration: Age-Related Eye Disease study 2 report number 23. Ophthalmol Retina. 2021; 5(1): 23–31. [CrossRef] [PubMed]
Jiang H, Xu J, Shi R, et al. A Multi-label deep learning model with interpretable Grad-Cam for diabetic retinopathy classification. Annu Int Conf IEEE Eng Med Biol Soc. 2020; 2020: 1560–1563. [PubMed]
Desbiens NA. Area under the ROC curve for a binary diagnostic test. Med Decis Making. 2001; 21(5): 421–422. [PubMed]
Garrido CG, Madero Jarabo R. The area under the ROC curve [in Spanish]. Med Clin (Barc). 1996; 106(9): 355–356. [PubMed]
Klein R, Peto T, Bird A, Vannewkirk MR. The epidemiology of age-related macular degeneration. Am J Ophthalmol. 2004; 137(3): 486–495. [CrossRef] [PubMed]
Gualino V, Tadayoni R, Cohen SY, et al. Optical coherence tomography, fluorescein angiography, and diagnosis of choroidal neovascularization in age-related macular degeneration. Retina. 2019; 39(9): 1664–1671. [CrossRef] [PubMed]
Boyer DS, Antoszyk AN, Awh CC, et al. Subgroup analysis of the MARINA study of ranibizumab in neovascular age-related macular degeneration. Ophthalmology. 2007; 114(2): 246–252. [CrossRef] [PubMed]
Lai TT, Hsieh YT, Yang CM, Ho TC, Yang CH. Biomarkers of optical coherence tomography in evaluating the treatment outcomes of neovascular age-related macular degeneration: a real-world study. Sci Rep. 2019; 9(1): 529. [CrossRef] [PubMed]
Arrigo A, Aragona E, Bordato A, et al. Morphological and functional relationship between OCTA and FA/ICGA quantitative features in AMD-related macular neovascularization. Front Med (Lausanne). 2021; 8: 3476193.
Fossataro F, Cennamo G, Montorio D, Clemente L, Costagliola C. Dark halo, a new biomarker in macular neovascularization: comparison between OCT angiography and ICGA-a pilot prospective study. Graefes Arch Clin Exp Ophthalmol. 2022; 260(10): 3205–3211. [CrossRef]
Jaffe GJ, Martin DF, Toth CA, et al. Macular morphology and visual acuity in the comparison of age-related macular degeneration treatments trials. Ophthalmology. 2013; 120(9): 1860–1870. [CrossRef] [PubMed]
Abbas A, O'byrne C, Fu DJ, et al. Evaluating an automated machine learning model that predicts visual acuity outcomes in patients with neovascular age-related macular degeneration. Graefes Arch Clin Exp Ophthalmol. 2022; 260(8): 2461–2473. [PubMed]