Translational Vision Science & Technology
April 2025, Volume 14, Issue 4
Open Access
Letters to the Editor  |   April 2025
Comments on Xie et al.’s Study on Artificial Intelligence–Assisted Perfusion Density as Biomarker for Screening Diabetic Nephropathy
Author Affiliations & Notes
  • James Lin
    Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  • Ting-Wan Kao
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA e-mail: [email protected]
  • Footnotes
     JL and TWK contributed equally to this response.
Translational Vision Science & Technology April 2025, Vol.14, 11. doi:https://doi.org/10.1167/tvst.14.4.11
We read with great interest the study by Xie et al., “Artificial Intelligence–Assisted Perfusion Density as a Biomarker for Screening Diabetic Nephropathy.”1 The authors present a compelling case for the use of random forest classification of perfusion density from ultra-widefield swept-source optical coherence tomography angiography (SS-OCTA) as a screening tool for diabetic nephropathy (DN).
The study's random forest model achieved 85.8% accuracy in the type 2 diabetes mellitus (T2DM) population and 82.5% in the diabetic retinopathy (DR) population. Although these results are promising, we noted that perfusion density (PD) is already strongly correlated with DN, as demonstrated in Table 3 of the original study. This raises the question of whether a complex artificial intelligence (AI)-based model is necessary when simpler models can perform just as well.
To investigate this, we conducted a simulation-based analysis using the published summary statistics from Xie et al. We simulated a dataset preserving the reported means and standard deviations of PD across the control, DR without DN, and DR with DN groups. Kernel density estimation was used to generate synthetic data without assuming a specific distribution, providing a flexible approximation of the underlying data structure while accounting for potential deviations from normality.2 
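As a sketch of this simulation step (using placeholder means and standard deviations, since we reproduce only the method here, not the values published by Xie et al.), the kernel density estimation approach can be implemented as:

```python
import numpy as np
from scipy.stats import gaussian_kde

np.random.seed(0)  # gaussian_kde.resample draws from the global NumPy state

# Placeholder summary statistics (illustrative only, NOT the values
# reported by Xie et al.): mean and SD of perfusion density per group.
groups = {
    "control":       (0.40, 0.03),
    "DR_without_DN": (0.36, 0.04),
    "DR_with_DN":    (0.32, 0.04),
}

def simulate_group(mean, sd, n=200):
    """Draw seed samples matching the reported mean/SD, then smooth them
    with a Gaussian KDE and resample, so the synthetic cohort does not
    rest on a strict normality assumption."""
    seed_samples = np.random.normal(mean, sd, size=n)
    kde = gaussian_kde(seed_samples)
    return kde.resample(n).ravel()

synthetic = {name: simulate_group(m, s) for name, (m, s) in groups.items()}
for name, values in synthetic.items():
    print(f"{name}: mean={values.mean():.3f}, sd={values.std():.3f}")
```

Because the KDE is fit to samples that already match the reported mean and standard deviation, the resampled cohort preserves those summary statistics while allowing mild departures from normality in the synthetic distribution.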
Logistic regression, linear discriminant analysis, and a random forest model were applied to classify DN cases within both the T2DM and DR populations. The dataset was stratified to maintain proportional class representation and divided into training and testing sets for evaluation. Our findings suggest that logistic regression and linear discriminant analysis performed comparably to the random forest classifier, with accuracies of 81.6% and 85.0%, respectively, in the T2DM population, and 76.3% and 84.2% in the DR population (see the Table). These results closely align with the original study's random forest performance (85.8% accuracy in T2DM and 82.6% in DR). ANOVA tests of these performance metrics yielded P values of 0.9281 for the area under the receiver operating characteristic curve (AUROC) and 0.9670 for accuracy, indicating no statistically significant differences in classification performance among the models.
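The model comparison described above can be sketched as follows, again with a hypothetical single-feature dataset standing in for the actual cohort (the group distributions and class sizes below are placeholders, not the published data):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical single-feature cohort: perfusion density for DR eyes
# without DN (label 0) and with DN (label 1).
pd_no_dn = rng.normal(0.36, 0.04, size=300)
pd_dn = rng.normal(0.32, 0.04, size=100)
X = np.concatenate([pd_no_dn, pd_dn]).reshape(-1, 1)
y = np.concatenate([np.zeros(300), np.ones(100)])

# Stratified split preserves the class proportions in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic regression": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    "random forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]
    results[name] = (
        accuracy_score(y_te, model.predict(X_te)),
        roc_auc_score(y_te, prob),
    )
    print(f"{name}: accuracy={results[name][0]:.3f}, AUROC={results[name][1]:.3f}")
```

With only a single input variable, all three models are effectively learning one decision threshold on PD, which is why their performance tends to converge.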
Table. Classification Performance of Different Models in Predicting Diabetic Nephropathy within the Type 2 Diabetes Mellitus and Diabetic Retinopathy Populations
Machine learning models, such as random forests, are particularly useful in scenarios where interactions among multiple complex features need to be captured. However, in this study, the input data consist of a single biomarker—perfusion density—without incorporating additional vascular parameters or spatially complex features. When classification is based on a single, highly correlated variable, the advantage of machine learning over traditional statistical approaches is less apparent.3 This raises concerns about whether the added complexity of a random forest model is justified when simpler, interpretable models achieve similar classification performance.4 
Additionally, we note that the study includes both eyes from some patients, which, if not accounted for, may inflate model performance. If both the training and test sets include correlated eyes, the inherent similarity in the data may lead to an inflated accuracy estimate.5 Whereas the study effectively demonstrates the relationship between PD and DN, the absence of external validation limits the generalizability of these findings. External datasets from independent populations would be valuable in confirming the robustness of PD as a DN biomarker. Furthermore, we believe that reporting accuracy alone is not an ideal metric for evaluating model performance. If a condition is rare, a model that simply predicts its absence will achieve near-perfect accuracy despite lacking clinical utility. Alternative metrics such as sensitivity, specificity, and AUROC should be considered to provide a more comprehensive assessment.
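A toy example makes the accuracy pitfall concrete (the 5% prevalence below is hypothetical, chosen only for illustration):

```python
# Toy illustration: with 5% prevalence, a classifier that always
# predicts "no disease" looks accurate but detects no cases.
labels = [1] * 5 + [0] * 95   # 5 diseased, 95 healthy (hypothetical)
predictions = [0] * 100       # trivial "always negative" model

accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
sensitivity = sum(
    p == 1 for p, t in zip(predictions, labels) if t == 1
) / sum(labels)

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")
# accuracy = 0.95 even though sensitivity = 0.00
```

Accuracy here rewards the model for never detecting the condition, which is exactly the failure mode that sensitivity, specificity, and AUROC expose.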
AI has enormous potential in diagnostic applications, and this study provides an important contribution to the field. However, we encourage future research to consider the advantages of interpretable models, particularly in clinical applications where transparency is crucial. We acknowledge that our analysis is a simple simulation-based sensitivity analysis and might not fully capture the complexities of the original population. Our purpose is not to challenge the validity of the study's conclusions but rather to raise an important methodological point regarding model complexity and its necessity in this context. We thank the authors for their contributions and look forward to further advancements in AI applications in ophthalmology and systemic disease prediction. 
References
1. Xie X, Wang W, Wang H, et al. Artificial intelligence-assisted perfusion density as biomarker for screening diabetic nephropathy. Transl Vis Sci Technol. 2024;13(10):19.
2. Tan B, Sim R, Chua J, et al. Approaches to quantify optical coherence tomography angiography metrics. Ann Transl Med. 2020;8(18):1205.
3. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22.
4. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019;1(5):206–215.
5. Ferdinandy B, Gerencsér L, Corrieri L, et al. Challenges of machine learning model validation using correlated behaviour data: evaluation of cross-validation strategies and accuracy measures. PLoS One. 2020;15(7):e0236092.