December 2024 | Volume 13, Issue 12 | Open Access | Artificial Intelligence
A Deep Learning Network for Accurate Retinal Multidisease Diagnosis Using Multiview Fusion of En Face and B-Scan Images: A Multicenter Study
Author Affiliations & Notes
  • Chubin Ou
    Department of Radiology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
    Guangdong Eye Intelligent Medical Imaging Equipment Engineering Technology Research Center, Foshan, China
  • Xifei Wei
    Guangdong Eye Intelligent Medical Imaging Equipment Engineering Technology Research Center, Foshan, China
  • Lin An
    Guangdong Eye Intelligent Medical Imaging Equipment Engineering Technology Research Center, Foshan, China
    Hangzhou Dianzi University, Hangzhou, China
  • Jia Qin
    Guangdong Eye Intelligent Medical Imaging Equipment Engineering Technology Research Center, Foshan, China
  • Min Zhu
    Department of Ophthalmology, The First People's Hospital of Foshan, Foshan, China
  • Mei Jin
    Department of Ophthalmology, Guangdong Provincial Hospital of Integrated Chinese and Western Medicine, Foshan, China
  • Xiangbin Kong
    Department of Ophthalmology, The Second People's Hospital of Foshan, Foshan, China
  • Correspondence: Xiangbin Kong, Department of Ophthalmology, The Second People's Hospital of Foshan, No. 78 Weiguo Rd., Foshan 528012, China. e-mail: [email protected] 
  • Footnotes
     CO and XW contributed equally to this work.
Translational Vision Science & Technology December 2024, Vol.13, 31. doi:https://doi.org/10.1167/tvst.13.12.31
Abstract

Purpose: Accurate diagnosis of retinal disease based on optical coherence tomography (OCT) requires scrutiny of both B-scan and en face images. The aim of this study was to investigate the effectiveness of fusing en face and B-scan images for better diagnostic performance of deep learning models.

Methods: A multiview fusion network (MVFN) with a decision fusion module integrating fast-axis B-scans, slow-axis B-scans, and en face information was proposed and compared with five state-of-the-art methods: a model using B-scans only, a model using en face images only, a model using three-dimensional (3D) volumes, and two other relevant fusion methods. The models were evaluated on the OCTA-500 public dataset and on a private multicenter dataset of 2330 cases; cases from the first center were used for training, and cases from the second center were used for external validation. Performance was assessed by averaged area under the curve (AUC), accuracy, sensitivity, specificity, and precision.

Results: In the private external test set, our MVFN achieved the highest AUC of 0.994, significantly outperforming the other models (P < 0.01). Similarly, for the OCTA-500 public dataset, our proposed method also outperformed the other methods with the highest AUC of 0.976, further demonstrating its effectiveness. Typical cases were demonstrated using activation heatmaps to illustrate the synergy of combining en face and B-scan images.

Conclusions: The fusion of en face and B-scan information is an effective strategy for improving the diagnostic accuracy of deep learning models.

Translational Relevance: Multiview fusion models combining B-scan and en face images demonstrate great potential in improving AI performance for retina disease diagnosis.

Introduction
Optical coherence tomography (OCT) is a non-invasive optical imaging modality that produces high-resolution cross-sectional images of the retina. Three-dimensional (3D) volumetric imaging is enabled through a raster-scanning process that sequentially captures a series of cross-sectional images. The generation of en face images is achieved by projecting the volumetric dataset along the depth axis. The capacity of en face OCT to resolve individual retinal layers constitutes a significant advantage, particularly in pathologies that exhibit focal involvement of specific subretinal layers, such as cystoid macular edema,1 geographic atrophy,2 polypoidal choroidal vasculopathy,3 and retinal pigment epitheliitis.4 This capability allows for a more accurate and detailed assessment of the affected areas, crucial for accurate diagnosis and treatment planning. Tsuboi et al.5 reported that the use of en face OCT images significantly improves the detection of small lesions of retinal neovascularization. Wolff et al.6 revealed that en face OCT is a valuable tool for detecting outer retinal tubulations in age-related macular degeneration (AMD). 
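To make the projection step concrete, a minimal sketch of depth-axis projection is shown below, using NumPy on a synthetic volume; the array shape follows the scan dimensions reported later in the Methods, and the mean projection is an illustrative choice rather than any vendor's algorithm.

```python
# Minimal sketch of en face generation by projecting an OCT volume along the
# depth axis; the volume here is synthetic, and the mean projection is only one
# of several common projection rules (mean, max, sum).
import numpy as np

volume = np.random.rand(1024, 448, 448)  # (depth, fast axis, slow axis)

# Full-depth projection; in practice, the depth range is first restricted to a
# segmented retinal layer (slab) so each en face image reflects a specific layer.
en_face = volume.mean(axis=0)            # shape (448, 448)
print(en_face.shape)
```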
Deep Learning for Disease Classification on OCT
Deep learning (DL) has been widely applied to the detection and classification of common eye diseases such as AMD, diabetes-related macular edema, and glaucoma from OCT images. Most researchers have focused on using OCT B-scans as inputs to diagnose AMD, diabetic retinopathy (DR), and other diseases.7-12 Some researchers have also developed models using 3D volumetric scans as input.13-15 There are also studies that have utilized en face OCT or optical coherence tomography angiography (OCTA) images for characterization of nonperfused capillaries,16 segmentation of geographic atrophy area,17 and diagnosis of central serous chorioretinopathy (CSC).18 Some studies have explored the fusion of multiscale information to enhance prediction performance.19,20 Sun et al.21 proposed a general framework to fuse B-scan features for volume-based classification, and de Vente et al.22 investigated the fusion of diagnostic decision-making at the B-scan and volume levels.
Despite the importance of en face images in diagnosis, the fusion of B-scan and en face images remains relatively understudied. Moreover, most previous DL-based studies on OCT images have covered only a few (e.g., two to four) types of diseases, which falls short of clinical practice, where patients may present with a wide variety of conditions.
In the current study, we proposed a multiview fusion DL network that effectively integrates the information from B-scan and en face images for accurate diagnosis of multiple common retinal diseases, including DR, AMD, macular hole (MH), CSC, retinoschisis (RS), epiretinal membrane (EM), and retinal vein occlusion (RVO). To demonstrate the effectiveness of our method, we compared it with five different methods based on B-scan images, en face images, 3D volumes, and two relevant state-of-the-art methods. 
Methods
Study Design and Data Acquisition
In this study, data were collected from two centers. At center 1, 1833 scans were collected, including 561 eyes with normal retinas and 1272 eyes with seven types of diseases. At center 2, 497 scans were collected, also including eyes with normal retinas and eyes with the seven types of diseases. Each patient had one or both eyes scanned, resulting in a total of 2330 volumetric scans for this study. Of these, the 1833 scans from center 1 were used as the training set, and the 497 scans from center 2 were used as an external test set. Patient characteristics for the two centers are shown in Table 1A. The macular region of each eye was scanned using a commercial 120-kHz spectral-domain OCT system (Velite 3000; Guangdong Weiren Medical Technology Company, Guangdong, China) with a central wavelength of 840 nm. The scan depth was 2.5 mm, covering a 6.0 × 6.0-mm² region (1024 × 448 × 448 pixels) centered on the fovea. Layer segmentation was performed automatically by the vendor software, and en face projections of the superficial capillary plexus, deep capillary plexus, avascular slab, choroid, and whole retina were obtained. According to the manufacturer's recommendations, OCT scans with a signal strength index of <6 were considered low quality and were excluded. We also discarded cases in which the en face images were severely distorted by motion artifacts. All B-scan and en face images were exported as JPEG or PNG files.
Two trained retinal specialists (both with more than 10 years of experience) examined all of the scanned volumes independently. If a disagreement was found in the diagnoses of the two specialists, a third specialist (with 15 years of experience) was consulted and a final diagnosis was made. B-scan images from a volume containing a lesion corresponding to its diagnosis were also extracted. This study was conducted with ethical approval from two institutional boards. 
To further demonstrate the effectiveness of our method and to aid future comparison, we also evaluated our method based on the OCTA-500 public dataset.23 Specifically, we used the OCTA-6M subset, which contains 300 cases in total with seven different labels: normal, DR, AMD, CSC, RVO, choroidal neovascularization (CNV), and other. We split the dataset into training, validation, and test groups at a ratio of 6:1:3. The characteristics of the two datasets are shown in Table 1B. 
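A minimal sketch of this 6:1:3 split, using scikit-learn with stratification by label (the stratification and the random seed are assumptions, not details given in the text), is shown below.

```python
# 6:1:3 split into training, validation (tuning), and test sets via two chained
# calls to train_test_split, stratified so each subset keeps the label mix.
from sklearn.model_selection import train_test_split

def split_6_1_3(cases, labels, seed=0):
    train_x, rest_x, train_y, rest_y = train_test_split(
        cases, labels, train_size=0.6, stratify=labels, random_state=seed)
    # The remaining 40% is divided 1:3 into validation and test (0.25 vs. 0.75).
    val_x, test_x, val_y, test_y = train_test_split(
        rest_x, rest_y, train_size=0.25, stratify=rest_y, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```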
Multiview Fusion Model Development
To fully utilize the information in the B-scan and en face images, we proposed a deep learning multiview fusion network (MVFN) composed of three subnetworks and a decision fusion module, as shown in Figure 1. The three subnetworks process images from the fast-axis B-scan, slow-axis B-scan, and en face views, respectively. In the two B-scan subnetworks, images are processed sequentially, each image yielding a 1 × 8 probability vector over eight classes: normal, DR, AMD, CSC, MH, RVO, EM, and RS. The two B-scan subnetworks process all 448 B-scan images within a 3D scan, producing two decision vectors, each of size 448 × 8. The en face subnetwork takes the superficial, deep, and avascular slab images as a three-channel input and outputs a 1 × 8 probability vector. For each volume, after processing by the three subnetworks, we thus obtain one 1 × 8 en face decision vector and two 448 × 8 B-scan decision vectors. The subnetworks can be based on any backbone commonly used in computer vision, such as ResNet, ConvNeXt, or Swin Transformer. For demonstration purposes, we employed ResNet-50 as the backbone for all three subnetworks to serve as a baseline.
Figure 1. Schematics of the proposed multiview fusion network (MVFN).
The decision fusion module comprises several steps. For each of the two B-scan decision vectors (448 × 8), we first performed probability smoothing by applying a mean filter with a kernel size of 3 along the slice (448) direction; the rationale is that a given lesion appears in consecutive slices, so filtering reduces interslice variance. We then performed pooling, extracting the five largest probabilities among the 448 positions for each of the seven disease classes (DR, AMD, MH, CSC, RS, EM, and RVO) and the five smallest probabilities for the normal class, forming a new B-scan decision vector of size 5 × 8 (7 disease classes + 1 normal class). We concatenated these two new decision vectors (fast axis and slow axis) with the en face decision vector (1 × 8) to obtain a (5 × 2 + 1) × 8 vector, which was finally input into a random forest model to produce the final 1 × 8 output vector of disease probabilities. Note that, in addition to random forest, other types of machine learning models can also be used, as illustrated in the sketch below.
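As a concrete illustration, here is a minimal sketch of the decision fusion in Python, assuming the three subnetworks have already produced their probability vectors; the array shapes follow the text, while the function names, the class ordering (normal at index 0), and the random forest settings are illustrative.

```python
# Sketch of the decision fusion module: probability smoothing, top-/bottom-5
# pooling, concatenation, and a random forest making the final call.
import numpy as np
from scipy.ndimage import uniform_filter1d
from sklearn.ensemble import RandomForestClassifier

def pool_bscan_decisions(probs):
    """probs: (448, 8) per-slice probabilities from one B-scan subnetwork."""
    # Mean filter with kernel size 3 along the slice axis: lesions span
    # consecutive slices, so smoothing reduces interslice variance.
    smoothed = uniform_filter1d(probs, size=3, axis=0)
    ranked = np.sort(smoothed, axis=0)           # ascending per class
    top5_disease = ranked[-5:, 1:]               # 5 largest probs, 7 disease classes
    bottom5_normal = ranked[:5, :1]              # 5 smallest probs, normal class
    return np.concatenate([bottom5_normal, top5_disease], axis=1)  # (5, 8)

def fusion_features(fast_probs, slow_probs, enface_probs):
    """Build the (5 * 2 + 1) x 8 = 11 x 8 decision block, flattened for the forest."""
    block = np.concatenate([pool_bscan_decisions(fast_probs),
                            pool_bscan_decisions(slow_probs),
                            enface_probs.reshape(1, 8)], axis=0)
    return block.reshape(-1)                     # 88-dimensional feature vector

# Toy training of the fusion stage: one fused feature vector per volume,
# volume-level diagnosis labels as targets.
X = np.stack([fusion_features(np.random.rand(448, 8),
                              np.random.rand(448, 8),
                              np.random.rand(8)) for _ in range(20)])
y = np.random.randint(0, 8, size=20)
forest = RandomForestClassifier(n_estimators=100).fit(X, y)
final_probs = forest.predict_proba(X[:1])        # final decision vector over the trained classes
```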
Multilabel Asymmetric Loss for Complicated Multidisease Conditions
Most previous work in the literature has formulated multidisease diagnosis as a multiclass classification problem.10-13 This assumes that an eye is either normal or has exactly one type of disease; however, an eye can suffer from multiple diseases at the same time. We therefore formulated multidisease diagnosis as a multilabel problem, which enables the model to handle conditions where multiple diseases coexist. Because the numbers of the different disease types in our training dataset are severely imbalanced, we proposed an asymmetric balanced focal loss to help the model pay attention to minority classes, shown below:
\[
L_k = \begin{cases}
L_+ = (1 - p)^{\gamma_+} \log(p) \\
L_- = p_m^{\,\gamma_-} \log(1 - p_m)
\end{cases}
\qquad p_m = \max(p - m,\ 0)
\]

\[
L_{total} = \sum_{k = 1}^{8} \left[ -y_k L_{k+} - (1 - y_k) L_{k-} \right]
\]
In the above equations, p is the probability output by the model, and y_k is the ground-truth label for class k; γ+ and γ– are the positive and negative focusing parameters, set to 1 and 4, respectively, in our framework; and p_m is the shifted probability that helps the model discard very easy negative samples, with the margin m set to 0.05. The total loss is the sum of the losses for the eight types of labels. Details of training are explained in the Supplementary Material.
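A minimal PyTorch rendering of this loss, written directly from the equations above (not from the authors' released code), might look as follows; the tensor shapes and the clamping epsilon are assumptions.

```python
# Multilabel asymmetric balanced focal loss with gamma+ = 1, gamma- = 4, and
# margin m = 0.05, as specified in the text; eps guards the logarithms.
import torch

def asymmetric_balanced_focal_loss(logits, targets,
                                   gamma_pos=1.0, gamma_neg=4.0, m=0.05, eps=1e-8):
    """logits, targets: (batch, 8) tensors; targets are multi-hot labels."""
    p = torch.sigmoid(logits)
    p_m = (p - m).clamp(min=0.0)        # shifted probability: zeroes out easy negatives
    loss_pos = (1 - p).pow(gamma_pos) * torch.log(p.clamp(min=eps))
    loss_neg = p_m.pow(gamma_neg) * torch.log((1 - p_m).clamp(min=eps))
    # L_total = sum_k [ -y_k * L_{k+} - (1 - y_k) * L_{k-} ]
    loss = -targets * loss_pos - (1 - targets) * loss_neg
    return loss.sum(dim=1).mean()       # sum over the 8 labels, average over the batch

# Toy usage: 4 volumes, 8 labels.
logits = torch.randn(4, 8)
targets = torch.randint(0, 2, (4, 8)).float()
print(asymmetric_balanced_focal_loss(logits, targets))
```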
Performance Comparison With Other Methods
We compared our proposed framework with three different types of methods: one based on B-scan input only, one based on en face input only, and one based on 3D volume input. For the B-scan–based method, we reimplemented the algorithm proposed by Kermany et al.10 For the en face–based method, because we could not find any algorithm dedicated to OCT en face classification, we modified the algorithm proposed by Heinke et al.,24 originally designed for OCTA en face images, to accept the superficial, deep, and avascular projections as input. For the 3D volume–based method, we reimplemented the algorithm proposed by Ran et al.11 We further compared two state-of-the-art fusion methods: one based on feature map attention fusion proposed by Sun et al.,21 denoted as finetuned ResNet-50 with data augmentation (FTA)–convolutional block attention module (CBAM), and one based on a recent uncertainty-aware multiple-instance learning method proposed by de Vente et al.,25 denoted UA-MIL. The source code of our method and of the two reimplemented baseline methods is provided at https://github.com/weixifei6688/Multi-Disease-Diagnosis-using-Multi-view-Fusion-of-enface-and-B-scan-Images. To evaluate the robustness of our proposed framework, we also performed comparisons employing different backbone models and different fusion algorithms within the framework.
Evaluation and Statistical Analysis
The primary evaluation metric for diagnostic performance was the micro-averaged area under the receiver operating characteristic (ROC) curve (AUC). Overall accuracy, sensitivity, specificity, and precision were also calculated. Data from the first center were split in an 8:2 ratio into training and tuning sets. All networks were trained on the training set, and hyperparameters were tuned according to performance on the tuning set. To test the generalization performance of the model, data from the second center served as an external test set, guaranteeing unbiased performance estimates. Similarly, for the OCTA-500 dataset, we split the data into training, tuning, and test sets at a ratio of 6:1:3. DeLong's test was used to determine the statistical significance of differences in AUC among the various methods, and multiple comparisons were corrected using the Bonferroni method.
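For reference, the micro-averaged AUC can be computed with scikit-learn as sketched below; the toy arrays are illustrative, and note that DeLong's test itself is not part of scikit-learn.

```python
# Micro-averaged AUC over 8 labels: all (label, score) pairs are pooled into a
# single binary problem before the ROC curve is computed.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([[1, 0, 0, 0, 0, 0, 0, 0],     # toy multi-hot ground truth
                   [0, 1, 0, 0, 0, 0, 1, 0],
                   [0, 0, 1, 0, 0, 0, 0, 0]])
y_score = np.random.rand(3, 8)                    # model probabilities per class

print(roc_auc_score(y_true, y_score, average="micro"))
```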
Results
Comparison of Different Methods
Table 2A shows the averaged AUC, accuracy, sensitivity, specificity, and precision of our fusion method and the other methods on the second center's external test set. ROC curves for the different classes are plotted in Figure 2A. Our fusion method achieved the highest averaged AUC of 0.994, significantly better (P < 0.05) than the other methods. The B-scan model achieved AUC performance similar to that of the en face model, and the 3D model scored the lowest among the four models. The other metrics showed trends similar to those for AUC: the fusion model obtained the highest accuracy (0.926), highest sensitivity (0.946), and highest precision (0.869) while maintaining specificity at an acceptable level of 0.930. Table 2B shows the same metrics for the OCTA-500 test set, where our proposed method again achieved the highest AUC (0.969), significantly better than the other methods.
Figure 2. ROC curves of the six models for different disease classes.
Robustness of Proposed Framework Assessed by the Use of Different Backbones and Machine Learning Methods
To evaluate the robustness of our fusion method, we compared different backbones within our proposed framework. Four representative backbones were tested: a classical convolutional neural network (CNN) architecture (ResNet-50), two state-of-the-art transformer architectures (Swin Transformer and Vision Transformer), and a state-of-the-art CNN architecture (ConvNeXt). We further compared different machine learning fusion algorithms, including logistic regression, support vector machine, and random forest, in the decision fusion module to investigate their influence. As Tables 3 and 4 show, regardless of the choice of model backbone or machine learning algorithm, the fusion model performed consistently well, significantly outperforming the other models in Tables 2A and 2B. This indicates that fusion also enhances the robustness of the model.
Table 1A. Characteristics of Patients at the Two Centers
Table 1B. Disease Characteristics for OCTA-500 and OCTA-6M (N = 300)
Table 2A. Comparison of Performance Metrics for the External Test Set for the Different Models
Table 2B. Comparison of Performance Metrics for the OCTA-500 Test Set for the Different Models
Table 3. Influence on Performance of Using Different DL Backbone Models
Table 4. Influence on Performance of Using Different Machine Learning Methods
Model Interpretation
To aid in confirming the output of our model and interpreting the rationale behind its decision-making, we applied class activation mapping (CAM) to B-scan and en face images of typical diseases, shown in Figure 3. In the diagnosis of AMD, drusen and other retinal pigment epithelium abnormalities are usually highlighted in red, whereas, in the diagnosis of DR, areas with exudate or fluid are highlighted, consistent with the known pathology of these diseases. For the other diseases, the model likewise focuses on the lesions specific to each disease, demonstrating that our proposed model makes its decisions based on the appearance of lesions.
Figure 3. CAM activation maps of typical diseases. (Left) AMD, DR, RVO, and MH. (Right) CSC, EM, RS, and diabetes-related macular edema.
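The paper does not specify which CAM variant was used; the hook-based Grad-CAM sketch below, on a ResNet-50 stand-in, is one common way to produce such heatmaps and should be read as an assumption rather than the authors' implementation.

```python
# Grad-CAM-style heatmap: weight the last convolutional feature maps by their
# average gradients with respect to the target class score, then ReLU and
# normalize the result for overlay on the input image.
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()      # stand-in for a trained subnetwork
store = {}

model.layer4.register_forward_hook(lambda m, i, o: store.update(act=o.detach()))
model.layer4.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0].detach()))

x = torch.randn(1, 3, 224, 224)                   # toy input in place of a B-scan/en face image
score = model(x)[0].max()                         # score of the top predicted class
score.backward()

weights = store["grad"].mean(dim=(2, 3), keepdim=True)   # channel importance weights
cam = torch.relu((weights * store["act"]).sum(dim=1))    # raw activation map
cam = cam / (cam.max() + 1e-8)                           # normalize to [0, 1]
```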
Typical Cases Showing Advantages of Fusion Model
Figure 4 shows some typical cases that were misdiagnosed by the B-scan or en face models but were correctly identified by our multiview fusion model. Figure 4A is a case of RVO. The B-scan model failed to identify the case as RVO, as the condition is not easily discernible from B-scan images, whereas it can be clearly identified from the en face image. Figure 4B is a case complicated by EM and macular edema. The B-scan model misclassified it as a macular hole, as in some parts the B-scan images look similar to an MH, but the en face image clearly shows signs of an EM. Figure 4C is a case of early dry AMD with tiny drusen. The en face model failed to identify it as AMD, possibly because the tiny drusen were hardly visible in the en face image but could be clearly observed in the B-scan image. Our fusion model correctly identified these cases, as it considers both en face and B-scan images, leveraging the complementary information they provide.
Figure 4. Typical cases misdiagnosed by the B-scan or en face models but correctly identified by our fusion model. (Top) RVO. (Middle) EM with macular edema. (Bottom) Early AMD with drusen. Red arrows indicate the location of the B-scan images.
Discussion
In this study, we proposed an automated diagnostic framework based on multiview fusion of OCT B-scans and en face images to diagnose DR, AMD, MH, CSC, RS, EM, and RVO. We demonstrated that our framework significantly outperformed other state-of-the-art models, achieving an averaged AUC of 0.994 ± 0.001 and 95% confidence interval (CI) of 0.992 to 0.996. These results indicate that our proposed framework achieved highly reliable and accurate diagnostic performance for common retinal diseases. 
Simultaneous classification of multiple diseases using DL models is more desirable than single-disease models for disease screening in real-world scenarios, as patients may present with various types of diseases. Our framework can handle classification of up to seven diseases, which is an improvement compared to previous work focusing on the diagnosis of two to four diseases.9,12,26 Compared to methods that are based on the input of B-scan or en face images, our method can exploit the complementary information hidden in three different views. The underlying mechanism in our framework is similar to how retinal specialists diagnose diseases by examining both B-scan cross-sectional images and en face images before reaching a final conclusion. 
Previous work has combined color fundus photographs and OCT B-scan images to develop multimodal models.27 This is analogous to our inclusion of OCT en face images, as color fundus photographs also reflect en face information. Compared to such earlier work, our framework can achieve significant performance improvement without adding extra hardware modalities, which can facilitate its use in clinical settings where smart diagnostics can be performed on a single device. 
There are a few areas where the diagnostic performance of our framework could be improved by future studies. First, our dataset included only normal retinas and seven types of diseases; the framework could be extended to handle more disease types as larger datasets become available. Second, en face images can be distorted by incorrect segmentation of the retinal layer structures. Moreover, eye motion during scanning can introduce motion artifacts even with an eye-tracking system. In the current study, we excluded cases with severe artifacts in the en face images; how such artifacts affect model performance therefore remains unclear and should be studied in the future. Third, the current study is retrospective. Although we validated our method on a test set from an external center, a prospective study involving more centers and more types of vendor machines should be conducted to further assess the generalizability of the proposed framework.
Conclusions
We have proposed a multiview fusion network that effectively integrates information from OCT B-scan and en face images for accurate diagnosis of multiple retinal diseases with an averaged AUC of 0.994, outperforming five other state-of-the-art methods based on B-scans, en face images, or 3D volume input. Our results indicate that B-scans and en face images provide complementary information and should be integrated for better accuracy. Our proposed framework can facilitate the smart diagnosis of vision-threatening disease using OCT. 
Acknowledgments
Supported by grants from the National Natural Science Foundation of China (82302300) and the Guangdong Provincial Science and Technology Project (2023A0505030004). 
Disclosure: C. Ou, None; X. Wei, None; L. An, None; J. Qin, None; M. Zhu, None; M. Jin, None; X. Kong, None 
References
Wanek J, Zelkha R, Lim JI, Shahidi M. Feasibility of a method for en face imaging of photoreceptor cell integrity. Am J Ophthalmol. 2011; 152: 807–814. [CrossRef] [PubMed]
Nunes RP, Gregori G, Yehoshua Z, et al. Predicting the progression of geographic atrophy in age-related macular degeneration with SD-OCT en face imaging of the outer retina. Ophthalmic Surg Lasers Imaging Retina. 2013; 44: 344–359. [CrossRef] [PubMed]
Sayanagi K, Gomi F, Akiba M, et al. En-face high-penetration optical coherence tomography imaging in polypoidal choroidal vasculopathy. Br J Ophthalmol. 2015; 99: 29–35. [CrossRef] [PubMed]
De Bats F, Wolff B, Mauget-Faÿsse M, Scemama C, Kodjikian L. B-scan and “en-face” spectral-domain optical coherence tomography imaging for the diagnosis and followup of acute retinal pigment epitheliitis. Case Rep Med. 2013; 2013: 260237. [CrossRef] [PubMed]
Tsuboi K, Mazloumi M, Guo Y, et al. Utility of en face OCT for the detection of clinically unsuspected retinal neovascularization in patients with diabetic retinopathy. Ophthalmol Retina. 2023; 7(8): 683–691. [CrossRef] [PubMed]
Wolff B, Matet A, Vasseur V, Sahel J-A, Mauget-Faÿsse M. En face OCT imaging for the diagnosis of outer retinal tubulations in age-related macular degeneration. J Ophthalmol. 2012; 2012: 542417. [CrossRef] [PubMed]
Lee CS, Baughman DM, Lee AY. Deep learning is effective for classifying normal versus age-related macular degeneration OCT images. Ophthalmol Retina. 2017; 1(4): 322–327. [CrossRef] [PubMed]
Motozawa N, An GZ, Takagi S, et al. Optical coherence tomography-based deep-learning models for classifying Normal and age-related macular degeneration and exudative and non-exudative age-related macular degeneration changes. Ophthalmol Ther. 2019; 8(4): 527–539. [CrossRef] [PubMed]
Li X, Shen L, Shen M, Tan F, Qiu CS. Deep learning based early stage diabetic retinopathy detection using optical coherence tomography. Neurocomputing. 2019; 369: 134–144. [CrossRef]
Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018; 172(5): 1122–1131.e9. [CrossRef] [PubMed]
Ran AR, Wang X, Chan PP, et al. Three-dimensional multi-task deep learning model to detect glaucomatous optic neuropathy and myopic features from optical coherence tomography scans: a retrospective multi-centre study. Front Med. 2022; 9: 860574. [CrossRef]
Pang S, Zou B, Xiao X, et al. A novel approach for automatic classification of macular degeneration OCT images. Sci Rep. 2024; 14(1): 19285. [CrossRef] [PubMed]
Tang FY, Wang X, Ran AR, et al. A multitask deep-learning system to classify diabetic macular edema for different optical coherence tomography devices: a multicenter analysis. Diabetes Care. 2021; 44(9): 2078–2088. [CrossRef] [PubMed]
De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018; 24(9): 1342–1350. [CrossRef] [PubMed]
Zang P, Hormel TT, Hwang TS, Bailey ST, Huang D, Jia Y. Deep-learning–aided diagnosis of diabetic retinopathy, age-related macular degeneration, and glaucoma based on structural and angiographic OCT. Ophthalmol Sci. 2023; 3(1): 100245. [CrossRef] [PubMed]
Gao M, Guo Y, Hormel TT, et al. Retinal nonperfused capillaries identified and characterized by OCT and OCTA. Invest Ophthalmol Vis Sci. 2024; 65(7): 4332.
Pramil V, de Sisternes L, Omlor L, et al. A deep learning model for automated segmentation of geographic atrophy imaged using swept-source OCT. Ophthalmol Retina. 2023; 7(2): 127–141. [CrossRef] [PubMed]
Aoyama Y, Maruko I, Kawano T, et al. Diagnosis of central serous chorioretinopathy by deep learning analysis of en face images of choroidal vasculature: a pilot study. PLoS One. 2021; 16(6): e0244469. [CrossRef] [PubMed]
Akinniyi O, Rahman MM, Sandhu HS, El-Baz A, Khalifa F. Multi-stage classification of retinal OCT using multi-scale ensemble deep architecture. Bioengineering. 2023; 10(7): 823. [CrossRef] [PubMed]
Niu Z, Deng Z, Gao W, et al. FNeXter: a multi-scale feature fusion network based on ConvNeXt and Transformer for retinal OCT fluid segmentation. Sensors. 2024; 24(8): 2425. [CrossRef] [PubMed]
Sun Y, Zhang H, Yao X. Automatic diagnosis of macular diseases from OCT volume based on its two-dimensional feature map and convolutional neural network with attention mechanism. J Biomed Opt. 2020; 25(9): 096004. [CrossRef] [PubMed]
de Vente C, González-Gonzalo C, Thee EF, van Grinsven M, Klaver CC, Sánchez CI. Making AI transferable across OCT scanners from different vendors. Invest Ophthalmol Vis Sci. 2021; 62(8): 2118.
Li M, Huang K, Xu Q, et al. OCTA-500: a retinal dataset for optical coherence tomography angiography study. Med Image Anal. 2024; 93: 103092. [CrossRef] [PubMed]
Heinke A, Zhang H, Deussen D, et al. Artificial intelligence for optical coherence tomography angiography-based disease activity prediction in age-related macular degeneration. Retina. 2022; 44(3): 465–474.
de Vente C, van Ginneken B, Hoyng CB, Klaver CC, Sánchez CI. Uncertainty-aware multiple-instance learning for reliable classification: application to optical coherence tomography. Med Image Anal. 2024; 97: 103259. [CrossRef] [PubMed]
Zhou Y, Chia MA, Wagner SK, et al. A foundation model for generalizable disease detection from retinal images. Nature. 2023; 622(7981): 156–163. [CrossRef] [PubMed]
Yoo TK, Choi JY, Seo JG, et al. The possibility of the combination of OCT and fundus images for improving the diagnostic accuracy of deep learning for age-related macular degeneration: a preliminary experiment. Med Biol Eng Comput. 2019; 57(3): 677–687. [CrossRef] [PubMed]