Abstract
Purpose:
To develop an automated method based on deep learning (DL) to classify macular edema (ME) from the evaluation of optical coherence tomography (OCT) scans.
Methods:
A total of 4230 images were obtained from the data repositories of patients attended at an ophthalmology clinic in Colombia and from two free open-access databases. They were annotated with four biomarkers (BMs): intraretinal fluid, subretinal fluid, hyperreflective foci/tissue, and drusen. The scans were then labeled by two expert ophthalmologists as control or as one of three ocular diseases: diabetic macular edema (DME), neovascular age-related macular degeneration (nAMD), or retinal vein occlusion (RVO). Our method was developed in four consecutive phases: segmentation of BMs, combination of BMs, feature extraction with convolutional neural networks to achieve binary classification for each disease, and, finally, multiclass classification of diseases and control images.
Results:
The accuracy of our model was 97% for nAMD and 94%, 93%, and 93% for DME, RVO, and control, respectively. The corresponding area under the curve (AUC) values were 0.99, 0.98, 0.96, and 0.97. The mean Cohen's kappa coefficient for the multiclass classification task was 0.84.
Conclusions:
The proposed DL model may identify OCT scans as normal or as showing ME. In addition, it may classify the cause of ME among three major exudative retinal diseases with high accuracy and reliability.
Translational Relevance:
Our DL approach can optimize the efficiency and timeliness of the appropriate etiological diagnosis of ME, thus improving patient access and clinical decision making. It could be useful in places with a shortage of specialists and for readers who evaluate OCT scans remotely.
The OCT scans of this study were collected from two freely available open-access databases and one private dataset. The open-access databases were the ZhangLab dataset [24], which contains 207,130 OCT scans from patients with choroidal neovascularization, DME, drusen, and controls, and the DUKE dataset [25], which contains 269 SD-OCT volumes with 269,000 scans from people between 50 and 85 years of age with large drusen (>125 µm) and AMD and without any vitreoretinal surgery. The OCT volumes extracted from the ZhangLab and Duke datasets were acquired with 49 lines in a 6 × 6 mm cube. The complete set was exported as an E2E file into the freely accessible Labelbox digital platform, where annotations were performed. From these two open-access databases, 1343 images with the presence of ME and BMs (disease) and 1343 images with the absence of ME (control) were selected.
The private dataset was obtained from the repository of patients who attended an ophthalmology clinic in Colombia between 2015 and 2020. A total of 772 images with the presence of ME and BMs (disease) and 772 images with the absence of ME (control) were obtained. All images were supported by clinical records that included a full ophthalmologic examination, OCT and fluorescein angiography (FA) assessment, and confirmation of the diagnosis by an experienced retinal specialist. These OCT scans were acquired with a Zeiss Cirrus HD-OCT 5000 device (Zeiss, Oberkochen, Germany), capturing a 6 × 6 × 2 mm³ volume centered on the fovea, and were also exported as E2E files into the Labelbox platform. The two expert readers assessed scan quality to ensure that each scan was suitable for determining the presence of pathological patterns, and poor-quality images were excluded. All scans were de-identified before being analyzed by the expert readers to protect patient safety and privacy. The research was approved by the Ethics Committee of the Faculty of Medicine of Universidad Nacional de Colombia (Ref. 018-182; November 12, 2020), and the study was conducted according to the tenets of the Declaration of Helsinki.
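As a quick arithmetic check, the per-source counts reported above add up to the study totals. This is only a sketch; the dictionary keys are illustrative names, not identifiers from the authors' pipeline.

```python
# Image counts as reported in the text, grouped by source and label.
# The key names are illustrative, not taken from the authors' code.
COUNTS = {
    "open_access": {"disease": 1343, "control": 1343},  # ZhangLab + Duke
    "private": {"disease": 772, "control": 772},        # Colombian clinic
}


def totals(counts: dict) -> tuple:
    """Return (disease images, control images, all images) across sources."""
    disease = sum(source["disease"] for source in counts.values())
    control = sum(source["control"] for source in counts.values())
    return disease, control, disease + control


if __name__ == "__main__":
    print(totals(COUNTS))  # (2115, 2115, 4230)
```

The result matches the 2115 ME images, 2115 control images, and 4230 images in total used for the segmentation and classification tasks.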
The ZhangLab dataset [24] originally included a representative cohort of patients of diverse ethnicity, including Caucasian, Asian, Hispanic, African American, and mixed populations. The DUKE dataset [25] included patients from the Age-Related Eye Disease Study 2 Ancillary SD-OCT Study, who were originally enrolled at clinical centers in the United States. Patient characteristics for each diagnosis class are shown in the supplemental information (Supplementary Table S1).
Finally, a total of 4230 images were collected from the three databases by two expert ophthalmologists according to the presence of ME: 2115 images with ME and 2115 control images (absence of ME). The images were manually annotated with four key BMs and labeled as disease or control by the two expert ophthalmologists. The segmented BMs were IRF, SRF, HRF/T, and drusen, taking into account the delineation of the retinal layers and mutual agreement on the manual segmentations. IRF was defined as hyporeflective cystoid spaces within the surrounding retinal neuroepithelium, with a minimum size of 25 µm. SRF was defined as the hyporeflective space separating the retinal pigment epithelium from the photoreceptor layer. HRF were small hyperreflective dots (<25 µm), and HRT was defined as areas of hyperreflective material larger than 25 µm. Drusen were identified as small elevations between the retinal pigment epithelium and Bruch's membrane; elevations larger than 25 µm were considered pigment epithelium detachment (PED). In cases of initial disagreement or uncertainty, the most experienced retinal specialist, with more than 30 years of experience, decided on the proper segmentation and the most relevant findings.
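The size rules in this annotation protocol can be written as simple threshold functions. This is a hypothetical sketch: the function names are ours, each finding's size is assumed to be already measured in micrometres, and the text does not say how a value of exactly 25 µm was handled, so the boundary choices below are assumptions.

```python
from enum import Enum

# 25 µm is the size threshold quoted in the annotation protocol.
THRESHOLD_UM = 25.0


class Finding(Enum):
    HRF = "hyperreflective focus"
    HRT = "hyperreflective tissue"
    DRUSEN = "drusen"
    PED = "pigment epithelium detachment"


def label_hyperreflective(size_um: float) -> Finding:
    """Small hyperreflective dots (< 25 µm) are HRF; anything larger is HRT.

    Assigning exactly 25 µm to HRT is an assumption; the text does not say.
    """
    return Finding.HRF if size_um < THRESHOLD_UM else Finding.HRT


def label_sub_rpe_elevation(size_um: float) -> Finding:
    """Elevations between the RPE and Bruch's membrane above 25 µm count as PED."""
    return Finding.PED if size_um > THRESHOLD_UM else Finding.DRUSEN


def is_irf_candidate(size_um: float) -> bool:
    """Hyporeflective cystoid spaces are annotated as IRF from 25 µm upward."""
    return size_um >= THRESHOLD_UM


if __name__ == "__main__":
    print(label_hyperreflective(12.0).name)    # HRF
    print(label_sub_rpe_elevation(60.0).name)  # PED
    print(is_irf_candidate(20.0))              # False
```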
ME recognition was based on the presence of fluid (IRF or SRF), and its causative disease was further classified as DME, nAMD, or RVO. According to the literature, the association of distinctive BMs with each disease was initially used to classify the underlying pathology remotely: the combination of drusen, HRF/T, PED, and SRF for nAMD [10]; IRF (cystoid spaces), diffuse HRF (more than 30 in number), and disorganization of the retinal inner layers (DRIL, with loss of parallelism of the retinal layers and inability to distinguish them) for DME [11]; macrocystoid spaces, SRF, and perilesional HRF for RVO [12]; and absence of fluid (SRF or IRF) and BMs in control images. This initial classification was then cross-checked and verified against the true labels specified in the two open-access databases [15,24] and, for images acquired from patients attended at the ophthalmology clinic, against the diagnosis made by experienced retinal specialists (substantiated by clinical records, OCT, and FA evaluation). These verified labels were considered the gold standard (ground truth).
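The literature-based patterns listed above amount to a rule of thumb that maps a set of biomarkers to a candidate disease. The hypothetical sketch below encodes that rule of thumb for illustration only: it is not the authors' classifier (which used CNNs), the field names are ours, and scoring each disease by the number of matching markers, with ties left as "indeterminate", is our assumption about how such a rule could be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanFindings:
    """Hypothetical per-scan summary of annotated biomarkers (BMs)."""
    irf: bool = False               # intraretinal cystoid spaces
    macrocystoid: bool = False      # large intraretinal cystoid spaces
    srf: bool = False               # subretinal fluid
    drusen: bool = False
    ped: bool = False               # pigment epithelium detachment
    hrf_count: int = 0              # number of hyperreflective foci
    hrt: bool = False               # hyperreflective tissue
    hrf_perilesional: bool = False  # HRF clustered around the lesion
    dril: bool = False              # disorganization of retinal inner layers


def pattern_scores(f: ScanFindings) -> dict:
    """Count how many characteristic BMs of each disease are present."""
    return {
        "nAMD": sum([f.drusen, f.hrf_count > 0 or f.hrt, f.ped, f.srf]),
        "DME": sum([f.irf, f.hrf_count > 30, f.dril]),
        "RVO": sum([f.macrocystoid, f.srf, f.hrf_perilesional]),
    }


def suggest_label(f: ScanFindings) -> str:
    """Rule-of-thumb label: 'control', a disease name, or 'indeterminate'."""
    has_fluid = f.irf or f.macrocystoid or f.srf
    if not has_fluid:
        any_bm = f.drusen or f.ped or f.hrt or f.hrf_count > 0 or f.dril
        return "indeterminate" if any_bm else "control"
    scores = pattern_scores(f)
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    # Ties are left for expert review instead of being broken arbitrarily.
    return winners[0] if len(winners) == 1 else "indeterminate"


if __name__ == "__main__":
    print(suggest_label(ScanFindings(irf=True, hrf_count=40, dril=True)))  # DME
```

Leaving ties and fluid-free scans with BMs unresolved mirrors the workflow described above, where the initial pattern-based label is only a starting point that is then verified against the ground truth.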
The 4230 images were randomly split into eleven independent subsets. Four subsets with BMs were generated for the segmentation task, and the remaining seven were used for the classification tasks: three for the binary classification of each disease and four for the multiclass classification of control images and the three diseases. Each balanced subset was then split into training, test, and validation sets containing 70%, 20%, and 10% of the images, respectively. For the classification tasks, all OCT scans from a single volume were assigned to the same set, so that no volume contributed scans to more than one set and the sets remained heterogeneous. The distribution of images per set is presented in Table 1.
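A volume-level split like the one described can be sketched as follows. This is a minimal illustration under assumptions: the image-to-volume mapping, the function name, and the greedy assignment of whole shuffled volumes are ours, and because whole volumes are kept together, the 70/20/10 proportions are only approximate. To keep classes balanced, the function could be run separately on each class.

```python
import random
from collections import defaultdict


def split_by_volume(image_to_volume, fractions=(0.7, 0.2, 0.1), seed=0):
    """Assign whole OCT volumes to train/test/validation sets.

    image_to_volume maps each image ID to the ID of the volume it came from.
    The fractions are targets on image counts; because volumes are never split,
    the achieved proportions are approximate. fractions[2] is implied by the
    other two and is kept only for readability.
    """
    volumes = defaultdict(list)
    for image, volume in image_to_volume.items():
        volumes[volume].append(image)

    volume_ids = sorted(volumes)
    random.Random(seed).shuffle(volume_ids)  # reproducible random order

    total = len(image_to_volume)
    train_end = round(fractions[0] * total)
    test_end = round((fractions[0] + fractions[1]) * total)

    splits = {"train": [], "test": [], "validation": []}
    assigned = 0
    for volume in volume_ids:
        if assigned < train_end:
            target = "train"
        elif assigned < test_end:
            target = "test"
        else:
            target = "validation"
        splits[target].extend(volumes[volume])
        assigned += len(volumes[volume])
    return splits


if __name__ == "__main__":
    demo = {f"vol{v}_scan{i}": f"vol{v}" for v in range(10) for i in range(5)}
    print({name: len(ids) for name, ids in split_by_volume(demo).items()})
    # {'train': 35, 'test': 10, 'validation': 5}
```

Splitting by volume rather than by image prevents near-identical neighboring B-scans of one acquisition from appearing in both training and evaluation sets.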
Table 1. Images Per Dataset for Segmentation and Classification of OCT Scans
ME is a leading cause of decreased visual acuity and potentially irreversible visual loss, and its main causes are three diseases that are particularly frequent in older patients: AMD, DR, and RVO [2,3]. These diseases may even coincide in the same patient and usually share characteristics that make diagnosing the underlying disease challenging, even for experienced retinal specialists [3]. Prognosis and therapeutic orientation depend on recognizing the disease correctly, including the selection of the most suitable molecule when intravitreal antiangiogenic or steroid agents are used [3,4]. Clinical evaluation is supported by diagnostic tools such as FA and OCT; OCT is considered the gold standard for the diagnosis and monitoring of ME [5].
To the best of our knowledge, no previous study has developed an automated algorithm that recognizes the cause of ME among the three main exudative retinal diseases from OCT images alone. Previous works are limited to recognizing retinal fluid and its location, or to identifying individual BMs in isolation [16–23], without combining them to classify the causative disease of ME. Additionally, ME associated specifically with RVO has not been explored with an automated approach.
This study proposes a DL method applied to OCT scans for the automatic segmentation of BMs and the classification of macular diseases. Our proposed method achieved state-of-the-art performance, improving on the original architecture (DRIU) for OCT scan segmentation. The Res-UNet++ and SE-DRIU CNNs achieved better fluid segmentation results than the DRIU and UNet models. Of these, our proposed SE-DRIU method was preferable because it has fewer trainable parameters, which means less time for training and prediction [27].
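The "SE" in SE-DRIU presumably refers to squeeze-and-excitation blocks, which reweight the channels of a feature map using global context. Assuming that reading, the NumPy sketch below shows the forward pass of a generic SE block; the function names and shapes are ours, it is not the authors' implementation, and it makes no claim about where SE blocks sit inside SE-DRIU. A second helper counts the parameters one block adds, which for 64 channels and a reduction ratio of 16 is only a few hundred.

```python
import numpy as np


def squeeze_excitation(x, w1, b1, w2, b2):
    """Channel recalibration of a feature map x with shape (C, H, W).

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is the
    reduction ratio of the bottleneck.
    """
    z = x.mean(axis=(1, 2))                       # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z + b1, 0.0)              # excitation: bottleneck + ReLU -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))  # sigmoid gate per channel -> (C,)
    return x * gates[:, None, None]               # rescale each channel


def se_parameter_count(channels, reduction):
    """Weights and biases added by one SE block."""
    hidden = channels // reduction
    return channels * hidden + hidden + hidden * channels + channels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, r = 64, 16
    x = rng.normal(size=(c, 8, 8))
    w1, b1 = rng.normal(size=(c // r, c)), np.zeros(c // r)
    w2, b2 = rng.normal(size=(c, c // r)), np.zeros(c)
    print(squeeze_excitation(x, w1, b1, w2, b2).shape)  # (64, 8, 8)
    print(se_parameter_count(c, r))                     # 580
```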
Our method is based on recognizing key BMs and combining them appropriately to approximate the diagnosis of the causal disease of ME, according to the literature [10–12]. This is precisely the analysis an expert reader normally performs, with the challenge of differentiating similar findings among the main underlying conditions, sometimes without the support of clinical information or additional diagnostic tools.
The segmentation of BMs and the model's performance in detecting them revealed findings that are also evident in clinical practice when reading and interpreting OCT images. For example, IRF was recognized best, whereas HRF had lower DC values. Retinal cystoid spaces are easier to identify and delineate correctly because of their size and their clear contrast with the surrounding neuroepithelium, both for an expert reader and for the automated model. In contrast, smaller and often multiple findings with poor contrast against the surrounding retinal layers, such as drusen and especially HRF, are difficult to demarcate precisely even in manual segmentation; it is hard to guarantee that every such finding is fully covered, and these findings can be confused with adjacent hyperreflective retinal layers. The same difficulties that clinicians face, even the most experienced, also affect the automated method.
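Assuming DC denotes the Dice coefficient, 2|A ∩ B| / (|A| + |B|), a short NumPy sketch shows why small findings such as HRF tend to score lower: the same one-pixel boundary error removes a much larger fraction of the overlap for a 3-pixel focus than for a 30-pixel cystoid space. The masks are synthetic squares, not study data, and the helper names are ours.

```python
import numpy as np


def dice(pred, target, eps=1e-8):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    overlap = np.logical_and(pred, target).sum()
    return 2.0 * overlap / (pred.sum() + target.sum() + eps)


def square(side, shift=0, size=64):
    """Binary mask with a side x side square, optionally shifted right by `shift` px."""
    mask = np.zeros((size, size), dtype=bool)
    mask[5:5 + side, 5 + shift:5 + shift + side] = True
    return mask


if __name__ == "__main__":
    # The same one-pixel boundary error on a tiny focus and on a large cyst.
    print(round(dice(square(3, shift=1), square(3)), 3))    # 0.667
    print(round(dice(square(30, shift=1), square(30)), 3))  # 0.967
```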
The accuracy, sensitivity, and specificity achieved by our model for the classification of the three diseases were comparable to the performance of an expert specialist, as supported by the good κ values, which also indicate remarkable interobserver agreement. The best results were obtained for DME and nAMD classification. The specificity for RVO was lower than for the other diseases, which is understandable given the great similarity between the ME patterns of DME and RVO; this similarity makes even manual detection and segmentation difficult and can confuse highly experienced readers.
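For reference, the one-vs-rest sensitivity and specificity and Cohen's kappa (κ) discussed above can all be computed from a confusion matrix. The sketch below uses made-up counts, not results from this study, and the function names are ours; scikit-learn's `cohen_kappa_score` computes the same unweighted κ directly from two label vectors.

```python
import numpy as np


def one_vs_rest(cm, k):
    """Sensitivity and specificity of class k from a confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    tp = cm[k, k]
    fn = cm[k].sum() - tp      # class k predicted as something else
    fp = cm[:, k].sum() - tp   # other classes predicted as k
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what the marginals predict by chance."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    observed = np.trace(cm) / n
    expected = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    toy = [[45, 5], [10, 40]]  # made-up counts, not results from this study
    sens, spec = one_vs_rest(toy, 0)
    print(f"{sens:.2f} {spec:.2f}")         # 0.90 0.80
    print(f"{cohen_kappa(toy):.2f}")        # 0.70
```

The same functions apply unchanged to a 4 × 4 matrix covering control, DME, nAMD, and RVO.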
Li et al. [33] developed a classification algorithm based on the ResNet50 neural network for the automatic detection of choroidal neovascularization, DME, drusen, and normal images on OCT scans. They achieved outstanding classification performance, with an accuracy of 0.973, a sensitivity of 0.963, and a specificity of 0.985 [33]. Tsuji et al. [22] proposed replacing convolutional neural networks with a capsule network to improve classification accuracy and achieved an accuracy of 0.996. Using these models as a baseline for comparison, our method achieved excellent accuracy, sensitivity, and specificity for classifying normal images and ME caused by the three major exudative macular diseases, comparable to the reading of an expert specialist.
Our model could be especially useful in supporting diagnosis at different stages of patient care. For example, it could serve as a screening tool for optometrists and general practitioners at the primary care level. It could also be very useful for general ophthalmologists in the diagnostic and referral process, and for retina specialists in making clinical decisions, collecting information to evaluate local epidemiology, and studying these conditions predictively [34]. It could also be incorporated into digital health strategies such as telemedicine, to help overcome the added barriers to prompt care posed by public health contingencies such as the recent COVID-19 pandemic [35].
Moreover, an interobserver comparison was performed. The scans were labeled and classified by two experienced ophthalmologists, who, for the private dataset, also had access to medical records and other diagnostic tools such as FA to ensure correct classification, and who relied on the true labels specified in the two open-access databases (ZhangLab [24] and Duke [25] datasets). In some locations, very few ophthalmologists read scans remotely, and demand outstrips supply. Our method could therefore be especially useful for improving the efficiency and timeliness of appropriate diagnosis and clinical decision making, thus improving patient access and care, particularly in places with few readers who must issue their medical opinions without other supportive tools.
The limitations of this study include the limited number of OCT scans and expert readers and the restriction to one ophthalmology center and two open-access databases. Moreover, the use of retrospective data restricts the opportunity to include clinical information and imaging follow-up, which may enhance the performance of the model. Although our method focused on the three major exudative macular diseases, other retinal conditions can be associated with retinal fluid, such as vitreomacular traction (VMT) syndrome. VMT disorders are usually easier and more consistent to identify remotely from OCT images alone, given the evident epiretinal membranes and tractional pull on the macula, with the consequent alteration of retinal architecture. Because VMT is relatively easy to recognize without supportive automated tools, its classification was not included in this study. Nevertheless, VMT is an important differential diagnosis that should be explored together with the three main exudative macular diseases in future studies.
In an attempt to include ME images from patients of different ethnicities, our study included OCT scans from the Latin American population attended at the ophthalmology clinic, randomly selected images from the ZhangLab dataset, which originally included a representative cohort of Caucasian, Asian, Hispanic, African American, and mixed populations [24], and randomly selected images from the DUKE dataset, which included the United States population [25]. However, including multiple ophthalmological centers and properly characterizing the demographics of their patients in different locations will allow the generalizability of the model to be assessed in future studies. This study therefore offers a basic architecture that can be enriched by multiple ophthalmological centers, feeding the system with a greater number of images and expert readers to improve its diagnostic performance.
The Grad-CAM exploration opened the black box of the model by providing interpretability and a visual explanation of the behavior of the CNNs, highlighting key BMs and their combination. This finding should be validated in future studies by properly training CNNs on raw OCT scans. In this regard, the model provides qualitative information that is of great importance for clinical practice. It could therefore serve as a teaching tool for reading and interpreting OCT scans, by comparing the performance of students at different levels of training with the model's reading, which would theoretically be comparable to the diagnosis made by a retina professor. In future studies, integrating complementary models for the diagnosis, treatment, and prognosis of diseases could provide a valuable strategy for the comprehensive clinical analysis of patients, promoting timely attention and fast, efficient, and reliable clinical decision making.
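Grad-CAM weights each feature map of a chosen convolutional layer by the spatial average of the gradient of the class score with respect to that map, sums the weighted maps, and keeps the positive part. The NumPy sketch below shows only that final step. It assumes the activations and gradients have already been obtained from the trained CNN with an autodiff framework, omits upsampling the heatmap to the OCT scan resolution, and is a generic illustration rather than the authors' code.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.

    Both arrays have shape (K, H, W): K feature maps A_k and the gradients
    of the target class score with respect to them.
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: pooled gradients -> (K,)
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # scale to [0, 1] for overlaying
```

Regions where the heatmap is close to 1 are those whose features most increased the score of the predicted class, which is how overlays can point to biomarkers such as fluid or HRF.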
Our method not only recognizes macular edema, as previous studies have done through fluid identification, but may also classify its cause among three different maculopathies, which are the major exudative retinal diseases. Proper recognition of key BMs and their specific combination, including the location and quantity of the findings, allowed the ME pattern to be determined, which in turn enabled correct identification of the underlying disease. The Grad-CAM exploration opened the black box of the developed mathematical model, conferring interpretability on our automated method and providing a visual explanation of the behavior of the CNNs, which revealed the importance of the automated identification of BMs and their association. Furthermore, this model may help meet the high demand for diagnostic testing, lifting the burden on ophthalmologists, particularly in places where experts are scarce. Likewise, this approach may be especially useful because it is not uncommon for readers to issue their medical opinions exclusively from the evaluation of OCT scans, without access to medical records or other diagnostic and guidance tools; in such cases, an automated etiological approach could provide greater diagnostic accuracy, allowing them to make timely and appropriate medical decisions.
The authors thank the management staff of the Ophthalmology Clinic “Oftalmocenter Ltda.” for providing the private data used in this study.
Disclosure: F.D. Padilla-Pantoja, None; Y.D. Sanchez, None; B.A. Quijano-Nieto, None; O.J. Perdomo, None; F.A. Gonzalez, None