Generalizable Deep Learning for the Detection of Incomplete and Complete Retinal Pigment Epithelium and Outer Retinal Atrophy: A MACUSTAR Report
Author Affiliations & Notes
  • Coen de Vente
    Quantitative Healthcare Analysis (qurAI) Group, Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
    Amsterdam UMC location University of Amsterdam, Biomedical Engineering and Physics, Amsterdam, The Netherlands
    Diagnostic Image Analysis Group (DIAG), Department of Radiology and Nuclear Medicine, Radboud UMC, Nijmegen, The Netherlands
  • Philippe Valmaggia
    Department of Biomedical Engineering, Universität Basel, Basel, Basel-Stadt, Switzerland
    Institute of Molecular and Clinical Ophthalmology Basel, Basel, Basel-Stadt, Switzerland
  • Carel B. Hoyng
    Department of Ophthalmology, Radboudumc, Nijmegen, The Netherlands
  • Frank G. Holz
    Department of Ophthalmology and GRADE Reading Center, University Hospital Bonn, Germany
  • Mohammad M. Islam
    Quantitative Healthcare Analysis (qurAI) Group, Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
    Amsterdam UMC location University of Amsterdam, Biomedical Engineering and Physics, Amsterdam, The Netherlands
  • Caroline C. W. Klaver
    Department of Ophthalmology, Radboudumc, Nijmegen, The Netherlands
    Ophthalmology and Epidemiology, Erasmus MC, Rotterdam, The Netherlands
  • Camiel J. F. Boon
    Department of Ophthalmology, Leiden University Medical Center, Leiden, The Netherlands
    Department of Ophthalmology, Amsterdam University Medical Centers, Amsterdam, The Netherlands
  • Steffen Schmitz-Valckenberg
    Department of Ophthalmology and GRADE Reading Center, University Hospital Bonn, Germany
    John A. Moran Eye Center, University of Utah, Salt Lake City, UT, USA
  • Adnan Tufail
    Moorfields Eye Hospital NHS Foundation Trust, London, UK
  • Marlene Saßmannshausen
    Department of Ophthalmology and GRADE Reading Center, University Hospital Bonn, Germany
  • Clara I. Sánchez
    Quantitative Healthcare Analysis (qurAI) Group, Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
    Amsterdam UMC location University of Amsterdam, Biomedical Engineering and Physics, Amsterdam, The Netherlands
  • Correspondence: Clara I. Sánchez, Science Park 904, 1098 XH Amsterdam, The Netherlands. e-mail: c.i.sanchezgutierrez@uva.nl 
  • Footnotes
     MS and CIS contributed equally as co-last authors.
Translational Vision Science & Technology September 2024, Vol.13, 11. doi:https://doi.org/10.1167/tvst.13.9.11
Abstract

Purpose: The purpose of this study was to develop a deep learning algorithm for detecting and quantifying incomplete retinal pigment epithelium and outer retinal atrophy (iRORA) and complete retinal pigment epithelium and outer retinal atrophy (cRORA) in optical coherence tomography (OCT) that generalizes well to data from different devices, and to validate it in an intermediate age-related macular degeneration (iAMD) cohort.

Methods: The algorithm comprised a domain adaptation (DA) model, promoting generalization across devices, and a segmentation model for detecting granular biomarkers defining iRORA/cRORA, which are combined into iRORA/cRORA segmentations. Manual annotations of iRORA/cRORA in OCTs from different devices in the MACUSTAR study (168 patients with iAMD) were compared to the algorithm's output. Eye level classification metrics included sensitivity, specificity, and quadratic weighted Cohen's κ score (κw). Segmentation performance was assessed quantitatively using Bland-Altman plots and qualitatively.

Results: For ZEISS OCTs, sensitivity and specificity for iRORA/cRORA classification were 38.5% and 93.1%, respectively, and 60.0% and 96.4% for cRORA. For Spectralis OCTs, these were 84.0% and 93.7% for iRORA/cRORA, and 62.5% and 97.4% for cRORA. The κw scores for 3-way classification (none, iRORA, and cRORA) were 0.37 and 0.73 for ZEISS and Spectralis, respectively. Removing DA reduced κw from 0.73 to 0.63 for Spectralis.

Conclusions: The DA-enabled iRORA/cRORA segmentation algorithm showed superior consistency compared to human annotations, and good generalization across OCT devices.

Translational Relevance: The application of this algorithm may help toward precise and automated tracking of iAMD-related lesion changes, which is crucial in clinical settings and multicenter longitudinal studies on iAMD.

Introduction
The robust and reliable detection of structural biomarkers for early disease progression in age-related macular degeneration (AMD) is crucial for validating potential structural endpoints in future interventional trials.1–3 Alongside the recent emergence of fast, highly reproducible, and high-resolution multimodal retinal imaging, various intermediate AMD (iAMD) biomarkers for early atrophy manifestation have been identified based on optical coherence tomography (OCT).4–9 This advancement has enabled more accurate disease progression predictions in individual patients.10–13 The Classification of Atrophy Meetings (CAM) group, composed of international AMD and retinal imaging specialists, has defined OCT imaging-based lesions of atrophy development as incomplete retinal pigment epithelium and outer retinal atrophy (iRORA) and complete retinal pigment epithelium and outer retinal atrophy (cRORA).14,15 Ongoing and potentially upcoming observational iAMD studies focus on utilizing these lesions of atrophy development.1,16,17
Despite these advancements, the growing volume of multimodal retinal imaging data presents practical challenges, as human assessment alone cannot keep up with the large multicenter datasets that are becoming available. Therefore, automated approaches are needed to support image grading and data analysis. Various studies have presented deep learning models for detecting iRORA and cRORA in OCT scans.18–21 These algorithms were all evaluated using OCT data from a single device (Heidelberg Spectralis, Heidelberg, Germany), which was also used to acquire their training data.
However, many studies investigating AMD progression and real-world clinical settings use a wide variety of OCT devices.22 At the same time, deep learning models trained on data from one OCT device have been shown to generalize poorly to other devices without any further training,23–25 illustrating the wider challenge of compromised deep learning reliability under data shift in computer vision26 and medical imaging.27–30 Automated approaches to localize iRORA, cRORA, and their underlying granular features that generalize well to data from different device manufacturers are still warranted. A traditional fully supervised learning approach would require extensive manual annotation for the data of each OCT device, a process that is both time-intensive and resource-demanding.
This study aims to develop and evaluate a machine learning pipeline for detecting the underlying structural characteristics defining iRORA and cRORA lesions in OCT that is robust to data from two different devices, without requiring manually annotated training data from those devices. We use the MACUSTAR study, a prospective, multicenter, and low-interventional clinical study in subjects with AMD, for evaluation.1 It contains standardized OCT data acquired with two common devices from different manufacturers, enabling the rigorous assessment of model generalizability across these devices. 
Methods
Dataset and Study Cohort
Evaluation Data for iRORA and cRORA Detection Algorithm
The MACUSTAR Study Cohort
MACUSTAR is a prospective, multicenter, and low-interventional clinical study (ClinicalTrials.gov Identifier: NCT03349801) in subjects with AMD that is conducted at 20 sites in 7 European countries and consists of a cross-sectional and a longitudinal study part. Detailed information on the design and clinical study protocol has been reported previously.1 Human research ethics committee approval was obtained at all participating clinical sites before enrollment, complying with all applicable legal regulations as previously described. Participants provided informed consent before study recruitment and data collection, and this study has been conducted according to the provisions of the Declaration of Helsinki.
The study's inclusion and exclusion criteria have been described previously in detail.31,32 In brief, based on Ferris et al.,33 iAMD was defined by the presence of large drusen (>125 µm) and/or the presence of pigmentary abnormalities associated with medium drusen (>63 µm and ≤125 µm). Extrafoveal atrophy (<1.25 mm²) outside the central subfield of the Early Treatment Diabetic Retinopathy Study grid was permitted in the fellow eye of the iAMD study eye. If there was any geographic atrophy (>0.1 mm² in fundus autofluorescence imaging) or macular neovascularization, the study patient was included in the late AMD group of MACUSTAR. The analysis presented here is based only on iAMD study eyes from the cross-sectional study part at the baseline visit (V2) of MACUSTAR.
Retinal Imaging Protocol of the MACUSTAR Study
Following pupil dilation (tropicamide 0.5% and phenylephrine 2.5%), MACUSTAR study patients underwent multimodal retinal imaging according to standard operational procedures by certified study site personnel. The retinal imaging protocol comprised spectral domain (SD) OCT imaging performed with the Cirrus device (Macular Cube 200 × 200, centered on the fovea, signal strength at least 6/10; Zeiss Meditec, Oberkochen, Germany) and the Spectralis device (HRA + OCT device, digital imaging resolution 768 × 768 pixels, 30 degrees × 25 degrees, enhanced depth imaging mode, centered on the fovea, 241 B-scans, distance between scans 30 µm, automatic real-time mode, 9 frames; Heidelberg Engineering, Heidelberg, Germany). For each patient, the Spectralis and ZEISS OCTs were acquired on the same study date and during the same visit. Further details on the standardized retinal imaging protocol and the additionally acquired imaging modalities within the MACUSTAR study have been reported previously.31,32 All imaging data were transmitted to the GRADE Reading Center (University of Bonn, Bonn, Germany).
Annotation of iRORA and cRORA Lesions
Manual annotation of iRORA and cRORA lesions was performed at the B-scan level in both the Zeiss and Spectralis SD-OCT imaging datasets by readers experienced in AMD studies at the GRADE Reading Center, who had received detailed training in iRORA/cRORA detection. To this end, each OCT B-scan of both imaging devices was reviewed and carefully screened for predefined lesion criteria. According to the definition by the CAM group,15 a lesion was graded and annotated as iRORA if there was (1) a region of choroidal hypertransmission <250 µm in diameter, (2) a zone of attenuation or disruption of the retinal pigment epithelium (RPE) <250 µm in diameter, (3) evidence of photoreceptor degeneration, and (4) absence of an RPE tear. A lesion was defined as cRORA if the aforementioned size criteria related to choroidal hypertransmission and RPE attenuation or disruption exceeded 250 µm.14 This diameter was measured automatically by calculating the Feret diameter (in any direction in the en face plane, using scikit-learn).34 Manual annotation of iRORA/cRORA was performed in all B-scans meeting the lesion criteria. The screening of OCT B-scans and the annotation of iRORA/cRORA lesions were first performed in the 200 × 200 Macular Cube SD-OCT scans of each study patient, followed by the 241-B-scan Spectralis SD-OCT imaging dataset. The grading and annotation of iRORA/cRORA lesions were performed on the Zeiss SD-OCT imaging data in the device manufacturer's platform (Zeiss Forum, Retina Workplace). For the manual annotation of iRORA/cRORA lesions in the Spectralis dataset, the 241-B-scan SD-OCT imaging data of each study patient were imported in a raw file format into an annotation platform developed at the reading center, based on Jupyter Notebook (jupyter.org). With detailed consideration of each OCT B-scan, the OCT volumes of both manufacturers were graded for the presence of iRORA/cRORA by a medical reader highly experienced in AMD trials. If a lesion was present, its maximum horizontal diameter was manually annotated in each B-scan fulfilling the lesion criteria using the highlighting tools implemented in the corresponding platforms. Annotated imaging data were subsequently exported for further artificial intelligence (AI)-based data analysis.
In a subanalysis, the manual annotations of 12 study eyes with SD-OCT imaging were revised based on discrepancies between the AI model outputs and the manual reference.
Development Data for Domain Adaptation
For the development of the domain adaptation (DA) algorithm, we collected macula-centered SD-OCTs that were independent of the MACUSTAR data for both the Spectralis and Topcon domains. An overview of these datasets is shown in Table 1.
Table 1. Overview of the Development Data of the Domain Adaptation Model
The data from the Spectralis domain originated from the RETOUCH dataset35 and a proprietary dataset from the Amsterdam University Medical Centers (AUMC). RETOUCH contains data from patients with AMD, retinal vein occlusion, and diabetic macular edema. From the RETOUCH dataset, 24 Spectralis OCT volumes were used for the training data and 14 for the validation data. The RETOUCH OCTs contained 49 B-scans per volume with a size of 512 × 496 or 1024 × 496 pixels, in a raster of 6 × 6 mm². From the AUMC dataset, 10 patients with AMD and 10 patients with X-linked retinoschisis were included in the training set. The AUMC OCTs contained either 19 or 25 B-scans per volume with a size of either 512 × 496 or 1024 × 496 pixels, in a raster of approximately 6 × 6 mm². For the Spectralis domain, this resulted in a total of 1622 B-scans from 44 patients for training and 686 B-scans from 14 patients for validation.

The Topcon domain data originated from the RETOUCH35 dataset and the Rotterdam Study.36 The Rotterdam Study is a prospective cohort study established in 1990 in the city of Rotterdam, The Netherlands. From this study, 100 Topcon OCT volumes were randomly selected for the training set of the Topcon domain. This resulted in a set of 100 OCTs from 97 patients, as we included 2 OCTs from 3 patients and one OCT from all other patients. From the RETOUCH training set, 21 Topcon OCT volumes were used for the training set of the Topcon domain and 13 for the validation data. The Topcon domain OCTs were acquired using T-1000 or T-2000 devices, each containing 128 B-scans of 512 × 885 or 512 × 650 pixels in a raster of 6 × 6 mm². For the Topcon domain, this resulted in a total of 15,488 B-scans from 118 patients for training and 1664 B-scans from 13 patients for validation.
iRORA and cRORA Detection Model
For the detection of iRORA and cRORA in OCT scans, we processed each OCT volume with our AMD biomarker segmentation model,21 obtaining the volumetric segmentation and quantification of 13 different structural biomarkers. The biomarker segmentation model was a deep learning model based on U-Net.37 This model took nine consecutive B-scans as input, providing segmentations for the middle B-scan. It was a convolutional neural network with 3 × 3 × 1 convolutions in the U-Net layers and 1 × 1 × 3 convolutions in the deeper layers of the encoder. Following the iRORA/cRORA definitions established by the CAM group,14,15 we combined three of these structural features, namely hypertransmission, RPE loss or attenuation, and ellipsoid zone loss (as a criterion for photoreceptor degeneration), as shown in Figure 1. If the three features detected by the biomarker segmentation model co-located in the A-scan (vertical) direction, the A-scans of that lesion were classified as either iRORA or cRORA. If these three features did not co-locate, the A-scan was classified as background. Subsequently, if the hypertransmission and RPE loss of this potential iRORA/cRORA lesion had a Feret diameter (in the en face plane, calculated using scikit-learn)34 of at least 250 µm, the lesion was considered a cRORA lesion. If the diameter was less than 250 µm, it was considered an iRORA lesion.
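To make the rule concrete, the following minimal sketch (ours, not the authors' implementation) combines boolean en face maps of the three granular biomarkers, assuming the en face grid has been resampled to isotropic spacing and, for brevity, measuring the Feret diameter on the intersection region rather than on the hypertransmission and RPE loss extents separately. The paper computes the Feret diameter with scikit-learn; here we use the comparable feret_diameter_max property from scikit-image's regionprops.

```python
import numpy as np
from skimage.measure import label, regionprops

def classify_rora(htr, rpe_loss, ez_loss, um_per_pixel, threshold_um=250.0):
    """Combine boolean en face maps of hypertransmission (htr), RPE loss or
    attenuation (rpe_loss), and ellipsoid zone loss (ez_loss) into an en face
    map coded 0 = background, 1 = iRORA, 2 = cRORA."""
    # A-scan columns where all three granular features co-locate are
    # candidate iRORA/cRORA lesions; all other columns stay background.
    candidate = htr & rpe_loss & ez_loss
    out = np.zeros(candidate.shape, dtype=np.uint8)
    # Apply the 250-um Feret diameter criterion to each connected lesion.
    for region in regionprops(label(candidate)):
        diameter_um = region.feret_diameter_max * um_per_pixel
        rows, cols = region.coords[:, 0], region.coords[:, 1]
        out[rows, cols] = 2 if diameter_um >= threshold_um else 1
    return out
```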
Figure 1. Overview of the methods used for the automated iRORA/cRORA detection pipeline. Spectralis data were fed into a domain adaptation model. The transformed OCTs were subsequently used as input for the Liefers et al.21 segmentation model. ZEISS OCTs were directly fed into the segmentation model. The OCT volumes were fully segmented for all segmented features, including iRORA/cRORA. In the CycleGAN training diagram, GS→T is the generator for transforming the appearance of Spectralis images to the appearance of Topcon OCTs. GT→S is the generator for transforming the appearance of Topcon images to the appearance of Spectralis OCTs. DT is the discriminator for the Topcon image domain. att., attenuation; HTR, hypertransmission; PD, photoreceptor degeneration.
Pre-processing of the Spectralis data was done using a DA approach, as described in the next section. For compatibility with the segmentation model, all 3D OCT volumes were cropped or padded to 6 × 2.3 × 6 mm³ around the volume center. Subsequently, we resized the volumes to 128 × 650 × 512 voxels, preparing them for input into the segmentation model.
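A minimal sketch of this pre-processing step, assuming the OCT volume is a NumPy array ordered as (B-scans, axial, lateral) with known voxel spacing in millimeters (all names are illustrative, not the authors' code):

```python
import numpy as np
from scipy.ndimage import zoom

def crop_or_pad_center(volume, target_shape):
    """Symmetrically crop or zero-pad a 3D volume to target_shape."""
    out = np.zeros(target_shape, dtype=volume.dtype)
    src, dst = [], []
    for size, target in zip(volume.shape, target_shape):
        if size >= target:  # crop around the center
            start = (size - target) // 2
            src.append(slice(start, start + target))
            dst.append(slice(0, target))
        else:               # pad symmetrically with zeros
            start = (target - size) // 2
            src.append(slice(0, size))
            dst.append(slice(start, start + size))
    out[tuple(dst)] = volume[tuple(src)]
    return out

def preprocess_volume(volume, spacing_mm):
    """Crop/pad to 6 x 2.3 x 6 mm around the center, then resize to the
    128 x 650 x 512 voxel grid expected by the segmentation model."""
    physical_extent_mm = (6.0, 2.3, 6.0)
    target_voxels = (128, 650, 512)
    # Convert the physical extent to voxels using the device's voxel spacing.
    crop_shape = tuple(int(round(e / s))
                       for e, s in zip(physical_extent_mm, spacing_mm))
    cropped = crop_or_pad_center(volume, crop_shape)
    factors = [t / c for t, c in zip(target_voxels, cropped.shape)]
    return zoom(cropped, factors, order=1)  # linear interpolation
```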
Domain Adaptation
The Spectralis OCT volumes were transformed using DA, based on Cycle-Consistent Generative Adversarial Networks (CycleGAN).38 We aimed to modify the appearance of the Spectralis scans to align with the appearance of Topcon OCT scans, intending to promote compatibility with the training data distribution of the biomarker segmentation model (trained on Topcon images). We did not use DA in the ZEISS OCT processing pipeline, as these OCTs already showed image characteristics very similar to those of the Topcon OCTs used to train the biomarker segmentation model. This is supported by the acquisition settings of the ZEISS and Topcon devices, neither of which uses B-scan averaging, a scanning protocol that substantially impacts the signal-to-noise ratio and is used in the Heidelberg Spectralis device.39
CycleGANs are designed for image-to-image translation, allowing the translation of an image from a source domain to a target domain without paired data. A CycleGAN uses two generators: one for generating images from the Spectralis domain to the Topcon domain and one for generating images from the Topcon domain to the Spectralis domain. Moreover, one discriminator per domain is trained to distinguish real images from fake images, while the generators aim to fool the discriminators into classifying generated images as real. For validation and for application in the iRORA/cRORA detection pipeline, we only used the generator that transforms Spectralis into Topcon images. As a pre-processing step, all Spectralis OCT volumes were resized to 512 × 650 × 128 pixels to ensure compatibility with our biomarker segmentation model.
We largely used the same network and optimization settings as described in the original CycleGAN paper. We used the Adam optimizer40 with a learning rate of 2 × 10−4. The model was trained for 85 epochs (1,316,480 iterations). The adversarial loss was implemented using the mean squared error. The cycle consistency loss and the identity loss were implemented using the mean absolute error. The CycleGAN was trained on 2D patches of 256 × 256 pixels, randomly selected from the pre-processed OCTs. When generating images during test time, the whole B-scans of 512 × 650 pixels were used as model input. 
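As an illustration of this setup, the sketch below implements one generator update in PyTorch under the stated settings (least-squares adversarial loss via mean squared error, cycle consistency and identity losses via mean absolute error). The loss weights follow the original CycleGAN paper's defaults and are an assumption, as are all names; the discriminator update is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def generator_step(G_s2t, G_t2s, D_t, D_s, opt_G, real_s, real_t,
                   lambda_cyc=10.0, lambda_idt=5.0):
    """One generator update for Spectralis (s) <-> Topcon (t) translation.
    real_s and real_t are batches of 256 x 256 patches from each domain."""
    fake_t = G_s2t(real_s)  # Spectralis -> Topcon
    fake_s = G_t2s(real_t)  # Topcon -> Spectralis

    # Adversarial loss: make the discriminators score generated images
    # as real (target label 1), implemented as mean squared error.
    adv = (F.mse_loss(D_t(fake_t), torch.ones_like(D_t(fake_t))) +
           F.mse_loss(D_s(fake_s), torch.ones_like(D_s(fake_s))))

    # Cycle consistency: translating to the other domain and back
    # should reconstruct the input (mean absolute error).
    cyc = F.l1_loss(G_t2s(fake_t), real_s) + F.l1_loss(G_s2t(fake_s), real_t)

    # Identity loss: a generator fed an image already in its target
    # domain should change it as little as possible.
    idt = F.l1_loss(G_s2t(real_t), real_t) + F.l1_loss(G_t2s(real_s), real_s)

    loss = adv + lambda_cyc * cyc + lambda_idt * idt
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()
```

At inference time, only G_s2t is retained and applied to whole 512 × 650 pixel B-scans.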
Statistical Analysis
We evaluated the agreement of iRORA/cRORA detection between the deep learning model and the manual annotations both at the eye level and in terms of area. To convert voxel-level annotations and outputs of iRORA/cRORA to the eye level, we used the following workflow: if any cRORA was found, the eye-level class became cRORA; if no cRORA was present but iRORA was found, the eye-level class became iRORA; otherwise, the eye-level label was "None."
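A minimal sketch of this conversion, assuming the voxel-level map is coded as 0 (background), 1 (iRORA), and 2 (cRORA):

```python
import numpy as np

def eye_level_label(voxel_map):
    """Collapse a voxel-level iRORA/cRORA map into one eye-level class;
    cRORA takes precedence over iRORA, otherwise the label is "None"."""
    if np.any(voxel_map == 2):
        return "cRORA"
    if np.any(voxel_map == 1):
        return "iRORA"
    return "None"
```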
For the eye level analysis, we evaluated the agreement using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and quadratic weighted Cohen's κ score (κw). The sensitivity and specificity were calculated both with an onset of iRORA (None versus iRORA/cRORA) and with an onset of cRORA (None/iRORA versus cRORA). The κw was calculated on the three-category classification problem of None versus iRORA versus cRORA. We reported the agreement between the human grader and the AI model using the aforementioned metrics. Furthermore, we assessed the intra-rater agreement for the human grader and the AI model separately using κw. When calculating standard deviations and P values for κw, we used nonparametric bootstrapping41 with 1000 iterations. We calculated the agreement in terms of area using Bland-Altman plots. 
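The κw bootstrap can be sketched as follows, using scikit-learn's cohen_kappa_score with quadratic weights; resampling eyes with replacement is our reading of the standard nonparametric procedure,41 and all names are illustrative.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def bootstrap_kappa(y_ref, y_model, n_iter=1000, seed=0):
    """Quadratic weighted Cohen's kappa with a nonparametric bootstrap.
    Labels are ordered integers: 0 = None, 1 = iRORA, 2 = cRORA.
    Returns the point estimate and the bootstrap standard deviation."""
    y_ref, y_model = np.asarray(y_ref), np.asarray(y_model)
    kappa = cohen_kappa_score(y_ref, y_model, weights="quadratic")
    rng = np.random.default_rng(seed)
    resampled = []
    for _ in range(n_iter):
        idx = rng.integers(0, len(y_ref), len(y_ref))  # resample eyes
        resampled.append(cohen_kappa_score(y_ref[idx], y_model[idx],
                                           labels=[0, 1, 2],
                                           weights="quadratic"))
    return kappa, float(np.std(resampled))
```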
The domain adaptation model was both qualitatively and quantitatively validated. For the quantitative analysis, we computed the Dice score of the fluid segmentation performance of a 2D nnU-Net42 model that was trained on the Topcon data from RETOUCH. The Dice score was calculated separately for each OCT volume. The mean and standard deviation of these Dice scores were subsequently calculated. The qualitative analysis was performed by visually inspecting artificially generated Topcon examples from real Spectralis images. 
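For the quantitative check, the per-volume Dice score over binary fluid masks can be computed as in this short sketch (variable names are ours):

```python
import numpy as np

def dice_score(pred, ref, eps=1e-8):
    """Dice coefficient between two binary 3D masks of one OCT volume."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum() + eps)

# Aggregate over volumes: report the mean and standard deviation.
# scores = [dice_score(p, r) for p, r in zip(model_masks, reference_masks)]
# print(np.mean(scores), np.std(scores))
```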
Results
The cross-sectional part of the MACUSTAR study included a total of 168 study eyes of 168 patients with iAMD (mean age ± standard deviation = 71.2 ± 7.55 years). Due to the absence of 25 ZEISS 200 × 200 Macular Cube SD-OCT scans and the insufficient image quality of one 241-B-scan Spectralis SD-OCT volume at the baseline visit (V2), a total of 143 patients with iAMD and ZEISS Macular Cube SD-OCT imaging and 167 patients with iAMD and Spectralis SD-OCT imaging were included for further analysis.
Segmentation and Eye Level Classification Performance
Table 2 presents the performance metrics of the iRORA/cRORA detection pipeline for eye-level classification, evaluated using the ZEISS and Spectralis datasets. The table includes comparisons both with and without the application of the DA approach to the Spectralis data. The specificities were high for both OCT devices and onsets, ranging from 93.1% to 97.4%. The sensitivities were lower, ranging from 38.5% to 84.0%. The κw between the manual annotations and the AI model was substantially higher for the Spectralis OCTs (0.73) than for the ZEISS OCTs (0.37). Figure 2 shows the confusion matrices for the eye-level evaluation. Qualitative segmentation results are shown in Figures 3 and 4 for the Spectralis and ZEISS data, respectively. The figures illustrate both high agreement between the model and the manual annotations and different types of disagreement, in the form of a false negative and a false positive result.
Table 2. Performance for Eye Level iRORA and cRORA Classification
Figure 2. Confusion matrices, comparing the manual reference with the AI model for the images originating from the (A) ZEISS and the (B) Spectralis datasets.
Figure 3. Examples of the Heidelberg Spectralis results. Each subfigure shows a table in the top left, containing the eye-level decisions and the en face iRORA/cRORA areas, calculated from the segmentations, for both the manual reference and the AI model. The middle and right cells of the first rows show the iRORA/cRORA segmentations from the manual reference and the AI model, respectively, overlaid on the en face projection of the OCT volume. The middle rows show the segmentations from the AI model for the three underlying granular biomarkers of iRORA/cRORA, overlaid on the en face projection of the OCT volume. Each en face image shows a white horizontal line, which corresponds to the B-scan shown in the last row. The last rows show the AI model segmentation of the granular biomarkers. If iRORA/cRORA is detected, the last rows also show horizontal lines labeled as either "iRORA/cRORA (AI)" for the AI model or "iRORA/cRORA (manual)" for the manual reference. (A) True positive result for cRORA. (B) False positive result example for cRORA. These cases exhibit a mixed phenotype with presence of sub-RPE drusen and subretinal drusenoid deposits, as well as evidence of an overall increase in choroidal hypertransmission on the B-scan, leading to ambiguity for the human reader when assessing the presence of iRORA/cRORA.
Figure 4. An example of the ZEISS results. A table is shown in the top left, containing the eye-level decisions and the en face iRORA/cRORA areas, calculated from the segmentations, for both the manual reference and the AI model. The middle and right cells of the first row on the left show the iRORA/cRORA segmentations from the manual reference and the AI model, respectively, overlaid on the en face projection of the OCT volume. The middle row on the left shows the segmentations from the AI model for the three underlying granular biomarkers of iRORA/cRORA, overlaid on the en face projection of the OCT volume. Each en face image shows white horizontal lines, which correspond to the B-scans on the right. These B-scans show the AI model segmentation of the granular biomarkers. If iRORA/cRORA is detected, these B-scans also show horizontal lines labeled as either "iRORA/cRORA (AI)" for the AI model or "iRORA/cRORA (manual)" for the manual reference. B-scan 91 contains a false positive result, where all three features were segmented by the AI model. In B-scan 118, the manual reference and the AI model both detected the same iRORA lesion. B-scan 124 contains an iRORA lesion according to the manual reference; the model did find very small areas of ellipsoid zone loss and hypertransmission at this location, but RPE loss/attenuation was not segmented, so iRORA was not identified by the model.
For most cases, OCT volumes from both the ZEISS and Spectralis devices were available. The AI model and the human grader interpreted these independently, blinded to the OCT from the other device. The discrepancies between the corresponding eye level outputs when interpreting the volumes from these two devices are visualized in Supplementary Figure S1. In Supplementary Figure S2, Bland-Altman plots comparing the predicted areas of the manual and AI segmentations are shown separately for the iRORA and cRORA lesions. An analysis of the manual annotations that were updated based on the discrepancies between the initial version of the annotations and the model predictions can be found in Supplementary Figure S3. An example of such an update is shown in Supplementary Figure S4.
Consistency Between OCT Devices
The consistency of the eye level decisions made by the AI model and the human annotator across the two different OCT devices is presented using confusion matrices in Figure 5. The intra-rater agreement in terms of κw was 0.32 for the manual grader and 0.65 for the AI model. A qualitative comparison illustrating this type of consistency is shown in Figure 6.
Figure 5. Confusion matrices showing the intra-rater agreements of both (A) the manual grader and (B) the AI model when comparing their linked decisions on OCT-volume level for the ZEISS and the Spectralis data.
Figure 6. Qualitative comparison of manual reference and AI model outputs between OCTs from the two devices. Each subfigure shows the ZEISS OCT first, followed by the Spectralis OCT, both taken from the same eye during the same visit. The B-scans presented below in each subfigure approximately correspond to the same location in the retina for both OCT devices. Each grid in the subfigures shows a table in the top left, containing the eye-level decisions and the en face iRORA/cRORA areas, calculated from the segmentations, for both the manual reference and the AI model. The middle and right cells of the first rows show the iRORA/cRORA segmentations from the manual reference and the AI model, respectively, overlaid on the en face projection of the OCT volume. The middle rows show the segmentations from the AI model for the three underlying granular biomarkers of iRORA/cRORA, overlaid on the en face projection of the OCT volume. Each en face image shows a white horizontal line, which corresponds to the B-scan shown in the last row. The last rows show the AI model segmentation of the granular biomarkers. If iRORA/cRORA is detected, the last rows also show horizontal lines labeled as either "iRORA/cRORA (AI)" for the AI model or "iRORA/cRORA (manual)" for the manual reference. (A) An example where the outcomes are consistent between devices. Both the AI model and the manual reference agree on the presence of a cRORA lesion in the ZEISS and Spectralis OCT. (B) An example with a discrepancy between the manual references for the two devices. The output of the AI model is more consistent regarding the detected iRORA and cRORA lesions in the OCTs from both devices. The manual reader likely missed the lesion in the ZEISS OCT due to human error, given the evident presence of all criteria for cRORA. A subanalysis of all ZEISS OCTs with a discrepancy between the manual and AI eye-level decisions, similar to what was done for the Spectralis OCTs in this study, would have likely led to the manual delineation of this lesion.
Effect of Domain Adaptation
The effect of using DA in the iRORA/cRORA detection pipeline for the Spectralis data on κw (between the manual annotations and the AI model) and on the confusion matrix is shown in Figure 7. The κw score was 0.73 with DA, whereas it was 0.63 without DA. This difference was not statistically significant (P = 0.087). There were six cases with cRORA according to the manual reference that were classified as no iRORA/cRORA by the model without DA; this was reduced to two cases when including DA in the approach. However, the number of cases that the model classified as cRORA but that were negative according to the manual reference increased from one to three when introducing DA into the pipeline. An example where the model with DA picked up an iRORA lesion that the model without DA missed is shown in Figure 8. The validation results of the DA model when used in a fluid segmentation pipeline on the RETOUCH test set are shown in Supplementary Table S1.
Figure 7. The effect of domain adaptation (DA) on eye level None/iRORA/cRORA classification in the Spectralis dataset. (A) The effect on the quadratic weighted kappa score (between the manual annotations and the AI model). The error bars are standard deviations from nonparametric bootstrapping with 1000 iterations. (B) Confusion matrix with DA. (C) Confusion matrix without DA.
Figure 8. Example where the AI model with domain adaptation (DA) detected an iRORA lesion that was not picked up without DA.
Discussion
In this study, we presented a generalizable deep learning pipeline for the detection of iRORA/cRORA lesions in OCT. By using semantic segmentation of the underlying granular biomarkers, rather than direct detection of these lesions, we aimed to make the pipeline resilient to possible future changes in the iRORA/cRORA definitions. We used DA in the pipeline to ensure applicability to devices beyond those used during training. We assessed the performance of this approach in iAMD eyes of the cross-sectional part of the MACUSTAR study, a prospective, multicenter, and low-interventional clinical study with data from multiple OCT devices per visit and patient, by comparing the model outputs to human annotations in a reading center setting.
For both OCT devices and for the eye-level classification with an onset of either iRORA or cRORA, the specificity of the AI model was substantially higher than the sensitivity. The Bland-Altman plots indicate almost no systematic bias, but show a number of outliers that are mostly due to the AI model segmenting larger areas than the human grader. Figure 4 illustrates this phenomenon: both the human grader and the AI model found iRORA lesions, but the total en face area computed from the AI model segmentations was more than twice as large as the area computed from the manual annotations.
Unambiguous manual delineation of lesions appeared particularly challenging in eyes with mixed drusen phenotypes of subretinal drusenoid deposits and sub-RPE drusen as well as evidence of an overall increased choroidal hypertransmission on the OCT line scan (see, for example, Fig. 3B). Additional en face assessment of choroidal hypertransmission can further help to delineate these lesions. 
The AI model performance was generally higher on the Spectralis data than on the ZEISS data. As can be observed in Table 2, the κw score was 0.73 for the Spectralis data, whereas it was 0.37 for the ZEISS data. This might be a consequence of the lower signal-to-noise ratio of the latter OCT device, making the task of reliably detecting iRORA/cRORA lesions more challenging for both AI models and human graders. This can lead to a reference standard that is further away from the ground truth, causing performance metrics to be lower. Furthermore, the Spectralis results depicted in Table 2 were based on the manual labels that were updated based on the discrepancies between the AI outputs and the initial manual labels. However, the κw score computed using the original manual labels was 0.53, which was still higher than for the ZEISS OCTs. The use of DA for the Spectralis data, which was not applied to the ZEISS data, could also explain these differences, despite the high similarity between the ZEISS OCTs and the training data (as described in the subsection "Domain Adaptation" of the "Methods" section). In future work, our assumption that DA was not necessary for the ZEISS data could be tested empirically.
In the Spectralis dataset, the cases with a discrepancy between the AI model and the initial manual labels at the eye level were reconsidered manually. This subanalysis resulted in an updated label for 12 of the 25 cases with iRORA/cRORA in the final manual labels. Eight of these 12 cases were positive iRORA/cRORA cases that had initially received a less severe classification. This indicates the necessity of AI-based workflows in future clinical settings and trials with iRORA/cRORA endpoints. Using AI to highlight missed cases, or presenting AI findings before a human starts grading, can potentially increase clinical accuracy. Therefore, we think it is highly important to initiate a discussion in the research and clinical community on how to incorporate AI systems in these settings.
We evaluated the consistency of both the deep learning model and the human grader across different OCT devices. This intra-rater agreement was substantial for the AI model (κw = 0.65) and fair for the manual grader (κw = 0.32). Nevertheless, there still appears to be room for improvement in both cases, indicating the need for standardized OCT data. DA approaches, such as the one described in this work, may contribute toward this goal of data standardization. 
An increase in terms of κw from 0.63 to 0.73 was shown when introducing DA into the processing pipeline of the Spectralis data. However, this difference was not statistically significant (P = 0.087), which may be caused by the relatively small number of iRORA/cRORA cases in the evaluation set (n = 23 according to the manual labels, and n = 30 according to the deep learning model). 
Several previous studies have described the development and evaluation of algorithms for iRORA/cRORA classification or segmentation.18–20 A direct performance comparison of our deep learning framework to these other algorithms is challenging due to the lack of benchmark datasets and standardized evaluation protocols. For example, Chiang et al.18 described a deep learning approach for iRORA/cRORA classification of individual B-scans. Their approach achieved higher sensitivities than ours, but our method demonstrated higher performance in terms of specificity, PPV, and NPV. However, a direct comparison of these classification metrics is problematic for several reasons. (1) The operating points used in their work and ours led to highly different trade-offs among sensitivity, specificity, PPV, and NPV. It would therefore be valuable to compare these algorithms using receiver operating characteristic (ROC) curves in future work; we did not present ROC curves in this paper because our current algorithm does not provide probabilities for the detected iRORA/cRORA lesions, as detection is based on the binary presence of the underlying features. (2) We reported the iRORA/cRORA (i.e., the two classes of iRORA and cRORA merged) and cRORA classification performance, whereas they reported the classification performance of iRORA and cRORA separately. (3) For some of the datasets they used, they reported metrics only for B-scan-level classification instead of eye-level classification. (4) The populations underlying their datasets and ours differ considerably, leading to highly varying class imbalances and distributions of AMD stages.
Szeskin et al.19 also presented a deep learning model for iRORA/cRORA detection, but they only reported metrics for column classification, further hampering a one-to-one comparison with the results of this study. Zhang et al.20 presented an automatic quantification approach for iRORA/cRORA as well, but they also did not provide eye-level metrics for iRORA/cRORA classification.
An important strength of this study was the large evaluation dataset, which included high-quality OCT imaging data and was prospectively collected as part of a multicenter study. The OCT data from both acquisition devices used a highly dense B-scan raster. This ensured that small lesions were captured in each scan, which is important for the quantification of lesions related to iAMD, such as iRORA. Furthermore, the human annotator was highly experienced in reading retinal images within a reading center setting with sophisticated reader training.
The wide applicability of our deep learning framework is another strength. Whereas other works for iRORA/cRORA detection in OCT tested their model performance on data from one specific device,18,19 we used a DA approach to allow generalization to data from other devices, without the need for manually annotated training data from these devices. This generalization was tested on data from two main OCT devices, making our algorithm relevant for future multicenter studies where data from different imaging devices are available. The availability of reliable AI models transferable to various OCT devices can further help to identify patients at high risk for disease progression who need more frequent follow-up visits in daily routine, as well as potential candidates for upcoming interventional trials in iAMD. Moreover, the biomarker segmentation model21 was trained on more AMD-related features than the ones currently used. This indicates the potential of our algorithm to be directly applied to other retinal OCT applications, such as fluid segmentation for patients with exudative AMD, ellipsoid zone loss detection for retinitis pigmentosa, or the segmentation of intraretinal cysts for X-linked retinoschisis. Without any further model development, our existing algorithm may already generalize well to OCT data with these pathologies from the various devices used in this work, due to the DA approach that we used. Future work could focus on further validating this broader applicability across various retinal conditions.
This study has various limitations. (1) Whereas the MACUSTAR study consists of both a cross-sectional and a longitudinal part, we only used the data from the cross-sectional part, as we did not yet have access to the longitudinal part.1 Future work should aim to incorporate these longitudinal data to evaluate our approach further. Furthermore, this work could be extended to predict disease progression with the use of AI. (2) The number of cases with iRORA/cRORA was relatively small. Specifically, based on the manual annotations, there were 22 OCTs with iRORA/cRORA in the Spectralis subset and 11 in the ZEISS subset. Future analyses should include more data, both the longitudinal and additional cross-sectional data beyond the iAMD subset of the MACUSTAR study, or even datasets from other studies. This may, for example, lead to more precise performance estimates in general and statistically significant results when measuring the effect of including DA in the deep learning framework. (3) We used only a single modality and did not incorporate en face imaging from other modalities, which could have allowed for a more accurate detection of iRORA/cRORA. (4) RPE tear is an exclusion criterion for iRORA/cRORA,15 which was not taken into account by our deep learning model. (5) The DA approach used in the current work only took into account differences related to the appearance of individual B-scans, such as the noise pattern. However, there is also substantial variability across OCT devices in the spacing between B-scans and in the vertical registration strategies of the device software. These differences may also influence biomarker quantification, especially given that we used a 3D segmentation model, but they were not addressed by the DA approach in this work. (6) In some cases, the optic disc fell within the 6 × 6 mm en face square we took around the fovea during pre-processing, which sometimes led to our model wrongly indicating iRORA/cRORA at the location of the optic disc. Future iterations of our model should incorporate a mechanism against this, for example, by introducing such samples into the training set.
In conclusion, we present a robust deep learning algorithm to automatically detect iRORA/cRORA from SD-OCT. This algorithm is generalizable across various OCT devices, while manually annotated development data were required from only one of these devices. Applications of this model may enable the precise automated tracking of AMD-related lesion changes in number, size, area, and retinal position. This is of major importance for monitoring iAMD disease progression, particularly given the increasing amount of multimodal retinal imaging data becoming available in studies.
Acknowledgments
Supported by the FEMHabil Program, Faculty of Medicine, University of Bonn to M.S. This project has also received funding from the Innovative Medicines Initiative (IMI) 2 Joint Undertaking under grant agreement No. 116076. This Joint Undertaking receives support from the European Union's Horizon 2020 Research and Innovation Programme and the European Federation of Pharmaceutical Industries and Associations (EFPIA). Supported in part by an Unrestricted Grant from Research to Prevent Blindness, New York, NY, to the Department of Ophthalmology & Visual Sciences, University of Utah. 
MACUSTAR Consortium Members: H. Agostini, I. Aires, L. Altay, R. Atia, F. Bandello, P. G. Basile, J. Batuca, C. Behning, M. Belmouhand, M. Berger, A. Binns, C. J. F. Boon, M. Böttger, A. R. Branco, J. E. Brazier, C. Carapezzi, J. Carlton, A. Carneiro, A. Charil, R. Coimbra, D. Cosette, M. Cozzi, D. P. Crabb, J. Cunha-Vaz, C. Dahlke, H. Dunbar, R. P. Finger, E. Fletcher, M. Gutfleisch, F. Hartgers, B. Higgins, J. Hildebrandt, E. Höck, R. Hogg, F. G. Holz, C. B. Hoyng, A. Kilani, J. Krätzschmar, L. Kühlewein, M. Larsen, S. Leal, Y. T. E. Lechanteur, D. Lu, U. F. O. Luhmann, A. Lüning, N. Manivannan, I. Marques, C. Martinho, K. P. Moll, G. Montesano, Z. Mulyukov, M. Paques, B. Parodi, M. Parravano, S. Penas, T. Peters, T. Peto, S. Poor, S. Priglinger, R. Ramamirtham, D. Rowen, G. S. Rubin, J. Sahel, C. Sánchez, O. Sander, M. Saßmannshausen, M. Schmid, S. Schmitz-Valckenberg, J. Siedlecki, R. Silva, E. Souied, G. Staurenghi, J. Tavares, D. J. Taylor, J. H. Terheyden, A. Tufail, P. Valmaggia, M. Varano, A. Wolf, and N. Zakaria. 
Disclaimer: The communication reflects the authors' views. Neither IMI nor the European Union, EFPIA, or any associated partners are responsible for any use that may be made of the information contained therein. 
Disclosure: C. de Vente, Novartis (F); P. Valmaggia, Heidelberg Engineering (R), Bayer (R); C.B. Hoyng, None; F.G. Holz, Acucela (C, F), Allergan (F), Apellis (C, F), Bayer (C, F), Boehringer-Ingelheim (C), Bioeq/Formycon (F, C), CenterVue (F), Ellex (F), Roche/Genentech (C, F), Geuder (C, F), Graybug (C), Gyroscope (C), Heidelberg Engineering (C, F), IvericBio (C, F), Kanghong (C, F), LinBioscience (C), NightStarX (F), Novartis (C, F), Optos (F), Oxurion (C), Pixium Vision (C, F), Stealth BioTherapeutics (C), Zeiss (F, C), GRADE Reading Center (O); M.M. Islam, None; C.C.W. Klaver, None; C.J.F. Boon, None; S. Schmitz-Valckenberg, AlphaRET (C), Apellis (C, R), Bayer (F), Formycon (C), Carl Zeiss MediTec (F), Galimedix (C), Heidelberg Engineering (F, R), Katairo (C), Kubota Vision (C), Novartis (C, F), Perceive Therapeutics (C), Pixium (C), Roche (C, F), SparingVision (C); A. Tufail, Annexon, Apellis, Astellas, Bayer, Boehringer-Ingelheim, 4D Molecular Therapeutics, Graybug, Heidelberg Engineering, IvericBio/Astellas, Janssen, Novartis, Ocular Therapeutix, Oxurion, Roche, Zeiss; M. Saßmannshausen, Heidelberg Engineering (F), Optos (F), Zeiss, CenterVue (F); C.I. Sánchez, Novartis (F), Novartis (R), Bayer (R)
References
Finger RP, Schmitz-Valckenberg S, Schmid M, et al. MACUSTAR: development and clinical validation of functional, structural, and patient-reported endpoints in intermediate age-related macular degeneration. Ophthalmologica. 2019; 241(2): 61–72. [CrossRef] [PubMed]
de Sisternes L, Simon N, Tibshirani R, Leng T, Rubin DL. Quantitative SD-OCT imaging biomarkers as indicators of age-related macular degeneration progression. Invest Ophthalmol Vis Sci. 2014; 55(11): 7093–7103. [CrossRef] [PubMed]
Wu Z, Luu CD, Ayton LN, et al. Optical coherence tomography-defined changes preceding the development of drusen-associated atrophy in age-related macular degeneration. Ophthalmology. 2014; 121(12): 2415–2422. [CrossRef] [PubMed]
Fleckenstein M, Keenan TDL, Guymer RH, et al. Age-related macular degeneration. Nat Rev Dis Primers. 2021; 7(1): 31. [CrossRef] [PubMed]
Lad EM, Finger RP, Guymer R. Biomarkers for the progression of intermediate age-related macular degeneration. Ophthalmol Ther. 2023; 12(6): 2917–2941. [CrossRef] [PubMed]
Wu Z, Fletcher EL, Kumar H, Greferath U, Guymer RH. Reticular pseudodrusen: a critical phenotype in age-related macular degeneration. Prog Retin Eye Res. 2022; 88: 101017. [CrossRef] [PubMed]
Christenbury JG, Folgar FA, O'Connell RV, et al. Progression of intermediate age-related macular degeneration with proliferation and inner retinal migration of hyperreflective foci. Ophthalmology. 2013; 120(5): 1038–1045. [CrossRef] [PubMed]
Veerappan M, El-Hage-Sleiman AKM, Tai V, et al. Optical coherence tomography reflective drusen substructures predict progression to geographic atrophy in age-related macular degeneration. Ophthalmology. 2016; 123(12): 2554–2570. [CrossRef] [PubMed]
Thiele S, Nadal J, Pfau M, et al. Prognostic value of intermediate age-related macular degeneration phenotypes for geographic atrophy progression. Br J Ophthalmol. 2021; 105(2): 239–245. [CrossRef] [PubMed]
Hagag AM, Kaye R, Hoang V, et al. Systematic review of prognostic factors associated with progression to late age-related macular degeneration: pinnacle study report 2. Surv Ophthalmol. 2024; 69(2): 165–172. [CrossRef] [PubMed]
Nassisi M, Lei J, Abdelfattah NS, et al. OCT risk factors for development of late age-related macular degeneration in the fellow eyes of patients enrolled in the HARBOR study. Ophthalmology. 2019; 126(12): 1667–1674. [CrossRef] [PubMed]
Sutton J, Menten MJ, Riedl S, et al. Developing and validating a multivariable prediction model which predicts progression of intermediate to late age-related macular degeneration-the PINNACLE trial protocol. Eye. 2023; 37(6): 1275–1283. [CrossRef] [PubMed]
Lei J, Balasubramanian S, Abdelfattah NS, Nittala MG, Sadda SR. Proposal of a simple optical coherence tomography-based scoring system for progression of age-related macular degeneration. Graefes Arch Clin Exp Ophthalmol. 2017; 255(8): 1551–1558. [CrossRef] [PubMed]
Sadda SR, Guymer R, Holz FG, et al. Consensus definition for atrophy associated with age-related macular degeneration on OCT: classification of Atrophy Report 3. Ophthalmology. 2018; 125(4): 537–548. [CrossRef] [PubMed]
Guymer RH, Rosenfeld PJ, Curcio CA, et al. Incomplete retinal pigment epithelial and outer retinal atrophy in age-related macular degeneration: classification of Atrophy Meeting Report 4. Ophthalmology. 2020; 127(3): 394–409. [CrossRef] [PubMed]
Guymer RH, Wu Z, Gao S, et al. HONU: a multicenter, prospective, observational study of the progression of intermediate age-related macular degeneration. Invest Ophthalmol Vis Sci. 2023; 64: 2754, https://iovs.arvojournals.org/article.aspx?articleid=2789914.
Corradetti G, Corvi F, Nittala MG, et al. Natural history of incomplete retinal pigment epithelial and outer retinal atrophy in age-related macular degeneration. Can J Ophthalmol. 2021; 56(5): 325–334. [CrossRef] [PubMed]
Chiang JN, Corradetti G, Nittala MG, et al. Automated identification of incomplete and complete retinal epithelial pigment and outer retinal atrophy using machine learning. Ophthalmol Retina. 2023; 7(2): 118–126. [CrossRef] [PubMed]
Szeskin A, Yehuda R, Shmueli O, Levy J, Joskowicz L. A column-based deep learning method for the detection and quantification of atrophy associated with AMD in OCT scans. Med Image Anal. 2021; 72: 102130. [CrossRef] [PubMed]
Zhang G, Fu DJ, Liefers B, et al. Clinically relevant deep learning for detection and quantification of geographic atrophy from optical coherence tomography: a model development and external validation study. Lancet Digit Health. 2021; 3(10): e665–e675. [CrossRef] [PubMed]
Liefers B, Taylor P, Alsaedi A, et al. Quantification of key retinal features in early and late age-related macular degeneration using deep learning. Am J Ophthalmol. 2021; 226: 1–12. [CrossRef] [PubMed]
Swanson EA, Fujimoto JG. The ecosystem that powered the translation of OCT from fundamental research to clinical and commercial impact [Invited]. Biomed Opt Express. 2017; 8(3): 1638–1664. [CrossRef] [PubMed]
De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018; 24(9): 1342–1350. [CrossRef] [PubMed]
Romo-Bucheli D, Seeböck P, Orlando JI, et al. Reducing image variability across OCT devices with unsupervised unpaired learning for improved segmentation of retina. Biomed Opt Express. 2020; 11(1): 346–363. [CrossRef] [PubMed]
de Vente C, van Ginneken B, Hoyng CB, Klaver CCW, Sánchez CI. Uncertainty-aware multiple-instance learning for reliable classification: application to optical coherence tomography. arXiv preprint. 2023. doi:10.48550/arXiv.2302.03116.
Ovadia Y, Fertig E, Ren JJ, et al. Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. Adv Neural Inf Process Syst. 2019; 32: 13969–13980.
Mårtensson G, Ferreira D, Granberg T, et al. The reliability of a deep learning model in clinical out-of-distribution MRI data: a multicohort study. Med Image Anal. 2020; 66: 101714. [CrossRef] [PubMed]
Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. Br J Radiol. 2023; 96(1150): 20220878. [CrossRef] [PubMed]
Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 2018; 15(11): e1002683. [CrossRef] [PubMed]
Pooch EHP, Ballester P, Barros RC. Can we trust deep learning based diagnosis? The impact of domain shift in chest radiograph classification. In: Thoracic Image Analysis. Lecture Notes in Computer Science. 2020; 12502: 74–83.
Terheyden JH, Holz FG, Schmitz-Valckenberg S, et al. Clinical study protocol for a low-interventional study in intermediate age-related macular degeneration developing novel clinical endpoints for interventional clinical trials with a regulatory and patient access intention-MACUSTAR. Trials. 2020; 21(1): 659. [CrossRef] [PubMed]
Saßmannshausen M, Behning C, Weinz J, et al. Characteristics and spatial distribution of structural features in age-related macular degeneration: a MACUSTAR study report. Ophthalmol Retina. 2023; 7(5): 420–430. [CrossRef] [PubMed]
Ferris FL, 3rd, Wilkinson CP, Bird A, et al. Clinical classification of age-related macular degeneration. Ophthalmology. 2013; 120(4): 844–851. [CrossRef] [PubMed]
Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011; 12: 2825–2830.
Bogunovic H, Venhuizen F, Klimscha S, et al. RETOUCH: the retinal OCT fluid detection and segmentation benchmark and challenge. IEEE Trans Med Imaging. 2019; 38(8): 1858–1874. [CrossRef] [PubMed]
Ikram MA, Brusselle GGO, Murad SD, et al. The Rotterdam Study: 2018 update on objectives, design and main results. Eur J Epidemiol. 2017; 32(9): 807–850. [CrossRef] [PubMed]
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science. Cham, Switzerland: Springer International Publishing; 2015: 234–241.
Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint. 2017. doi:10.48550/arXiv.1703.10593.
Puzyeyeva O, Lam WC, Flanagan JG, et al. High-resolution optical coherence tomography retinal imaging: a case series illustrating potential and limitations. J Ophthalmol. 2011; 2011: 764183. [CrossRef] [PubMed]
Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint. 2014. doi:10.48550/arXiv.1412.6980.
Rutter CM. Bootstrap estimation of diagnostic accuracy with patient-clustered data. Acad Radiol. 2000; 7(6): 413–419. [CrossRef] [PubMed]
Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021; 18(2): 203–211. [CrossRef] [PubMed]
Figure 1. Overview of the methods used for the automated iRORA/cRORA detection pipeline. Spectralis data were fed into a domain adaptation model. The transformed OCTs were subsequently used as input for the Liefers et al.21 segmentation model. ZEISS OCTs were directly fed into the segmentation model. The OCT volumes were fully segmented for all segmented features, including iRORA/cRORA. In the CycleGAN training diagram, GS→T is the generator for transforming the appearance of Spectralis images to the appearance of Topcon OCTs. GT→S is the generator for transforming the appearance of Topcon images to the appearance of Spectralis OCTs. DT is the discriminator for the Topcon image domain. att., attenuation; HTR, hypertransmission; PD, photoreceptor degeneration.
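For readers who want to connect the CycleGAN diagram in Figure 1 to an implementation, the following is a minimal PyTorch sketch of the generator objective for the GS→T direction. The toy network definitions, loss weights, and tensor sizes are illustrative assumptions and do not reflect the architecture or hyperparameters used in this study; the symmetric GT→S terms and the discriminator update are omitted for brevity.

```python
# Minimal sketch of the CycleGAN generator objective from Figure 1.
# All layer sizes and weights here are toy assumptions, not the study's setup.
import torch
import torch.nn as nn

def make_generator():
    # Toy stand-in for G_S2T (Spectralis -> Topcon) or G_T2S (Topcon -> Spectralis).
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 3, padding=1))

def make_discriminator():
    # Toy stand-in for D_T, producing patch-wise real/fake logits.
    return nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 3, padding=1))

G_S2T, G_T2S = make_generator(), make_generator()
D_T = make_discriminator()
adv_loss, cycle_loss = nn.MSELoss(), nn.L1Loss()  # least-squares GAN + L1 cycle term
lambda_cycle = 10.0  # assumed weight, following the original CycleGAN paper

def generator_step(spectralis_bscan):
    """One generator update: fool D_T while preserving content via cycle consistency."""
    fake_topcon = G_S2T(spectralis_bscan)
    # Adversarial term: D_T should score the translated image as "real" (1).
    pred = D_T(fake_topcon)
    loss_adv = adv_loss(pred, torch.ones_like(pred))
    # Cycle consistency: S -> T -> S should reconstruct the original B-scan.
    recon_spectralis = G_T2S(fake_topcon)
    loss_cyc = cycle_loss(recon_spectralis, spectralis_bscan)
    return loss_adv + lambda_cycle * loss_cyc

loss = generator_step(torch.randn(1, 1, 64, 64))  # dummy single-channel B-scan
loss.backward()
```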
Figure 2. Confusion matrices comparing the manual reference with the AI model for the images originating from the (A) ZEISS and (B) Spectralis datasets.
Figure 3. Examples of the Heidelberg Spectralis results. Each subfigure shows a table in the top left, containing the eye-level decisions and the en face iRORA/cRORA areas, calculated from the segmentations, for both the manual reference and the AI model. The middle and right cells of the first rows show the iRORA/cRORA segmentations from the manual reference and the AI model, respectively, overlaid on the en face projection of the OCT volume. The middle rows show the segmentations from the AI model for the three underlying granular biomarkers of iRORA/cRORA, overlaid on the en face projection of the OCT volume. Each en face image shows a white horizontal line, which corresponds to the B-scan shown in the last row. The last rows show the AI model segmentation of the granular biomarkers. If iRORA/cRORA is detected, the last rows also show horizontal lines labeled as either "iRORA/cRORA (AI)" for the AI model or "iRORA/cRORA (manual)" for the manual reference. (A) True positive result for cRORA. (B) Example of a false positive result for cRORA. These cases exhibit a mixed phenotype with presence of sub-RPE drusen and subretinal drusenoid deposits, as well as evidence of overall increased choroidal hypertransmission on the B-scan, leading to ambiguity for the human reader when assessing the presence of iRORA/cRORA.
Figure 4. An example of the ZEISS results. A table is shown in the top left, containing the eye-level decisions and the en face iRORA/cRORA areas, calculated from the segmentations, for both the manual reference and the AI model. The middle and right cells of the first row on the left show the iRORA/cRORA segmentations from the manual reference and the AI model, respectively, overlaid on the en face projection of the OCT volume. The middle row on the left shows the segmentations from the AI model for the three underlying granular biomarkers of iRORA/cRORA, overlaid on the en face projection of the OCT volume. Each en face image shows white horizontal lines, which correspond to the B-scans on the right. These B-scans show the AI model segmentation of the granular biomarkers. If iRORA/cRORA is detected, these B-scans also show horizontal lines labeled as either "iRORA/cRORA (AI)" for the AI model or "iRORA/cRORA (manual)" for the manual reference. B-scan 91 contains a false positive result, where all three features were segmented by the AI model. In B-scan 118, the manual reference and the AI model both detected the same iRORA lesion. B-scan 124 contains an iRORA lesion according to the manual reference. The model did find very small areas of ellipsoid loss and hypertransmission at this location, but RPE loss/attenuation was not segmented. Therefore, iRORA was not identified by the model.
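The outcome described for B-scan 124 follows from the co-location requirement of the iRORA definition: a location is flagged only when all three granular biomarkers are present together. Below is a minimal sketch of that AND logic over per-column boolean masks; the mask names and the column-wise rule are assumptions for illustration, not the study's exact implementation.

```python
# Sketch of the co-location rule implied by Figure 4: a column counts as
# iRORA only if all three granular biomarkers coincide there. Toy masks only.
import numpy as np

# Hypothetical per-A-scan (column-wise) boolean masks for one B-scan.
rpe_loss_attenuation = np.array([0, 1, 1, 1, 0], dtype=bool)
photoreceptor_degeneration = np.array([0, 1, 1, 1, 1], dtype=bool)
hypertransmission = np.array([0, 0, 1, 1, 1], dtype=bool)

# In B-scan 124 of Figure 4, the absent RPE loss/attenuation mask makes this
# AND fail, so no iRORA is reported despite small ellipsoid-loss and HTR areas.
irora_columns = rpe_loss_attenuation & photoreceptor_degeneration & hypertransmission
print(irora_columns)  # [False False  True  True False]
```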
Figure 5. Confusion matrices showing the intra-rater agreements of both (A) the manual grader and (B) the AI model when comparing their linked decisions at the OCT-volume level for the ZEISS and the Spectralis data.
Figure 6. Qualitative comparison of manual reference and AI model outputs between OCTs from the two devices. Each subfigure shows the ZEISS OCT first, followed by the Spectralis OCT, both taken from the same eye during the same visit. The B-scans presented below in each subfigure approximately correspond to the same location in the retina for both OCT devices. Each grid in the subfigures shows a table in the top left, containing the eye-level decisions and the en face iRORA/cRORA areas, calculated from the segmentations, for both the manual reference and the AI model. The middle and right cells of the first rows show the iRORA/cRORA segmentations from the manual reference and the AI model, respectively, overlaid on the en face projection of the OCT volume. The middle rows show the segmentations from the AI model for the three underlying granular biomarkers of iRORA/cRORA, overlaid on the en face projection of the OCT volume. Each en face image shows a white horizontal line, which corresponds to the B-scan shown in the last row. The last rows show the AI model segmentation of the granular biomarkers. If iRORA/cRORA is detected, the last rows also show horizontal lines labeled as either "iRORA/cRORA (AI)" for the AI model or "iRORA/cRORA (manual)" for the manual reference. (A) An example where the outcomes are consistent between devices. Both the AI model and the manual reference agree on the presence of a cRORA lesion in the ZEISS and Spectralis OCT. (B) An example with a discrepancy between the manual references for the two devices. The output of the AI model is more consistent regarding the detected iRORA and cRORA lesions in the OCTs from both devices. The manual reader likely missed the lesion in the ZEISS OCT due to human error, given the evident presence of all criteria for cRORA. A subanalysis of all ZEISS OCTs with a discrepancy between the manual and AI eye-level decisions, similar to what was done for the Spectralis OCTs in this study, would have likely led to the manual delineation of this lesion.
Figure 7. The effect of domain adaptation (DA) on eye-level None/iRORA/cRORA classification in the Spectralis dataset. (A) The effect on the quadratic weighted kappa score (between the manual annotations and the AI model). The error bars are standard deviations from nonparametric bootstrapping with 1000 iterations. (B) Confusion matrix with DA. (C) Confusion matrix without DA.
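The agreement metric and error bars in Figure 7 can be reproduced in outline with scikit-learn's quadratic weighted kappa and a simple nonparametric bootstrap. The toy labels below and the choice to resample individual eyes are assumptions for illustration; a patient-clustered bootstrap, as in the Rutter reference above, would resample by patient instead.

```python
# Sketch of the Figure 7 evaluation: quadratic weighted kappa between manual
# and AI eye-level labels (0=None, 1=iRORA, 2=cRORA) with a bootstrap SD over
# 1000 iterations. Labels are toy values, not study data.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
manual = np.array([0, 0, 1, 2, 2, 1, 0, 2, 1, 0, 2, 1])  # hypothetical grades
ai     = np.array([0, 1, 1, 2, 2, 0, 0, 2, 1, 1, 2, 1])

kappa = cohen_kappa_score(manual, ai, weights="quadratic")

# Nonparametric bootstrap: resample eyes with replacement, recompute kappa.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(manual), len(manual))
    boot.append(cohen_kappa_score(manual[idx], ai[idx], weights="quadratic"))
print(f"QWK = {kappa:.3f} +/- {np.std(boot):.3f}")
```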
Figure 8. Example where the AI model with domain adaptation (DA) detected an iRORA lesion that was not picked up without DA.
Table 1. Overview of the Development Data of the Domain Adaptation Model
Table 2. Performance for Eye Level iRORA and cRORA Classification