Open Access
Special Issue  |   April 2020
Artificial Intelligence to Stratify Severity of Age-Related Macular Degeneration (AMD) and Predict Risk of Progression to Late AMD
Author Affiliations & Notes
  • Alauddin Bhuiyan
    iHealthScreen Inc., New York, NY, USA
    New York University, New York, NY, USA
  • Tien Yin Wong
    Singapore Eye Research Institute, Singapore National Eye Center, Singapore
    DUKE-NUS School of Medicine, Singapore
  • Daniel Shu Wei Ting
    Singapore Eye Research Institute, Singapore National Eye Center, Singapore
    DUKE-NUS School of Medicine, Singapore
  • Arun Govindaiah
    iHealthScreen Inc., New York, NY, USA
  • Eric H. Souied
    Department of Ophthalmology, Hôpital Intercommunal de Créteil, Université, Creteil, France
  • R. Theodore Smith
    Icahn School of Medicine at Mount Sinai, New York, NY, USA
  • Correspondence: R. Theodore Smith, Department of Ophthalmology, Icahn School of Medicine at Mount Sinai, New York, NY, USA. e-mail: rts1md@gmail.com 
Translational Vision Science & Technology April 2020, Vol.9, 25. doi:https://doi.org/10.1167/tvst.9.2.25
Alauddin Bhuiyan, Tien Yin Wong, Daniel Shu Wei Ting, Arun Govindaiah, Eric H. Souied, R. Theodore Smith; Artificial Intelligence to Stratify Severity of Age-Related Macular Degeneration (AMD) and Predict Risk of Progression to Late AMD. Trans. Vis. Sci. Tech. 2020;9(2):25. doi: https://doi.org/10.1167/tvst.9.2.25.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

Purpose: To build and validate artificial intelligence (AI)-based models for AMD screening and for predicting late dry and wet AMD progression within 1 and 2 years.

Methods: The dataset of the Age-related Eye Disease Study (AREDS) was used to train and validate our prediction model. External validation was performed on the Nutritional AMD Treatment-2 (NAT-2) study.

First Step: An ensemble of deep learning screening methods was trained and validated on 116,875 color fundus photos from 4139 participants in the AREDS study to classify them as no, early, intermediate, or advanced AMD and to further stratify them along the AREDS 12-level severity scale. Second Step: The resulting AMD scores were combined with sociodemographic and clinical data and other automatically extracted imaging data by a logistic model tree machine learning technique to predict the risk of progression to late AMD within 1 or 2 years, with training and validation performed on 923 AREDS participants who progressed within 2 years, 901 who progressed within 1 year, and 2840 who did not progress within 2 years. For those found to be at risk of progression to late AMD, we further predicted the type (dry or wet) of late AMD.

Results: For identification of none/early vs. intermediate/late (i.e., referral-level) AMD, we achieved 99.2% accuracy. The prediction model for 2-year incident late AMD (any type) achieved 86.36% accuracy, with 66.88% for late dry and 67.15% for late wet AMD. On the NAT-2 dataset, the 2-year late AMD prediction accuracy was 84%.

Conclusions: Validated color fundus photo-based models for AMD screening and risk prediction for late AMD are now ready for clinical testing and potential telemedical deployment.

Translational Relevance: Noninvasive, highly accurate, and fast AI methods to screen for referral level AMD and to predict late AMD progression offer significant potential improvements in our care of this prevalent blinding disease.

Introduction
Age-related macular degeneration (AMD) is the leading cause of vision loss in those older than 50 years in the developed world.1–5 The number of people with AMD is expected to increase 1.5-fold over 10 years because of our aging population, hypertension, and other causes.6,7 At present, there is no treatment for late dry AMD (geographic atrophy [GA]).8,9 Although treatment with anti-vascular endothelial growth factor (anti-VEGF) agents is often effective in maintaining or improving vision in the neovascular form of advanced AMD (i.e., wet AMD), it does not provide a cure. Moreover, by the time a person visits an ophthalmologist, it is often too late to intervene effectively, because treatment cannot restore vision that has already been lost.10,11 Further, such treatments are costly (typically about $9000–$65,000 per eye for a 2-year course of treatment, depending on the drug used12) and may be associated with significant cardiovascular risks13 or even the development of GA.14 The total (direct and indirect) cost of AMD is $225 billion per year15 and is expected to rise as the number of AMD cases grows;2 much of this burden is indirect, arising from injury, depression, and social dependency resulting from blindness.16 
The Age-Related Eye Disease Study (AREDS), the largest AMD study, showed that specific antioxidant and vitamin supplementation (the AREDS supplements) reduces the risk of progression from intermediate to late AMD, which allows for preventive strategies.17 Approximately 7.6% of the United States population over the age of 60 is estimated to have intermediate or advanced AMD.18 Recent publications examining 10 years of experience with appropriately selected patients taking the AREDS formulation demonstrate that it is effective at slowing disease progression and improving visual acuity in approximately 25% of patients.19 For this reason, identifying people at risk for late AMD is very important, because it could enable timely interventions such as photobiomodulation,20 laser intervention,21 and AREDS supplements. Recent studies21 showed that sub-threshold laser (or sub-threshold nanosecond laser) treatment reduces the progression of intermediate AMD, except in eyes with reticular pseudodrusen (RPD), in which it worsens progression. 
Motivated by this need, we reviewed existing methods for early AMD screening and for predicting AMD progression, both of which are necessary to prevent late AMD. We found that: 
  • Systems have been built from existing datasets for automatic AMD screening/prediction, but none are ready for clinical deployment.
  • No method includes RPD.22 RPD double the risk of progression to advanced wet AMD over soft drusen alone.22,23
  • No method has been proposed for telemedicine-based automated AMD screening in remote/underserved areas.
Our review found prediction models24,25 based on manual evaluations of drusen and pigment abnormalities that achieved 75.6% accuracy at 10 years (in contrast, our fully automated 2-year prediction model herein achieved 86.36% accuracy). AREDS Report 8 (ref. 26) showed on a population basis that, for subjects aged 55 to 80 years followed for 6.3 years, treatment with antioxidants plus zinc yielded a significant odds reduction for the development of advanced AMD compared with placebo. Prediction of late AMD based on genetic, ocular (manual analysis of fundus images), and sociodemographic parameters has been reported27,28 and was recently improved with additional genetic modeling. A number of AMD screening methods have been reported elsewhere,29–32 but they can only determine the disease status, not predict late AMD. For example, Grassmann et al.31 reported an ensemble deep learning-based classifier for 12 AREDS categories based on pathology, but not a progression predictor. We first proposed a fully automated late AMD prediction model at ARVO 2018.33 Recently, Burlina et al. proposed a deep learning (DL)-based model34 for 5-year late AMD progression but did not demonstrate separate prediction of late dry and late wet AMD. Moreover, in Burlina et al., one DL model essentially classifies images on the AREDS nine-step severity scale, as in Grassmann et al., and then relies on the published AREDS 5-year progression probabilities, rather than AI, to calculate progression risks. Their alternate DL model, which regressed directly from the image to a risk prediction (closer to the AI-based prediction we propose here), had poorer overall performance than the models relying on the AREDS statistics. Our model is more complex and finely tuned than any of those, exploiting DL for classification and machine learning for prediction, together with additional retinal and demographic factors. 
In addition, our training data include abrupt transitions (early to late AMD within 1–2 years), and our models also predict late dry and wet AMD separately, which is unique. 
Here, we propose the first color fundus photo-based noninvasive screening and prediction model for 1- or 2-year incidence of late AMD with dry and wet form categorization. It is novel in several respects: different input sizes for the neural network architectures, to learn both scale-variant and scale-invariant image features; a logistic model tree35 for building a final classifier after assembling different deep learning models, a new approach in retinal image classification that was also proposed by Grassmann et al.; and a single-value risk of conversion produced from the 12-point AMD severity scale36 using deep convolutional neural networks. 
The proposed noninvasive technology can identify larger numbers of at-risk patients and determine whether an individual, including a subject with early AMD, will develop late AMD within 1 to 2 years and should be referred to an ophthalmologist (Figure 1 shows a schematic of the overall screening and prediction of late AMD). AREDS Report 17 (ref. 37) noted, and we confirmed, that in AREDS, 36 subjects converted from early to late AMD within a year, and 50 within 2 years. Using the prediction score, the ophthalmologist can perform further testing and/or have higher confidence about immediate treatment (e.g., photobiomodulation20 or laser intervention21) or advise more frequent follow-up visits. 
Methods
The model works in two steps. The first step is the screening or early diagnostic step (screening module). The second step is for predicting the individual's progression from intermediate to late AMD (prediction module). 
The Datasets
AREDS is a major clinical trial sponsored by the National Eye Institute. AREDS participants were 55 to 80 years old at enrollment, and they had to be free of any illness or condition that would make long-term follow-up or compliance with study medications unlikely. Based on fundus photographs graded by a central reading center, best-corrected visual acuity, and ophthalmologic evaluations, 4753 participants were enrolled in one of several AMD categories, including persons with no AMD. Subjects were randomly assigned to vitamin and mineral supplement groups or to the placebo group. A complete description of the AREDS dataset can be found elsewhere.1 Table 1 and Supplementary Table S14 show the number of subjects in each category and demographic group and the distribution of baseline characteristics. 
Table 1.
 
AREDS Data Distribution Within the Significant Parameters
AREDS participants were assigned to four categories38 based on the size and extent of drusen and other AMD lesions: normal, early, intermediate, and advanced or late AMD. These assignments were made for the left and right eyes individually. 
Deidentified AREDS data were used in this study with the approval of the National Eye Institute Data Access Committee, National Institutes of Health. 
The NAT-2 study4 was a randomized, placebo-controlled, double-blind, parallel, comparative study of 263 patients with early lesions of age-related maculopathy and visual acuity better than 0.4 logMAR units in the study eye, and neovascular AMD in the fellow eye. Patients were randomly assigned to receive either 840 mg/day docosahexaenoic acid (DHA) and 270 mg/day eicosapentaenoic acid from fish oil capsules or placebo (olive oil capsules) for 3 years. Study evaluations included periodic color fundus photography and other retinal imaging. The NAT-2 dataset was used with the permission of the sponsor (Bausch and Lomb) and the principal investigator (Dr. Souied). These data were used for external evaluation only after the AI models were built and validated on the AREDS data. 
For screening, 116,875 color fundus images belonging to 4139 subjects classified as gradable in AREDS were selected. Each patient was assigned to the training set (80% of the patients), the validation set (12%), or the testing set (8%), and all fundus images from each patient were included in the respective dataset. Thus, the training, validation, and testing datasets consisted of 93,380, 14,020, and 9475 fundus images, respectively. Because the groups were separated by patient first, no patient's photographs appeared in more than one dataset. 
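The patient-level split described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: each image record is assumed to be a hypothetical (patient_id, image_id) pair, and the function name and random seed are our own. Because patients contribute different numbers of photos, the image-level proportions produced this way need not match the patient-level 80/12/8 fractions exactly.

```python
import random
from collections import defaultdict


def split_by_patient(image_records, fractions=(0.80, 0.12, 0.08), seed=0):
    """Assign whole patients to train/validation/test, then collect their images.

    image_records: list of (patient_id, image_id) pairs (hypothetical layout).
    Returns a dict mapping split name -> list of image_ids.
    """
    images_by_patient = defaultdict(list)
    for patient_id, image_id in image_records:
        images_by_patient[patient_id].append(image_id)

    patients = sorted(images_by_patient)  # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": patients[:n_train],
        "validation": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # the remainder (about 8%) goes to test
    }
    # Every image follows its patient, so no patient appears in two splits.
    return {name: [img for p in pids for img in images_by_patient[p]]
            for name, pids in groups.items()}
```

Splitting patients rather than images prevents photographs of the same retina from appearing in both training and test data; scikit-learn's `GroupShuffleSplit` provides a library equivalent.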
For prediction, we formed the dataset with images and demographic data from 901 subjects with incident late AMD in 1 year from baseline; 923 subjects with incident late AMD in 2 years from baseline and 2840 subjects without incident late AMD over the entire study. 
Training and validation data for the 2-year model consisted of 2634 subjects: 1988 nonconverted and 646 converted at 2 years. Test data consisted of the remaining 852 nonconverted and 277 converted subjects, a total of 1129. This gave a ratio of approximately 70:30 for training/validation to test data. We chose this split, which reserves a larger share for testing than the usual 80:20, to provide an adequately sized test set from the smaller number of converters. The fundus photos for each dataset consisted of all baseline photos of the subjects in each group. 
Training and validation data for the 1-year model consisted of 2619 subjects: 1988 nonconverted and 631 converted at 1 year. Test data consisted of the remaining 852 nonconverted and 270 converted subjects, a total of 1122, again an approximate 70:30 ratio of training/validation to test data (Table 2). 
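As a quick arithmetic check, the subject counts reported above reproduce the stated totals and the approximate 70:30 split. The dictionary layout in this Python snippet is only for illustration.

```python
# Subject counts for the prediction module, copied from the text (see Table 2).
# Each pair is (nonconverted, converted).
COUNTS = {
    "2-year": {"train_val": (1988, 646), "test": (852, 277)},
    "1-year": {"train_val": (1988, 631), "test": (852, 270)},
}


def split_summary(counts):
    """Return per-model totals and the training/validation share of all subjects."""
    summary = {}
    for model, parts in counts.items():
        train_val = sum(parts["train_val"])
        test = sum(parts["test"])
        summary[model] = {
            "train_val": train_val,
            "test": test,
            "nonconverted": parts["train_val"][0] + parts["test"][0],
            "converted": parts["train_val"][1] + parts["test"][1],
            "train_val_share": train_val / (train_val + test),
        }
    return summary


for model, s in split_summary(COUNTS).items():
    print(f"{model}: {s['train_val']} train/val vs {s['test']} test "
          f"(share {s['train_val_share']:.2f}); {s['converted']} converters")
```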
Table 2.
 
AREDS Dataset Organization for Late AMD Prediction in Module 2
Our proposed late AMD prediction model consists of two core modules, (1) the screening module and (2) the prediction module, which are summarized here. 
The Screening Module and AMD 12-Category Classification
First, the images are transformed from the RGB color space to CIE L*a*b*, which is approximately perceptually uniform. The first module, preceded by a deep learning-based binary classifier that verifies image gradeability, is an ensemble of deep learning AMD classifiers that we developed earlier30,32 to determine the stage of AMD present (no AMD, early AMD, intermediate AMD, or late AMD). This model is an ensemble of five networks with different input sizes, based on “Inception-V3” proposed by Szegedy et al.,39 “Inception-Resnet-V2” proposed by Szegedy et al.,40 and “Xception” proposed by Chollet,41 as described in our earlier work. We use this categorical information as an input to the prediction model. 
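For reference, the standard per-pixel sRGB to CIE L*a*b* conversion (D65 white point) can be written out as below. This Python sketch assumes sRGB-encoded input scaled to [0, 1]; the paper does not state which white point or library was used, and in practice a vectorized routine such as scikit-image's `rgb2lab` would be applied to whole images.

```python
def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """CIE L*a*b* companding function (cube root with a linear segment near zero)."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIE L*a*b* under a D65 white point."""
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    # Linear sRGB -> CIE XYZ (D65 primaries)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    # Normalize by the D65 reference white, then apply the Lab nonlinearity
    xn, yn, zn = 0.95047, 1.0, 1.08883
    fx, fy, fz = _f(x / xn), _f(y / yn), _f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

Working in L*a*b* separates lightness (L*) from the two chromatic axes (a*, b*), so that equal numeric differences correspond roughly to equal perceived color differences.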
In addition, there is an AREDS 12-class severity scale-based classifier, in which the first nine classes span early to intermediate AMD and the last three are late AMD. The first nine classes36 confer a validated risk of AMD progression, ranging from about 2% for level 1 to about 50% for level 9 in 5 years, but in practice this scale is too cumbersome for routine manual grading. Our ensemble deep learning classifier therefore also assigns probabilities that an image falls within each of these 12 classes,36 and, specifically for images without advanced AMD, the probabilities for the first nine classes36 are used in the prediction model. This system consists of an ensemble of six neural networks, each differing from the others in its combination of input image size and network architecture. The six networks are: Xception with input size 499 × 499, Inception-Resnet-V2 with input size 399 × 399, Xception with input size 299 × 299, Inception-V3 with input size 599 × 599, Inception-V3 with input size 399 × 399, and NasNet (proposed by Zoph et al.42) with input size 399 × 399. Each network is trained to classify fundus images into 12 classes, giving an array of 12 probabilities, one for each class. These 12 classes combine the nine AREDS-defined AMD severity levels with three late AMD classes: dry, wet, and combined dry and wet AMD. This results in 72 probability values from the six networks (6 × 12) that are used as input to the prediction module. Supplementary Table S1 (online supplement) shows the number of color fundus photos in each AREDS 12-class severity category used to train and validate this ensemble of deep convolutional neural networks and the logistic model tree, following the approach described previously.32 
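The 6 × 12 = 72 ensemble outputs that feed the prediction module can be assembled as in the following Python sketch. It assumes each network emits a softmax vector of 12 probabilities; the function names and the network ordering are hypothetical, and the trained networks themselves are not shown.

```python
import math

N_NETWORKS = 6
N_CLASSES = 12  # 9 AREDS severity steps + 3 late AMD classes (dry, wet, both)


def ensemble_features(per_network_probs, tol=1e-6):
    """Concatenate the softmax outputs of the six networks into one 72-value vector.

    per_network_probs: list of 6 lists, each holding 12 class probabilities,
    in a fixed network order (the order itself is an assumption).
    """
    if len(per_network_probs) != N_NETWORKS:
        raise ValueError(f"expected {N_NETWORKS} networks, got {len(per_network_probs)}")
    features = []
    for probs in per_network_probs:
        if len(probs) != N_CLASSES:
            raise ValueError(f"expected {N_CLASSES} probabilities per network")
        if not math.isclose(sum(probs), 1.0, abs_tol=tol):
            raise ValueError("each network's probabilities should sum to 1")
        features.extend(probs)
    return features


def most_likely_level(probs):
    """1-based class (1-12) with the highest probability for one network's output."""
    return max(range(N_CLASSES), key=lambda k: probs[k]) + 1
```

Concatenating, rather than averaging, keeps each network's output as a separate set of inputs, which would let a downstream tree model weight the networks differently.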
The Prediction Module: A Logistic Model Tree
The second module is a machine learning algorithm that predicts whether an individual with early or intermediate AMD will progress to late AMD within 1 or 2 years, based on the output of the first module and other imaging and demographic factors. In AREDS, 36 subjects with “early AMD” converted to late AMD within a year, and 50 more within 2 years; therefore, we also included these early AMD cases in building the prediction model. We built this model on images of individuals who did (923 for 2 years, 901 for 1 year) or did not (2840) convert to late AMD (severity classes 10, 11, and 12) within the next 1 or 2 years. Figure 3 shows fundus images of three subjects who had intermediate AMD at baseline and later converted to late AMD at an incident visit, together with heatmaps of the retina at both visits generated by our classifier for the probability of AMD. The parts of the image most active in the deep learning decision are mapped, producing a schematic diagram. The heatmap generation procedure is described elsewhere.43 
In developing the prediction model, we compared a Bayesian model,44 support vector machines, a decision tree, random forest,45 and a logistic model tree (LMT) for prediction accuracy, and found that the LMT35 performed best. A logistic model tree is a classification model with an associated supervised training algorithm that combines logistic regression and decision tree learning.35 Logistic model trees are based on the earlier idea of a model tree: a decision tree with linear regression models at its leaves, which provides a piecewise linear regression model. The inputs to the LMT were then augmented with the additional features described below. 
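To make the LMT idea concrete, here is a deliberately small Python sketch: a depth-one tree whose single split is chosen by the total log-loss of logistic regressions fitted in each leaf. It illustrates the model-tree concept under simplifying assumptions (binary outcome, one split, plain gradient descent); it is not the full LMT algorithm cited above,35 which fits its leaf models with LogitBoost and prunes a deeper tree, and it is not the implementation the authors used.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Binary logistic regression fitted by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def leaf_proba(model, x):
    """Probability of the positive class from one (weights, bias) logistic model."""
    w, b = model
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def log_loss(model, X, y):
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(leaf_proba(model, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


class TinyLogisticModelTree:
    """Depth-one logistic model tree: one axis-aligned split, a logistic model in each leaf."""

    def __init__(self, min_leaf=4):
        self.min_leaf = min_leaf

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            values = sorted({row[j] for row in X})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2  # candidate threshold between adjacent values
                left = [i for i, row in enumerate(X) if row[j] <= t]
                right = [i for i, row in enumerate(X) if row[j] > t]
                if len(left) < self.min_leaf or len(right) < self.min_leaf:
                    continue
                leaves, loss = [], 0.0
                for idx in (left, right):
                    Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
                    m = fit_logistic(Xs, ys)
                    leaves.append(m)
                    loss += log_loss(m, Xs, ys)
                if best is None or loss < best[0]:
                    best = (loss, j, t, leaves)
        if best is None:
            raise ValueError("not enough samples for a split")
        _, self.feature, self.threshold, (self.left, self.right) = best
        return self

    def predict_proba(self, x):
        leaf = self.left if x[self.feature] <= self.threshold else self.right
        return leaf_proba(leaf, x)
```

On an XOR-like toy pattern, a single logistic regression cannot separate the classes, whereas one split plus a logistic model per leaf can, which is the piecewise-linear advantage the model-tree idea provides.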
Drusen Quantification and RPD Inputs
We also automatically quantified, as separate inputs to the prediction model, the total drusen area and the number of drusen larger than 125 µm. The method, described elsewhere,46 showed 73% sensitivity and 99% specificity compared with manual segmentations by three expert clinicians. We applied the same ensemble model to categorize RPD as none, questionable, or present, inside and outside the 6000-µm circle, as defined by the International Classification and Grading System for AMD.47 Compared with the image gradings by the AREDS study group, we achieved 96% accuracy in these RPD categorizations. We supplied this information, along with the nine-class probability scores generated from the AREDS dataset, to the final prediction model. 
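Given a binary drusen segmentation, the two drusen inputs can be summarized as below. The segmentation method itself46 is not reproduced; this Python sketch assumes a pixel mask and a known microns-per-pixel scale (which in practice depends on the camera and field of view), and it measures each 4-connected component by its equivalent circular diameter. These choices are illustrative assumptions rather than the published method.

```python
import math
from collections import deque


def quantify_drusen(mask, um_per_px, large_diameter_um=125.0):
    """Summarize a binary drusen mask (list of lists of 0/1).

    Returns (total_area_um2, n_large): the total segmented drusen area and the number of
    connected drusen whose equivalent circular diameter exceeds large_diameter_um.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    px_area_um2 = um_per_px ** 2
    total_px, n_large = 0, 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search over the 4-connected pixels of one druse
            size, queue = 0, deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            total_px += size
            area_um2 = size * px_area_um2
            diameter_um = 2 * math.sqrt(area_um2 / math.pi)  # equivalent circular diameter
            if diameter_um > large_diameter_um:
                n_large += 1
    return total_px * px_area_um2, n_large
```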
AMD Category and Demographic Input
In addition to the AREDS-scale probabilities from the ensemble method32 and the drusen information, we included the AMD category (none, early, intermediate, or late) from our four-stage DL classifier and demographic factors such as age, race, sex, diabetes, body mass index, visual acuity, and sunlight exposure. A Cox proportional hazards model48 fitted on the whole training set was used to assess the importance of the sociodemographic parameters, and the beta coefficient of each parameter was used to select the nine significant ones: gender, age, smoking status, diabetes, body mass index, blood pressure, sunlight exposure, visual acuity, and AMD in the other eye. 
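The Cox model step can be illustrated with a small pure-Python fit of the Breslow partial likelihood by gradient ascent. This is a sketch only: the optimizer, step size, toy data, and the |β|-based ranking helper are assumptions. The text does not state the selection criterion beyond the use of beta coefficients, and a real analysis would use an established survival package and report standard errors.

```python
import math


def cox_partial_loglik_grad(beta, X, times, events):
    """Breslow partial log-likelihood of a Cox model and its gradient with respect to beta."""
    p = len(beta)
    risk = [math.exp(sum(b * x for b, x in zip(beta, xi))) for xi in X]
    loglik, grad = 0.0, [0.0] * p
    for i, (ti, di) in enumerate(zip(times, events)):
        if not di:
            continue  # censored subjects contribute only through the risk sets
        at_risk = [j for j, tj in enumerate(times) if tj >= ti]
        denom = sum(risk[j] for j in at_risk)
        loglik += sum(b * x for b, x in zip(beta, X[i])) - math.log(denom)
        for k in range(p):
            weighted_mean = sum(risk[j] * X[j][k] for j in at_risk) / denom
            grad[k] += X[i][k] - weighted_mean
    return loglik, grad


def fit_cox(X, times, events, lr=0.1, steps=2000):
    """Maximize the (concave) partial likelihood by plain gradient ascent; returns beta."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        _, grad = cox_partial_loglik_grad(beta, X, times, events)
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


def rank_by_abs_beta(names, beta):
    """Order covariates by |beta| (a simplification; significance testing also needs standard errors)."""
    return [n for n, _ in sorted(zip(names, beta), key=lambda nb: abs(nb[1]), reverse=True)]
```

In the usage assumed here, a covariate that is higher in subjects who progress earlier receives a positive beta (a hazard ratio above 1), while a covariate that never varies within risk sets receives a beta of zero.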
In summary, each of six deep convolutional neural network models was applied to generate 12-step AMD severity-level probabilities (72 parameters in total). To these were added the drusen inputs, the AMD category (four stages), and the nine significant demographic factors just reviewed. A three-class LMT was built from these data to predict conversion to late AMD in 1 year, conversion in 2 years, or no conversion. Figure 2 shows the flow chart for late AMD prediction. 
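Put together, the inputs to the three-class LMT can be pictured as one flat feature vector per case. In this hypothetical Python sketch, the field names, the use of two drusen values, the one-hot encoding of the AMD stage, and the numeric encodings of the demographic factors are our assumptions; the paper does not specify them.

```python
AMD_CATEGORIES = ("none", "early", "intermediate", "late")
DEMOGRAPHIC_FIELDS = (  # the nine factors listed in the text; encodings are assumptions
    "gender", "age", "smoking", "diabetes", "bmi",
    "blood_pressure", "sunlight", "visual_acuity", "amd_other_eye",
)


def build_feature_vector(ensemble_probs, drusen_area_um2, n_large_drusen,
                         amd_category, demographics):
    """Concatenate all prediction-module inputs into one numeric vector.

    ensemble_probs: 72 severity probabilities (6 networks x 12 classes)
    amd_category: one of AMD_CATEGORIES, one-hot encoded here
    demographics: dict with a numeric value for every name in DEMOGRAPHIC_FIELDS
    """
    if len(ensemble_probs) != 72:
        raise ValueError("expected 72 ensemble probabilities")
    one_hot = [1.0 if amd_category == c else 0.0 for c in AMD_CATEGORIES]
    if sum(one_hot) != 1.0:
        raise ValueError(f"unknown AMD category: {amd_category!r}")
    demo = [float(demographics[name]) for name in DEMOGRAPHIC_FIELDS]
    return (list(ensemble_probs)
            + [float(drusen_area_um2), float(n_large_drusen)]
            + one_hot
            + demo)
```

Under these assumptions each case yields 72 + 2 + 4 + 9 = 87 inputs.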
Figure 1.
 
High-level flow chart for the overall screening and prediction of late AMD.
Figure 2.
 
Flow chart for the late AMD prediction system. Input: the input parameters; Module I: screening module (center), through several deep learning steps for none, early, and intermediate AMD; Module II: prediction of progression to late AMD, late dry AMD, or late wet AMD, or no progression (in 1 or 2 years).
Figure 3.
 
Fundus images of three subjects at baseline and late AMD incident visits, with heatmaps of AMD signs. Blue color, strong signs of AMD detected by our classifier. Green color, weaker signs of AMD. No signs of AMD were detected in the non-mapped portion of the images. Row A, baseline visit fundus photos. Row B, baseline heatmaps showing signs of early AMD. Row C, incident visit fundus photos showing late AMD. Row D, incident heatmaps showing much larger areas and signs of late AMD.
The subjects’ treatment categories were also used to build the prediction models. In AREDS, there are four treatment categories: placebo, zinc, antioxidants, and zinc + antioxidants.38 We used this information to categorize the subjects during the development and validation of the models. 
Ten-fold cross-validation,49 a standard machine learning technique, was applied during model training. Including RPD presence/absence did not improve the prediction score, perhaps because there were only 27 subjects with RPD in our AREDS dataset, 20 who progressed vs. seven who did not. 
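Ten-fold cross-validation partitions the training data into ten folds and uses each fold in turn for evaluation while training on the other nine. A minimal Python index generator is sketched below; the shuffling seed is arbitrary, and whether the authors stratified the folds by outcome is not stated.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size
```

With far fewer converters than nonconverters, a stratified variant (for example, scikit-learn's `StratifiedKFold`) keeps the class balance similar across folds.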
Once we determine that someone is at risk of progressing to late AMD, the next step is to specify the probability of progression to late dry or wet AMD (Figure 4). 
Figure 4.
 
Extension of module II in Figure 2: the AMD prediction problem tackled as a two-stage problem, first establishing the risk of general late AMD and second the type of AMD progression.
Dry and Wet AMD Prediction
Prediction of progression to dry or wet AMD at 1 or 2 years required two additional models, one for each time period. Thus, once AMD progression (any type) was predicted by the first LMT at 1 or 2 years, an additional two-class logistic model tree for that time period predicted dry or wet AMD. To build these models, we restricted the data to the known converters, both dry and wet, at 1 and 2 years. Converters who demonstrated both dry and wet advanced AMD at the point of conversion were omitted. For the 1-year model, we had 328 incident late wet AMD and 248 incident late dry AMD subjects for training, and 140 incident late wet AMD and 107 incident late dry AMD subjects for testing. For the 2-year model, we had 329 incident late wet AMD and 249 incident late dry AMD subjects for training, and 140 incident late wet AMD and 107 incident late dry AMD subjects for testing. Table 2 shows the details. The models for 1-year and 2-year converters were trained on exactly the same imaging and demographic input data as the main LMT, but to identify risk for the two specific AMD types. Figure 4 shows this two-stage approach, in which the risks of converting to either form (dry or wet) of late AMD are separated in the second stage. 
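The two-stage inference in Figure 4 amounts to a simple cascade. The Python sketch below assumes the trained models are exposed as callables returning the label strings shown; these names and labels are hypothetical, and the stub models used in testing stand in for the trained LMTs.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Sequence[float]


def two_stage_predict(
    features: Features,
    progression_model: Callable[[Features], str],
    type_models: Dict[str, Callable[[Features], str]],
) -> Tuple[str, Optional[str]]:
    """Run the three-class progression model, then the matching dry/wet model.

    progression_model returns "none", "1-year", or "2-year";
    type_models maps "1-year" and "2-year" to callables returning "dry" or "wet".
    """
    horizon = progression_model(features)
    if horizon == "none":
        return ("none", None)  # no late AMD type is predicted for non-converters
    return (horizon, type_models[horizon](features))
```

Calling the dry/wet model only for predicted converters mirrors how those models were trained: on known converters alone.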
We also modeled 5- and 10-year late AMD progression with 871 and 916 late AMD converters, respectively, and tested the models on 185 and 189 incident late AMD converters, respectively, without separating dry and wet cases. The datasets for each of the late AMD prediction models are summarized in Table 2.
For the prediction of late wet and dry AMD, we also evaluated the deep learning models alone, that is, without the specific retinal features from the other modules (such as drusen area) that are otherwise combined with the DL output as input to the machine learning prediction. We used the same subjects with incident late AMD of both types for training and testing as for the full prediction models. 
The 2-year progression model was then validated, without further training, on the NAT-2 external dataset. We used only patients in the placebo group to remove any effect of supplements on the development of late AMD.26 A total of 88 study eyes were used for validation, of which 30 advanced to late AMD within 3 years (mean, 2.5 years) and 58 did not. 
Results
The initial results of DL screening were reported elsewhere.32 We have substantially improved the accuracy with the ensemble deep learning method presented here. For binary screening of AMD stage, our algorithm achieved 99.2% (95% confidence interval [CI], 99.02–99.39) accuracy in distinguishing normal/early from intermediate/late AMD (sensitivity of 98.9% [95% CI, 98.64–99.66] and specificity of 99.5% [95% CI, 98.85–99.80]). To our knowledge, this is the best screening accuracy among existing models of this kind (confusion matrices, Supplementary Tables S2 and S3, online supplement). The area under the curve for the model screening for intermediate and advanced AMD is 0.99 (95% CI, 0.989–0.991). AREDS reported inter-observer agreement for human grading of AMD severity (none, early, intermediate, and late) with a kappa value of 0.77.50 Taking the final AREDS gradings for these AMD stages as ground truth, our method showed agreement with these gradings at a kappa value of 0.95 (Supplementary Table S3). 
The kappa score for the screening model is 0.983. A comparison with other published methods in accuracy, sensitivity, specificity, and kappa is given in Table 3.
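For readers comparing the kappa values above, Cohen's kappa is computed from a confusion matrix as follows. Whether the reported values are weighted or unweighted is not stated, so this Python sketch assumes the unweighted form.

```python
def cohen_kappa(confusion):
    """Unweighted Cohen's kappa from a square confusion matrix.

    Rows are reference grades, columns are predicted grades.
    """
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # raw agreement
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    return (observed - expected) / (1 - expected)
```

For example, `cohen_kappa([[50, 10], [5, 35]])` (a toy matrix, not study data) returns about 0.694: observed agreement is 0.85 and chance agreement is 0.51, so kappa = 0.34 / 0.49.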
Table 3.
 
Comparison of Accuracy, Sensitivity, Specificity, Kappa, and AUC of Existing vs. our AMD Screening Model Based on Referable/Nonreferable AMD Classification (2-class) and 4-class Accuracy of AMD Stage Classification (Normal, Early, Intermediate, and Advanced)
For the incidence of late AMD in 2 years, we achieved 86.36% (95% CI, 84.22–88.31) accuracy, 92.42% (88.64–95.25) sensitivity, and 84.39% (81.78–86.76) specificity. For 1-year incidence of late AMD, we achieved 86.19% (84.03–88.15) prediction accuracy with 90.74% (86.64–93.92) sensitivity and 84.74% (82.15–87.09) specificity (confusion matrices in Supplementary Tables S4 and S5, online supplement). 
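The 2-year percentages can be reproduced from confusion-matrix counts. The counts in this Python sketch (256 true positives, 21 false negatives, 719 true negatives, 133 false positives) are our back-calculation from the reported percentages and the test-set sizes of 277 converters and 852 nonconverters, not values copied from Supplementary Table S4. The interval method used in the paper is not stated; the Wilson score interval shown here is one common choice, and its bounds differ slightly from the reported ones, which may come from an exact binomial method.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity (as percentages) from confusion-matrix counts."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion, returned as percentages."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return 100 * (centre - half), 100 * (centre + half)


# Counts back-calculated from the reported 2-year percentages (a reconstruction).
m = binary_metrics(tp=256, fn=21, tn=719, fp=133)
print({k: round(v, 2) for k, v in m.items()})
print([round(bound, 2) for bound in wilson_interval(256, 277)])
```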
For the prediction of late dry and wet AMD, we separated the test subjects by incident type. We achieved 66.88% (95% CI, 64.01–69.66) accuracy with 69.16% (59.50–77.73) sensitivity and 66.63% (63.60–69.56) specificity for the 2-year incidence of late dry AMD, and 67.15% (64.29–69.93) accuracy with 71.43% (63.19–78.74) sensitivity and 66.53% (63.44–69.51) specificity for the 2-year incidence of late wet AMD (Supplementary Tables S6 and S7, online supplement). 
Table 4 shows the performance outcomes for all models. 
Table 4.
 
Accuracy, Sensitivity, Specificity, and Precision of the Dry and Wet AMD Prediction Models for Predicting the 1- and 2-Year Risk of Developing Late AMD
We achieved 66.79% (95% CI, 63.92–69.57) accuracy with 70.09% (60.48–78.50) sensitivity and 66.43% (63.40–69.37) specificity for the 1-year incidence of late dry AMD, and 68.15% (65.31–70.90) accuracy with 73.57% (65.46–80.66) sensitivity and 67.36% (64.29–70.32) specificity for the 1-year incidence of late wet AMD (confusion matrices, Supplementary Tables S8 and S9). 
We also tested the models without the sociodemographic parameters. The proposed AMD prediction system with retinal and sociodemographic data outperformed all models based on retinal parameters only. The 2-year model with only retinal parameters had lower sensitivity: 88.79% compared with 92.42% for the proposed model (Supplementary Table S10). 
Using the deep learning models alone, without the specific retinal imaging parameters, as inputs to the machine learning models also resulted in poorer predictive performance than the multiparametric approach presented in this paper. For late dry AMD, the sensitivity was 63% for 1 year and 65% for 2 years, and the specificity was 69% for both 1 and 2 years. For late wet AMD, the sensitivity was 59% for 1 year and 60% for 2 years, and the specificity was 68% for 1 year and 70% for 2 years (Supplementary Table S15, online supplement). 
We measured the area under the curve (AUC) for all six complete models for 1- and 2-year risk prediction of late AMD (any, dry, or wet), stratified by age, gender, and smoking status. All models performed better in females and nonsmokers. When stratified by age, the 1-year models all performed worse in subjects younger than 60 than in the 60-to-70 and over-70 groups, whereas the 2-year models performed similarly across all three age groups. Refer to Supplementary Figures S1–S6 for the receiver operating characteristic curves and explicit performance. 
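The subgroup analysis can be sketched by computing AUC separately within each stratum. The Python example below uses the Mann–Whitney formulation of AUC (the probability that a randomly chosen converter scores higher than a randomly chosen nonconverter); the function names and data layout are illustrative assumptions, and the software the authors used for ROC analysis is not stated.

```python
from collections import defaultdict


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(scores, labels, groups):
    """Compute AUC separately for each subgroup label (e.g., sex or smoking status)."""
    buckets = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: roc_auc(s, y) for g, (s, y) in buckets.items()}
```

The pairwise comparison is quadratic in the number of cases, which is adequate for test sets of the size used here; rank-based formulas scale better for larger data.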
For 5-year late AMD prediction, the model achieved 64.6% accuracy with 65.1% sensitivity and 64.1% specificity. The 10-year model achieved 62.03% accuracy with 62.9% sensitivity and 60.8% specificity. Separating subjects by AREDS treatment category (supplements or placebo) had no effect on the accuracy of our 1-, 2-, and 5-year prediction models, whereas for the 10-year prediction model it gave a slight improvement in accuracy of approximately 1.33 percentage points (62.03% vs. 60.7%). 
When further validated on the NAT-2 data, our 2-year late AMD model produced an accuracy of 84% (95% CI, 74.75–91.02), with a sensitivity of 90% (95% CI, 73.47–97.89) and a specificity of 81% (95% CI, 68.59–90.13); details are shown in Table 5 and Supplementary Table S11. Note that the NAT-2 outcomes are based on the incidence of late AMD over the 3 years between the first and last visits. 
Table 5.
 
Sensitivity, Specificity, Accuracy, and Precision for Predicting the 2-Year Risk of Developing Late AMD (Dry or Wet), Validated on the NAT-2 Dataset
We also used the NAT-2 dataset for 2-year prediction of late dry and late wet AMD. There were six subjects with late dry AMD and 26 with late wet AMD, including two in common (details in Supplementary Table S12). For the prediction of incident late wet AMD in 2 years, we achieved 70.4% (95% CI, 52.21–88.43) sensitivity and 61.3% (95% CI, 48.07–73.40) specificity. The confusion matrix is shown in Supplementary Table S13. There were not enough dry AMD subjects in the NAT-2 dataset for validation. 
Discussion and Conclusions
We have combined deep learning (DL), machine learning, and AMD-specific image parameter generating algorithms in a complex pipeline to create a portfolio of six automated color fundus photograph-based AMD prediction models. For each time period of 1 or 2 years, there are three models, predicting any late AMD, late dry AMD, and late wet AMD, respectively. To our knowledge, these are the first models to successfully predict conversion to late AMD at either 1 or 2 years while also distinguishing the late dry and wet forms (the initial results 33 on late AMD prediction were presented at ARVO 2018). Beyond the general power of AI, we believe this success is also attributable to the careful selection of image preprocessing techniques and neural networks to ensure robustness to variation in retinal images and datasets, and to their capability of learning features at scales ranging from tiny drusen to large areas of geographic atrophy. The first part of the pipeline is a DL image classifier, previously published and improved in this paper, which discriminates none/early-stage from intermediate/late-stage AMD with an accuracy of 98%, to our knowledge the highest accuracy among reported methods. This classifier is a critical component of the prediction model, but it also has independent value in identifying AMD in the general population for appropriate referral. Here it has been further refined with six algorithms, each providing the probabilities of an image falling into each of the 12 levels of the more detailed AREDS severity scale. The outputs of these DL classifiers are then passed, together with AMD-specific image parameters and demographic factors, to the machine learning prediction model, which identifies individuals at risk of progression from intermediate to late AMD who may require closer surveillance and better attention to modifiable risk factors, and who may wish to be considered for advanced therapies.
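The paragraph above describes how the outputs of six DL classifiers (each a probability vector over the 12 AREDS severity levels) are combined with AMD-specific image parameters and demographic factors as input to the machine learning model. The paper does not specify the exact feature layout; the sketch below assumes simple concatenation into one flat vector. The container and field names, the binary encoding of gender and smoking, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

N_CLASSIFIERS = 6    # six DL classifiers, per the text
N_AREDS_LEVELS = 12  # 12 levels of the detailed AREDS severity scale

@dataclass
class EyeInputs:  # hypothetical container; field names are illustrative
    dl_probs: Sequence[Sequence[float]]  # one 12-level probability vector per classifier
    image_params: Sequence[float]        # AMD-specific image parameters, e.g. drusen area
    age: float
    is_female: bool
    is_smoker: bool

def build_feature_vector(x: EyeInputs) -> List[float]:
    """Concatenate DL outputs, image parameters, and demographics into one
    flat feature vector for the second-stage machine learning model."""
    if len(x.dl_probs) != N_CLASSIFIERS:
        raise ValueError(f"expected {N_CLASSIFIERS} classifier outputs")
    features: List[float] = []
    for probs in x.dl_probs:
        if len(probs) != N_AREDS_LEVELS or abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("each classifier must output a 12-level probability vector")
        features.extend(float(p) for p in probs)
    features.extend(float(v) for v in x.image_params)
    features.extend([float(x.age), float(x.is_female), float(x.is_smoker)])
    return features

example = EyeInputs(  # hypothetical values, not study data
    dl_probs=[[1 / 12] * 12 for _ in range(N_CLASSIFIERS)],
    image_params=[0.8, 0.1],
    age=72,
    is_female=True,
    is_smoker=False,
)
print(len(build_feature_vector(example)))  # 6*12 + 2 + 3 = 77
```

Validating that each DL output is a proper probability vector catches pipeline wiring errors early, before they silently degrade the downstream prediction model.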
The 2-year model has also been validated on the external NAT-2 dataset, with results consistent with those on the AREDS data. This suggests potential portability to other imaging environments. We also evaluated the DL models alone, without supplying specific retinal features from the other modules, such as drusen area, as additional inputs to the machine learning prediction model. Although this DL-only approach performed worse, the comparison was useful in showing the importance of explicit information on AMD severity and category in the image. This finding may be instructive in general for methods that rely exclusively on DL.
Stratifying subjects by gender, smoking status, and age yielded interesting results. All six models performed better on females and nonsmokers. When stratified by age, the three 1-year models all performed worse on subjects younger than 60 than on those aged 60 to 70 or older than 70, whereas the three 2-year models performed similarly across all three age groups. It would be of interest to AMD research to determine why outcomes were more predictable in certain groups than in others (e.g., in females than in males).
The study has several limitations. For deployment in the community or in primary care clinics, other pathologies besides AMD, such as diabetic retinopathy (DR), will coexist and may confound algorithms tuned only to AMD, requiring further large investments in AI development before deployment. To our knowledge, all detection algorithms published so far are disease-specific, that is, trained on, and therefore effective only on, data from a specific disease (e.g., images of patients with diabetes to search for DR). Furthermore, instead of study images acquired by expert photographers on expensive fundus cameras, image acquisition platforms in the community will need to be inexpensive, almost completely automatic, and capable of image quality adequate for evaluation by AI. Although such systems are now becoming available, the proof of principle remains to be tested prospectively in large studies with human validation, and cost and reimbursement issues must still be solved in a business model. These challenges, although beyond the scope of the present paper, must be acknowledged; although difficult, they will no doubt be surmountable by the ophthalmic and AI communities.
Even high-quality photographs in eye specialists’ offices will be acquired on a range of devices, so the transferability of our AMD prediction model, built on AREDS photographs, cannot be taken for granted. The same may be said of myriad other studies, such as OCT studies of AMD and glaucoma performed on specific devices; there is likewise no guarantee that those findings transfer to other devices without extensive validation. Encouragingly, however, our prediction algorithms performed well on the external, independent NAT-2 dataset, whose images were acquired on other cameras. If similar success can be achieved in appropriate community-based trials that identify subjects with AMD, then widespread adoption should soon follow.
Finally, prediction accuracy was considerably lower when we stratified the predictions into choroidal neovascularization (CNV) and geographic atrophy (GA). This may reflect the large number of nonincident cases relative to the very small number of pure dry and pure wet AMD cases available for building the machine learning models. It will be of interest in future research to determine whether these accuracies can be improved with larger datasets or other approaches.
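The paper does not say whether or how it corrected for this class imbalance. As an illustration only, the sketch below shows two related points with hypothetical outcome labels: a trivial model that always predicts "no progression" already scores high accuracy when progressors are rare, and one common remedy is inverse-frequency class weighting, the same heuristic scikit-learn uses for `class_weight="balanced"`. The function names are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights n_samples / (n_classes * n_in_class), the
    same heuristic as scikit-learn's class_weight="balanced"."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def majority_baseline_accuracy(labels):
    """Accuracy of a trivial model that always predicts the most common class."""
    return max(Counter(labels).values()) / len(labels)

# Hypothetical outcome labels with rare dry and wet progressors.
labels = ["none"] * 90 + ["dry"] * 4 + ["wet"] * 6
print(majority_baseline_accuracy(labels))  # 0.9
weights = balanced_class_weights(labels)
print({cls: round(w, 2) for cls, w in weights.items()})  # {'none': 0.37, 'dry': 8.33, 'wet': 5.56}
```

With these weights each class contributes equally to the total training loss, so the rare dry and wet cases are not swamped by the nonincident majority.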
Strengths of the study include the immediate applicability of these techniques as an adjunct to AMD studies that employ color photography, where they could remove much of the human effort required for image classification. Once the challenges to wider deployment noted above are solved, a longer-term and much broader benefit would be in public health, where automated AMD screening via telemedicine could bring needed care to millions of people who might otherwise go unidentified. In that setting, AMD prediction would add further value to patient management.
In conclusion, both the screening and prediction models may prove to be public health assets through telemedicine. The prediction model could also be used in ophthalmology clinics to identify patients who require closer surveillance and better attention to modifiable risk factors, and who perhaps may wish to be considered for advanced therapies. Further validation in prospective trials will help determine the optimal utilization of these models for the prevention of blindness from AMD. 
The Software Tool ‘iPredict-AMD’: An online version of the prediction system is available at https://www.ihealthscreen.org/ipredict-amd/ (the user name is “ipredict-amd” and the password is “ipredict#test2019”). 
Acknowledgments
This project is funded by NIH SBIR project number R43EY026841. 
Disclosure: A. Bhuiyan, Image-Based Screening System for Prediction of Individual at Risk of Late-Age-Related Macular Degeneration (AMD) (P); T.Y. Wong, None; D.S.W. Ting, None; A. Govindaiah, None; E.H. Souied, None; R.T. Smith, None 
References
Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study (AREDS): design implications. AREDS report no. 1. Control Clin Trials. 1999; 20: 573–600.
National Eye Institute (NEI). Age-related macular degeneration (AMD): projections for AMD (2010-2030-2050). Available at: https://nei.nih.gov/eyedata/amd/. Accessed August 3, 2018.
Agurto C, Barriga E, Murray V, et al. Automatic detection of diabetic retinopathy and age-related macular degeneration in digital fundus images. Invest Ophthalmol Vis Sci. 2011; 52: 5862–5871.
Bartlett H, Eperjesi F. Use of fundus imaging in quantification of age-related macular change. Surv Ophthalmol. 2007; 52: 655–671.
Phan TV, Seoud L, Chakor H, Cheriet F. Automatic screening and grading of age-related macular degeneration from texture analysis of fundus images. J Ophthalmol. 2016; 5893601. doi:10.1155/2016/5893601. Epub April 14, 2016.
Wong TY, Liew G, Mitchell P. Clinical update: new treatments for age-related macular degeneration. Lancet. 2007; 370: 194–206.
WT Y, RS L. Age related macular degeneration: time for a randomized controlled trial. Am J Ophthalmol. 2007; 144: 117–119.
Lim LS, Mitchell P, Seddon JM, Holz FG, Wong TY. Age-related macular degeneration. Lancet. 2012; 379: 1728–1738.
Ahlers C, Goetzinger E, Pircher M, et al. Imaging of the retinal pigment epithelium in age-related macular degeneration using polarization sensitive optical coherence tomography. Invest Ophthalmol Vis Sci. 2010; 51: 2149–2157.
AMD.Org – Macular Degeneration Partnership. Available at: http://www.amd.org/what-is-macular-degeneration/dry-amd/. Accessed August 25, 2015.
Kanagasingam Y, Bhuiyan A, Abràmoff M, Smith R, Goldschmidt L, Wong T. Progress on retinal image analysis for age related macular degeneration. Prog Retin Eye Res. 2014; 38: 20–42.
costHelper. Macular degeneration treatment cost: how much does macular degeneration treatment cost? Available at: https://health.costhelper.com/macular-degeneration.html. Accessed August 10, 2018.
Wong TY. Age-related macular degeneration and cardiovascular disease in the era of anti–vascular endothelial growth factor therapies. Am J Ophthalmol. 2009; 148: 327–329.
Anti-VEGF treatment and dry AMD: finding the balance. BrightFocus Foundation; 2015. Available at: http://www.brightfocus.org/facts-data. Accessed August 25, 2015.
Malik U. Age related macular degeneration factsheet. Available at: https://irisvision.com/agerelated-macular-degeneration-factsheet-interestingamd-facts/. Accessed August 10, 2018.
Schmier JK, Jones ML, Halpern MT. The burden of age-related macular degeneration. Pharmacoeconomics. 2006; 24: 319–334.
Age-Related Eye Disease Study Research Group. The relationship of dietary carotenoid and vitamin A, E, and C intake with age-related macular degeneration in a case-control study: AREDS report no. 22. Arch Ophthalmol. 2007; 125: 1225–1232.
National Eye Institute. Prevalence of blindness data. Available at: http://www.nei.nih.gov/eyedata/pbd_tables.asp. Accessed May 30, 2013.
Chew EY, SanGiovanni JP; AREDS2 Research Group. Lutein/zeaxanthin for the treatment of age-related cataract: AREDS2 randomized trial report no. 4. JAMA Ophthalmol. 2013; 131: 843–850.
Geneva I. Photobiomodulation for the treatment of retinal diseases: a review. Int J Ophthalmol. 2016; 9: 145–152.
Lek JJ, Brassington KH, Luu CD, et al. Subthreshold nanosecond laser intervention in intermediate age-related macular degeneration: study design and baseline characteristics of the Laser in Early Stages of Age-Related Macular Degeneration Study (report number 1). Ophthalmol Retina. 2017; 1: 227–239.
Smith RT, Sohrab MA, Busuioc M, Barile G. Reticular macular disease. Am J Ophthalmol. 2009; 148: 733–743.
Strahlman E, Fine S, Hillis A. The second eye of patients with senile macular degeneration. Arch Ophthalmol. 1983; 101: 1191–1193.
Chiu C, Mitchell P, Klein R, et al. A risk score for the prediction of advanced age-related macular degeneration: development and validation in 2 prospective cohorts. Ophthalmology. 2014; 121: 1421–1427.
Joachim N, Mitchell P, Burlutsky G, Kifley A, Wang JJ. The incidence and progression of age-related macular degeneration over 15 years: the Blue Mountains Eye Study. Ophthalmology. 2015; 122: 2482–2489.
Age-Related Eye Disease Study Research Group. A randomized, placebo-controlled, clinical trial of high-dose supplementation with vitamins C and E, beta carotene, and zinc for age-related macular degeneration and vision loss: AREDS report no. 8. Arch Ophthalmol. 2001; 119: 1417.
Seddon JM, Reynolds R, Yu Y, Daly MJ, Rosner B. Prediction risk modeling for progression to advanced age-related macular degeneration using baseline demographic, environmental, genetic and ocular variables. Invest Ophthalmol Vis Sci. 2011; 52: 5235.
Buitendijk GHS, Rochtchina E, Myers C, et al. Prediction of age-related macular degeneration in the general population: the Three Continent AMD Consortium. Ophthalmology. 2013; 120: 2644–2655.
Burlina P, Pacheco KD, Joshi N, Freund DE, Bressler NM. Comparing humans and deep learning performance for grading AMD: a study in using universal deep features and transfer learning for automated AMD analysis. Comput Biol Med. 2017; 82: 80–86.
Govindaiah A, Hussain A, Smith R, Bhuiyan A. Deep convolutional neural network-based screening and assessment of age-related macular degeneration from fundus images. In: Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI). 2018: 1525–1528. doi:10.1109/ISBI.2018.8363863.
Grassmann F, Mengelkamp J, Brandl C, et al. A deep learning algorithm for prediction of age-related eye disease study severity scale for age-related macular degeneration from color fundus photography. Ophthalmology. 2018; 125: 1410–1420.
Govindaiah A, Smith T, Bhuiyan A. A new and improved method for automated screening of age-related macular degeneration using ensemble deep neural networks. Proc IEEE EMBC 2018. 2018: 702–705.
Bhuiyan A, Govindaiah A, Smith RT. A prediction model for risk of progression to late age-related macular degeneration (AMD). Invest Ophthalmol Vis Sci. 2018; 59: 3214.
Burlina PM, Joshi N, Pacheco KD, Freund DE, Kong J, Bressler NM. Use of deep learning for detailed severity characterization and estimation of 5-year risk among patients with age-related macular degeneration. JAMA Ophthalmol. 2018; 136: 1359–1366.
Landwehr N, Hall M, Frank E. Logistic model trees. Mach Learn. 2005; 59: 161–205.
Age-Related Eye Disease Study Research Group. A simplified severity scale for age-related macular degeneration: AREDS report no. 18. Arch Ophthalmol. 2005; 123: 1570–1574.
Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study severity scale for age-related macular degeneration: AREDS report no. 17. Arch Ophthalmol. 2005; 123: 1484–1498.
Age-Related Eye Disease Study Research Group. A randomized, placebo-controlled, clinical trial of high-dose supplementation with vitamins C and E, beta carotene, and zinc for age-related macular degeneration and vision loss: AREDS report no. 8. Arch Ophthalmol. 2001; 119: 1417.
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception architecture for computer vision. Paper presented at: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Paper presented at: Thirty-First AAAI Conference on Artificial Intelligence; 2017.
Chollet F. Xception: deep learning with depthwise separable convolutions. Paper presented at: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017.
Zoph B, Vasudevan V, Shlens J, Le QV. Learning transferable architectures for scalable image recognition. Paper presented at: IEEE Conference on Computer Vision and Pattern Recognition. Available at: https://arxiv.org/abs/1707.07012. Accessed August 1, 2018.
Class activation maps in Keras for visualizing where deep learning networks pay attention. Jacob's Computer Vision and Machine Learning blog. Available at: https://jacobgil.github.io/deeplearning/class-activation-maps. Accessed September 25, 2019.
Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Stat Sci. 1999; 14: 382–401.
Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms using different performance metrics. Proc ICML’06. Available at: http://www.cs.cornell.edu/~alexn/papers/comparison.tr.pdf. Accessed July 1, 2018.
Hussain A, Govindaiah A, Souied E, Smith T, Bhuiyan A. Automated tracking and change detection of retinal pathology for age-related macular degeneration progression using fundus imaging. Proc IEEE ICIEV. 2018: 1–5.
Bird AEC, Bressler NM, Bressler SB. International classification of grading system for age-related maculopathy and age-related macular degeneration. Surv Ophthalmol. 1995; 39: 367–374.
Fox J. Cox proportional-hazards regression for survival data. In: An R and S-PLUS Companion to Applied Regression. Thousand Oaks, CA: Sage Publications; 2002.
Bishop CM. Pattern Recognition and Machine Learning. New York, NY: Springer; 2011. ISBN 978-0387310732.
Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study system for classifying age-related macular degeneration from stereoscopic color fundus photographs: the Age-Related Eye Disease Study report number 6. Am J Ophthalmol. 2001; 132: 668–681.
Figure 1. High-level flow chart for the overall screening and prediction of late AMD.
Figure 2. Flow chart for the late AMD prediction system. Input: the input parameters. Module I: screening module (center), which passes images through several deep learning steps to classify them as none, early, or intermediate AMD. Module II: prediction of progression to late AMD, including late dry AMD or late wet AMD, or no progression (in 1 or 2 years).
Figure 3. Fundus images of three subjects at baseline and late AMD incident visits, with heatmaps of AMD signs. Blue, strong signs of AMD detected by our classifier. Green, weaker signs of AMD. No signs of AMD were detected in the unmapped portions of the images. Row A, baseline fundus photographs. Row B, baseline heatmaps showing signs of early AMD. Row C, incident-visit fundus photographs showing late AMD. Row D, incident-visit heatmaps showing much larger areas and signs of late AMD.
Figure 4. Extension of module II in Figure 2: the AMD prediction problem treated as a two-stage problem, first establishing the risk of any late AMD and then the type of AMD progression.
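The two-stage design in Figure 4 can be read as a cascade: stage 1 estimates the probability of any late AMD, and stage 2 estimates the type given progression. The paper does not give its decision rule, so the sketch below is an illustration under stated assumptions: stage 2 outputs a probability of wet (vs. dry) AMD conditional on progression, dry and wet are treated as mutually exclusive, and both 0.5 thresholds are hypothetical. The function name is also hypothetical.

```python
from typing import Dict, Union

def two_stage_prediction(p_late: float, p_wet_given_late: float,
                         threshold: float = 0.5) -> Dict[str, Union[str, float]]:
    """Combine a stage-1 'any late AMD' probability with a stage-2
    'wet vs. dry, given progression' probability (hypothetical rule)."""
    for name, p in (("p_late", p_late), ("p_wet_given_late", p_wet_given_late)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    # Chain rule: P(wet) = P(late) * P(wet | late); dry and wet assumed exclusive.
    p_wet = p_late * p_wet_given_late
    p_dry = p_late * (1.0 - p_wet_given_late)
    # Stage 1 gates stage 2: the type is only assigned if progression is predicted.
    if p_late < threshold:
        label = "no progression"
    elif p_wet_given_late >= 0.5:
        label = "late wet AMD"
    else:
        label = "late dry AMD"
    return {"label": label, "p_none": 1.0 - p_late, "p_dry": p_dry, "p_wet": p_wet}

print(two_stage_prediction(0.8, 0.7)["label"])  # late wet AMD
print(two_stage_prediction(0.2, 0.9)["label"])  # no progression
```

Splitting the problem this way lets the first stage learn from all progressors pooled together, which helps when pure dry and pure wet cases are individually scarce.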
Table 1. AREDS Data Distribution Within the Significant Parameters
Table 2. AREDS Dataset Organization for Late AMD Prediction in Module 2
Table 3. Comparison of Accuracy, Sensitivity, Specificity, Kappa, and AUC of Existing vs. Our AMD Screening Model Based on Referable/Nonreferable AMD Classification (2-Class) and 4-Class Accuracy of AMD Stage Classification (Normal, Early, Intermediate, and Advanced)
Table 4. Accuracy, Sensitivity, Specificity, and Precision of the Dry and Wet AMD Prediction Models for Predicting the 1- and 2-Year Risk of Developing AMD