February 2021
Volume 10, Issue 2
Open Access
Genome-Wide Association Studies-Based Machine Learning for Prediction of Age-Related Macular Degeneration Risk
Author Affiliations & Notes
  • Qi Yan
    Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
    Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA
  • Yale Jiang
    Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA
    School of Medicine, Tsinghua University, Beijing, China
  • Heng Huang
    Department of Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, PA, USA
    Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, PA, USA
  • Anand Swaroop
    Neurobiology Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
  • Emily Y. Chew
    Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
  • Daniel E. Weeks
    Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, PA, USA
    Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
  • Wei Chen
    Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA
    Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, PA, USA
    Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
  • Ying Ding
    Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
  • Correspondence: Ying Ding, 7133 Public Health, 130 DeSoto St, Pittsburgh, PA 15261, USA. e-mail: yingding@pitt.edu 
Translational Vision Science & Technology February 2021, Vol.10, 29. doi:https://doi.org/10.1167/tvst.10.2.29
Citation: Qi Yan, Yale Jiang, Heng Huang, Anand Swaroop, Emily Y. Chew, Daniel E. Weeks, Wei Chen, Ying Ding; Genome-Wide Association Studies-Based Machine Learning for Prediction of Age-Related Macular Degeneration Risk. Trans. Vis. Sci. Tech. 2021;10(2):29. doi: https://doi.org/10.1167/tvst.10.2.29.
© ARVO (1962-2015); The Authors (2016-present)
Abstract

Purpose: Because age-related macular degeneration (AMD) is a progressive disorder and advanced AMD is currently hard to cure, an accurate and informative prediction of a person's AMD risk using genetic information is desirable for early diagnosis and potential individualized clinical management. The objective of this study was to develop and validate novel prediction models for AMD risk using large genome-wide association studies datasets with different machine learning approaches.

Methods: Genotype data from 32,215 Caucasian individuals with age of ≥50 years from the International AMD Genomics Consortium in dbGaP were used to establish and test prediction models for AMD risk. Four different machine learning approaches—neural network, lasso regression, support vector machine, and random forest—were implemented. A standard logistic regression model using a genetic risk score was also considered.

Results: All machine learning–based methods achieved satisfactory performance for predicting advanced AMD cases (vs. normal controls) (area under the curve = 0.81–0.82, Brier score = 0.17–0.18 in a separate test dataset) and any stage AMD (vs. normal controls) (area under the curve = 0.78–0.79, Brier score = 0.18–0.20 in a separate test dataset). The prediction performance was further validated in an independent dataset of 783 subjects from UK Biobank (area under the curve = 0.67).

Conclusions: By applying multiple state-of-the-art machine learning approaches to large AMD genome-wide association studies datasets, the predictive models we established can provide an accurate estimation of an individual's AMD risk profile based on genetic information along with age. The online prediction interface is available at: https://yanq.shinyapps.io/no_vs_amd_NN/.

Translational Relevance: The accurate and individualized risk prediction model interface will greatly improve early diagnosis and enhance tailored clinical management of AMD.

Introduction
Age-related macular degeneration (AMD) is a multifactorial neurodegenerative disease and a leading cause of vision loss among the elderly in developed countries.1,2 The disease affects central vision and is progressive, starting with the appearance of drusen (i.e., yellow or white deposits in the eye) and eventually leading to advanced AMD forms: wet AMD (choroidal neovascularization) and dry AMD (geographic atrophy).3 Patients can progress to one or both forms of advanced AMD. Some patients with early AMD maintain good vision for a long time without progressing to advanced AMD, whereas others quickly develop advanced AMD. 
In 2005, Fisher et al.4 reported that the CFH gene on chromosome 1 and ARMS2/HTRA1 genes on chromosome 10 were the most replicated gene regions associated with AMD. Later, with advances in technology, multiple genome-wide association studies (GWAS) were conducted to examine the association between AMD and a genome-wide set of single nucleotide polymorphisms (SNPs). In 2016, the International AMD Genomics Consortium identified or confirmed a total of 34 loci with 52 independent genetic variants associated with advanced AMD risk.5 From this study, the phenotypes and genotypes of 35,358 subjects were uploaded to dbGaP (phs001039.v1.p1); the majority of these subjects are Caucasians. Multiple studies demonstrated that the same AMD susceptibility loci were more strongly associated with AMD in Caucasians than in other ethnic groups.6–8 
Because advanced AMD is currently hard to cure, an accurate and informative prediction of a person's risk for advanced AMD at a young age using genetic information is desirable for early diagnosis, enhanced diet/behavior, and potential individualized clinical management. For example, for individuals with high predicted AMD risks, behaviors that could decrease AMD risk such as stopping smoking, keeping a healthy diet with more antioxidants, and taking appropriate vitamin supplements can be recommended. Earlier or more frequent clinical visits to monitor the development or progression of the disease can also be suggested to individuals with high AMD risks. In this study, our objective was to establish and validate prediction models for AMD risk based on genetic variants given any future age of a subject using the largest publicly available data for Caucasians. 
Methods
Sample Description and Genotype Data
The study subjects are from the International Age-Related Macular Degeneration Genomics Consortium – Exome Chip Experiment dbGaP dataset (phs001039.v1.p1), which gathered samples from 26 studies. There are 32,215 Caucasians among the total 35,358 subjects. Genotypes were imputed with the 1000 Genomes Project as the reference panel. A total of 13,503,037 genetic variants are included. The detailed subject recruitment, ascertainment of AMD severity and genotyping procedures have been reported elsewhere.5 
In addition, we extracted a set of 383 Caucasian subjects with macular degeneration (i.e., all AMD cases) and 400 randomly selected Caucasian controls >50 years old from the UK Biobank9 as an independent test dataset. The cases were determined by the self-reported macular degeneration code (Data-Field 20002, illness code 1528). Although we randomly selected 400 controls, these non–self-reported AMD subjects may still have macular degeneration (e.g., owing to missed reporting or the disease occurring after recruitment). The UK Biobank is the largest and most complete European Biobank available at present. 
Different Scenarios
We considered two main classification scenarios: (1) advanced AMD cases versus normal controls and (2) any AMD cases (i.e., both intermediate and advanced AMD) versus normal controls. In addition, we also considered another two binary outcome classification scenarios in the supplementary material: (3) intermediate AMD cases versus normal controls and (4) advanced AMD cases versus intermediate AMD cases. 
Feature SNPs Selection
First, we randomly divided our entire dbGaP data into a test dataset of 5000 samples and a training dataset of the remaining 27,215 samples. For each aforementioned classification scenario, we used only the training dataset to conduct the GWAS analysis that selected feature SNPs as inputs for the prediction models. The test dataset was left untouched and reserved for the prediction performance evaluation. For all the classification scenarios, the GWAS was conducted using a logistic regression under an additive genetic model, adjusting for age, gender, and the first two principal components calculated based on genotypes (for controlling population stratification). We then built three lists of feature SNPs. In the primary list, only the single top SNP (with the smallest P value) from each of the genome-wide significant loci (P value of <5 × 10⁻⁸) was selected. In a secondary list, all genome-wide significant SNPs were selected. In another secondary list, all SNPs with a P value of <1 × 10⁻⁵ were selected. We only considered SNPs with minor allele frequency of >0.01. In addition to the selected SNPs, we also included age as a predictor in the prediction model, because it is known to be associated with AMD risk and the model predicts the AMD risk at that given age. 
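The selection logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names, the dictionary keys ('snp', 'locus', 'p', 'maf'), the random seed, and the toy GWAS rows are all hypothetical, and the GWAS itself (logistic regression on the training set) is assumed to have been run already.

```python
import random

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold stated in the text
MIN_MAF = 0.01        # minor allele frequency filter stated in the text


def split_train_test(sample_ids, n_test=5000, seed=0):
    """Randomly hold out n_test samples; the remainder is the training set."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seed is an arbitrary illustrative choice
    return ids[n_test:], ids[:n_test]


def select_feature_snps(gwas_rows, p_threshold=GENOME_WIDE_P, top_per_locus=True):
    """Select feature SNPs from training-set GWAS results.

    Each row is a dict with hypothetical keys 'snp', 'locus', 'p', 'maf'.
    With top_per_locus=True only the smallest-P SNP per locus is kept
    (primary list); otherwise every passing SNP is kept (secondary lists).
    """
    passing = [r for r in gwas_rows if r["p"] < p_threshold and r["maf"] > MIN_MAF]
    if not top_per_locus:
        return sorted(r["snp"] for r in passing)
    best = {}
    for r in passing:
        if r["locus"] not in best or r["p"] < best[r["locus"]]["p"]:
            best[r["locus"]] = r
    return sorted(r["snp"] for r in best.values())


# Made-up GWAS rows, for illustration only (not results from the study).
toy_gwas = [
    {"snp": "rsA", "locus": "CFH", "p": 1e-300, "maf": 0.35},
    {"snp": "rsB", "locus": "CFH", "p": 1e-50, "maf": 0.20},
    {"snp": "rsC", "locus": "ARMS2/HTRA1", "p": 1e-200, "maf": 0.25},
    {"snp": "rsD", "locus": "C3", "p": 1e-12, "maf": 0.005},  # fails the MAF filter
    {"snp": "rsE", "locus": "APOE", "p": 3e-6, "maf": 0.10},  # passes only 1e-5
]

primary = select_feature_snps(toy_gwas)                              # one SNP per locus
secondary = select_feature_snps(toy_gwas, top_per_locus=False)       # all significant SNPs
relaxed = select_feature_snps(toy_gwas, 1e-5, top_per_locus=False)   # P < 1e-5 list
```

Splitting before running the GWAS is the key design point: the feature SNPs are chosen without any information from the held-out samples.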
Machine Learning Methods
We considered four machine learning methods in this study: neural network (NN), lasso regression (Lasso), support vector machine (SVM), and random forest (RF). As a comparison, we also fitted a standard logistic regression using a genetic risk score (GRS), as described in the following paragraph. 
First, a multilayer feedforward NN was implemented using Keras.10 All layers were fully connected (Supplementary Fig. S1). We used two hidden layers with 16 nodes each and L1 norm regularization with a tuning parameter lambda of 0.0001 at the input layer. Because a NN can learn complex relationships between predictors and outcomes, it might be expected to be superior to models based on a linear relationship (e.g., the standard logistic regression with a lasso penalty). A NN is often considered a “black box” owing to its complex inner architecture. To better interpret the predictions, we applied local interpretable model–agnostic explanations (LIME), which perturb the inputs of data samples and evaluate how the predictions change. A 10-fold cross-validation was performed within the training dataset to find the best epoch (i.e., iteration number) with the lowest loss, which was then used in the test dataset for evaluation. 
Second, a Lasso was implemented using the R function glmnet.11 Because different values of the tuning parameter lambda in NN and Lasso led to similar results, for the sake of simplicity, we used the same lambda value for NN and Lasso. Moreover, linear SVM and RF were implemented using the R package caret.12 
Finally, we also computed a GRS: \(\mathrm{GRS} = \sum_{i=1}^{p} \beta_i G_i \big/ \sum_{i=1}^{p} \beta_i\), where \(\beta_i\) is the log(odds ratio) of the risk variant i, obtained from our GWAS result for each classification scenario (a similar approach was described in Ding et al.13), and \(G_i\) is the corresponding genotype (coded as 0, 1, or 2 copies of the risk allele). Here p is the number of feature SNPs we selected, and the same set of SNPs was used in all four machine learning approaches. Note that in this coding all \(\beta_i\) are positive and the GRS ranges from 0 to 2. Then a standard logistic regression was fitted with this GRS and age as the predictors. We refer to this method as GRS. 
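As a concrete illustration of the GRS formula, the following minimal Python sketch computes the score for one subject. The helper to_risk_allele_coding, which flips a protective effect allele so that every weight is positive, is an assumption about how the risk-allele coding could be obtained and is not described in the text; all numeric values are made up.

```python
def to_risk_allele_coding(log_or, genotype):
    """Return (beta, G) coded on the risk allele so that beta is positive.

    Assumed helper: if the effect allele is protective (log OR < 0),
    count the other allele instead, i.e. beta -> -beta and G -> 2 - G.
    """
    if log_or < 0:
        return -log_or, 2 - genotype
    return log_or, genotype


def genetic_risk_score(betas, genotypes):
    """Weighted GRS: sum(beta_i * G_i) / sum(beta_i), which lies in [0, 2]."""
    if len(betas) != len(genotypes):
        raise ValueError("betas and genotypes must have the same length")
    if any(b <= 0 for b in betas):
        raise ValueError("all betas must be positive under risk-allele coding")
    if any(not 0 <= g <= 2 for g in genotypes):
        raise ValueError("genotypes must be coded between 0 and 2")
    return sum(b * g for b, g in zip(betas, genotypes)) / sum(betas)


# Made-up log odds ratios and genotypes for three SNPs of one subject.
log_ors = [0.5, -1.0, 0.25]
genotypes = [2, 1, 0]
coded = [to_risk_allele_coding(b, g) for b, g in zip(log_ors, genotypes)]
betas = [b for b, _ in coded]
risk_genotypes = [g for _, g in coded]
grs = genetic_risk_score(betas, risk_genotypes)  # (1.0 + 1.0 + 0.0) / 1.75
```

Dividing by the sum of the weights is what bounds the score: a subject carrying no risk alleles scores 0 and a subject homozygous for every risk allele scores 2.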
For the binary outcome classification, we calculated the area under the curve (AUC) of the receiver operator characteristic curves as the primary performance metric. The Brier score14 was used as a secondary metric, where a lower Brier score indicates a better prediction. Note that the useful benchmark value for the Brier score is 33%, which corresponds to predicting the risk by a random number drawn from a uniform [0, 1] distribution. Model performance was evaluated in the separate test datasets. 
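Both metrics are simple to compute directly, which also makes the 33% benchmark easy to check. The sketch below is illustrative only and assumes outcomes coded 0 (control) and 1 (case); it uses the pairwise (Mann-Whitney) form of the AUC, which is quadratic in the sample size and therefore suited to small examples rather than to the full test set. The function names and the four toy predictions are hypothetical.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, p_pred):
    """AUC as the probability that a case outscores a control (ties count 1/2)."""
    cases = [p for y, p in zip(y_true, p_pred) if y == 1]
    controls = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def uniform_benchmark(y, n_grid=10_000):
    """Expected Brier score of a Uniform[0, 1] random prediction.

    Midpoint-rule approximation of the integral of (u - y)^2 over [0, 1],
    which equals 1/3 for either outcome y = 0 or y = 1.
    """
    return sum(((i + 0.5) / n_grid - y) ** 2 for i in range(n_grid)) / n_grid


# Toy predictions for two controls followed by two cases (made-up numbers).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(y, p)            # 3 of the 4 case-control pairs are ordered correctly
toy_brier = brier_score(y, p)
```

Because the integral equals 1/3 whether the true outcome is 0 or 1, the benchmark does not depend on the proportion of cases in the test set.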
Results
Study Data Characteristics
Detailed demographic and clinical characteristics of the entire dbGaP participants have been described elsewhere.5 In this study, the total sample size (Caucasians) was 32,215, the mean age was 73.8 ± 9.3 years, and women comprised 57.6% (n = 18,554) of the cohort (Table 1). Specifically, 14,348 were normal controls, 5290 were intermediate AMD cases, and 12,577 were advanced AMD cases, including 2644 geographic atrophy cases, 8430 choroidal neovascularization cases, and 1503 geographic atrophy/choroidal neovascularization mixed cases. As the AMD severity increased from no to intermediate to advanced AMD the mean age in those groups increased from 70.6 ± 9.5 to 74.7 ± 8.5 to 77.0 ± 8.0 years (Table 1 and Supplementary Fig. S2). The percentage of women among the intermediate or advanced AMD cases (59.2% and 58.9%) was higher than that in the normal controls (55.9%; Table 1). 
Table 1. Characteristics Summary of the dbGaP Dataset
Feature SNPs Selection from GWAS of AMD
As shown in Figure 1, Supplementary Figure S3, and Supplementary Table S1, the scenario 1 GWAS of advanced AMD cases versus normal controls yielded the largest number of genome-wide significant (P < 5 × 10⁻⁸) loci (18 loci [CFH, ADAMTS9-AS, COL8A1, CFI, C9, C2/CFB/SKIV2L, VEGFA, ARMS2/HTRA1, ACAD10, B3GALTL, LIPC, CETP, CTRB2/CTRB1, C3, APOE, C20orf85, SYN3/TIMP3, and SLC16A8] that include 5233 SNPs). All these loci were reported in Fritsche et al.,5 which also compared advanced AMD cases and normal controls. We did not capture all of the loci previously reported by Fritsche et al.5 because we only used a subset of subjects in our GWAS and only analyzed common variants with minor allele frequency of >0.01. The scenario 2 GWAS of any AMD cases versus normal controls also identified many significant loci (16 loci with 5553 SNPs), and most of them were in the scenario 1 GWAS as well. However, TNFRSF10A on chromosome 8 and SMG6 on chromosome 17 were newly identified; they were not reported by Fritsche et al.,5 possibly because our analysis included intermediate AMD cases. The scenario 3 GWAS of intermediate AMD cases versus normal controls and the scenario 4 GWAS of advanced AMD cases versus intermediate AMD cases identified fewer significant loci (4 loci with 1583 SNPs, and 5 loci with 1228 SNPs, respectively), because the intermediate AMD category typically contains individuals with a wide range of disease severity, which can be close to either no or advanced AMD. Power could be another issue, owing to the much smaller sample size of intermediate AMD cases. Although few loci were detected, the scenario 4 GWAS identified ABHD2 on chromosome 15, which was not reported by Fritsche et al.5 This gene could be useful for differentiating intermediate and advanced AMD. 
Figure 1. Manhattan plots of P values and odds ratios (ORs) from GWAS. (A) Scenario 1: advanced AMD cases versus normal controls. (B) Scenario 2: any AMD cases versus normal controls. The red horizontal line indicates the genome-wide significance threshold (P = 5 × 10⁻⁸). When original ORs are less than 1, new ORs equal to 1/ORs are shown in the plots.
Prediction Performance
In our primary list of feature SNPs, we used the single top SNP from each of the genome-wide significant loci plus age as predictors. Five modeling approaches (NN, Lasso, SVM, RF, and GRS) were applied to each scenario. Each model was trained in the training set and evaluated in the test set. The AUC values and Brier scores based on the test set are presented in Table 2. The receiver operator characteristic curves and 95% confidence interval (CI) of the AUC using the DeLong method15 are reported in Figure 2 and Supplementary Figure S4. Scenario 1 showed overall good predictions (AUCs between 0.81 and 0.82 for all five approaches). For scenarios 3 and 4, none of the five prediction methods performed well (AUCs between 0.61 and 0.68). The reason could be that a wide range of samples fell into the category of intermediate AMD, which could be close to either controls or advanced AMD cases. Scenario 2 also showed reasonably good performance (AUCs of 0.78). The density curves of predicted risks are shown in Figure 3 and Supplementary Figure S5. Such plots allow us to visually examine the two counterparts from each comparison scenario separately. Consistent with the AUC results, scenarios 1 and 2 showed clear separation, whereas scenarios 3 and 4 led to ambiguous results. The individual feature importance heatmaps from local interpretable model–agnostic explanations (Fig. 4 and Supplementary Fig. S6) for NN further indicated that CFH and ARMS2/HTRA1 contributed the most to the predictions (marked with darker colors). Note that the green vertical lines indicate that the feature supports the predicted classification for that subject and the red vertical lines indicate that the feature contradicts the predicted classification (or equivalently, supports the counterpart of the predicted classification). 
Note that the local interpretable model–agnostic explanations heatmaps plotted the risk alleles of all SNPs, which are on the same scale (additive model, 0–2), but age is on a different scale (>50 years). Thus, although the color of age looks light, it is a very strong predictor. We further investigated the age effect on AMD risk by predicting a test dataset with age from 50 to 90 years and all SNPs with common homozygous genotypes. The results (Supplementary Fig. S7) showed that both advanced and any AMD risks increased as age advanced. 
Table 2. AUC Values (95% CI) and Brier Scores (95% CI) of the Prediction of Scenario 1 (Normal Controls vs. Advanced AMD Cases) and Scenario 2 (Normal Controls vs. Any AMD Cases)
Figure 2. Receiver operator characteristic (ROC) curves of the predicted risk. (A) Scenario 1: advanced AMD cases versus normal controls. (B) Scenario 2: any AMD cases versus normal controls.
Figure 3. Density curves of the predicted risk for the two counterparts for five prediction methods. (A–E) advanced AMD cases versus normal controls, and (F–J) any AMD cases versus normal controls.
Figure 4. Feature importance heatmaps from local interpretable model–agnostic explanations for NN. (A) Scenario 1: normal controls versus advanced AMD cases. (B) Scenario 2: normal controls versus any AMD cases.
In the secondary list of feature SNPs, we used all genome-wide significant SNPs and age as predictors. We applied NN and Lasso but not SVM or RF, because the latter two are not suitable for a large number of predictors. GRS was also excluded, because with all genome-wide significant SNPs, a larger number of less significant SNPs in linkage disequilibrium may contribute more to the prediction than a single very significant SNP, leading to a suboptimal GRS. In contrast, NN and Lasso assigned penalties to the highly correlated features, which accounted for the correlations among SNPs in high linkage disequilibrium. Although the results were similar to the previous parsimonious models, NN showed slightly better AUCs than Lasso by an average of 0.01 (Supplementary Figs. S8 and S9; Supplementary Table S2). With the other secondary list of feature SNPs (P value of <1 × 10⁻⁵), the prediction accuracy did not improve in terms of AUCs (Supplementary Fig. S10). We also fitted NN models using the top SNPs from each of the significant loci as predictors without age to predict an individual's average AMD risk (across the lifetime), and the results showed moderate accuracy (Supplementary Fig. S11). For example, the AUC for predicting advanced AMD versus no AMD was 0.77 (95% CI, 0.75–0.78). Finally, we evaluated the performance in a non-Caucasian test dataset from the same dbGaP project to assess whether our training results from Caucasians could be applied to non-Caucasians. This non-Caucasian test dataset included a mixed population of Africans, Asians, and subjects with unknown ancestry. The results (Supplementary Table S3) showed that the prediction was worse in non-Caucasians (e.g., AUC of 0.72–0.74 in NN) than in Caucasians (e.g., AUC of 0.82–0.83 in NN). 
In addition to the test dataset we generated from the dbGaP, we validated our prediction models on the 383 independent AMD subjects (of a mixture of AMD stages) and 400 random controls from the UK Biobank.9 All models produced similar results, so here we present only the NN result using the top SNPs from each of the significant loci plus age as predictors. The result showed moderate accuracy with an AUC of 0.67 (95% CI, 0.63–0.71). Moreover, when we excluded age from the model and only kept SNPs, the prediction of average lifetime AMD risk produced an AUC of 0.65 (95% CI, 0.61–0.69). 
We have implemented the established prediction model from the NN approach for the scenario of predicting any AMD (vs. normal control) using R Shiny, which is available at https://yanq.shinyapps.io/no_vs_amd_NN/. Note that the final predicted AMD risk output from this app is adjusted for population prevalence (see the Supplementary Text for details). 
Discussion
AMD is one of the diseases for which GWAS have been most successful, with multiple consistently replicated loci. The dbGaP (phs001039.v1.p1) dataset from the International AMD Genomics Consortium is the largest publicly available genotype dataset to date, with 35,358 subjects. Our results demonstrate that SNPs together with age are sufficient to predict AMD risk accurately in Caucasians. 
We did not directly use the 52 SNPs from 34 reported loci from Fritsche et al.5 as predictors, because the use of these loci may lead to model overfitting; they were identified using the entire consortium data, which include both our training and test datasets. To select our feature SNPs for predictions, we conducted separate GWAS for four scenarios comparing among normal controls, intermediate AMD cases, and advanced AMD cases. To the best of our knowledge, these are the first large GWAS accounting for intermediate AMD. Most of the genome-wide significant SNPs from these four GWAS were identified in the previous large AMD GWAS.5 However, ABHD2 on chromosome 15 was identified for the first time in the comparison between advanced AMD and intermediate AMD, and TNFRSF10A on chromosome 8 and SMG6 on chromosome 17 were identified in the comparison between any AMD cases and normal controls. They were not observed previously, because only the comparison between advanced AMD and no AMD was studied before. We also conducted a fifth scenario comparing dry and wet AMD cases (results not shown). The SNPs from ARMS2/HTRA1 and MMP9 showed significantly different genetic effects between dry and wet AMD. However, these two genes were not able to classify dry and wet AMD. For reference, we also used the 52 reported SNPs5 in our prediction models, and the results showed slightly better prediction accuracy than our selected feature SNPs (Supplementary Fig. S12). This is likely due to the use of test data information in the training step, as explained at the start of this paragraph. 
All five modeling approaches provided similar prediction results. The prediction for advanced AMD versus normal controls had the best performance, which is not surprising, because these two groups are the most distinguishable. However, the prediction for any AMD versus normal controls could be more clinically useful, as it covers subjects with all possible AMD stages. The parsimonious models with only one top SNP from each significant locus achieved equivalent prediction performance compared to the models using all significant SNPs and thus are preferable in practical use. 
In this study, we considered five prediction methods: NN, Lasso, SVM, RF, and GRS. For the primary list of feature SNPs, they all achieved similar prediction accuracy. For the secondary list of feature SNPs (Supplementary Figs. S8 and S10), NN consistently had slightly higher AUCs than Lasso. One advantage of NN over Lasso is that NN accounts for nonlinear relationships and interactions among predictors in addition to the linear relationship that Lasso accounts for. The NN is equivalent to Lasso when only input and output layers are included with an L1 norm at the input layer (Supplementary Fig. S1). 
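The equivalence between a NN without hidden layers and Lasso can be made explicit. As a sketch, assume a sigmoid output unit and a cross-entropy loss averaged over the n training samples (assumptions on our part here; the output activation and loss are not stated above), with inputs \(x_{k1}, \ldots, x_{kq}\) for subject k (the feature SNPs plus age), weights \(w_j\), bias \(b\), and the L1 tuning parameter \(\lambda\). The network output and training objective are then

\[
\hat{\pi}_k = \frac{1}{1 + \exp\left(-b - \sum_{j=1}^{q} w_j x_{kj}\right)}, \qquad
\min_{w,\,b} \; -\frac{1}{n} \sum_{k=1}^{n} \left[ y_k \log \hat{\pi}_k + (1 - y_k) \log\left(1 - \hat{\pi}_k\right) \right] + \lambda \sum_{j=1}^{q} \lvert w_j \rvert ,
\]

which has the same form as the L1-penalized logistic regression objective. Under these assumptions the two fits differ only in the optimizer used; adding hidden layers inserts nonlinear transformations between the inputs and \(\hat{\pi}_k\), which is what allows the NN to capture interactions among predictors.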
Age is an important predictor. A logistic regression showed that age alone could provide moderate accuracy for predicting AMD risk (Supplementary Fig. S13). Our predictive models established from all five approaches can predict a person's AMD risk at any future age >50 years old. 
Our study has some limitations. It could be more useful to predict the time-to-progression (to advanced AMD) risk instead of predicting the AMD risk at a given age, because the AMD status is dynamic and may change as time goes by. Another limitation is that the primary test dataset from dbGaP is not completely independent of the training dataset, because both come from the same 26 studies. The secondary test dataset from UK Biobank (which is completely independent) is relatively small. In this study, NN does not show a clear advantage over the other competing approaches, which suggests that most popular prediction approaches can achieve satisfactory results for predicting AMD risk so long as the feature SNPs are correctly identified. NN might be more advantageous than other approaches in cases with a large number of predictors and complex relationships among predictors. 
Data Availability
The AMD GWAS data for the model development were obtained from the publicly available repository dbGaP with accession number phs001039.v1.p1. The additional independent validation data were obtained from the UK Biobank. The interface of the established prediction model is freely available at https://yanq.shinyapps.io/no_vs_amd_NN/
Acknowledgments
The authors thank the International AMD Genomics Consortium for generating the genetic data, performing quality checks, and making the data available on dbGaP. This research was supported by the National Institutes of Health (R21EY030488 to Y.D., W.C.). 
Disclosure: Q. Yan, None; Y. Jiang, None; H. Huang, None; A. Swaroop, patent held by the University of Michigan (P); E.Y. Chew, None; D.E. Weeks, patent held by the University of Pittsburgh (P); W. Chen, patent held by the University of Michigan (P); Y. Ding, None 
References
1. Swaroop A, Chew EY, Rickman CB, Abecasis GR. Unraveling a multifactorial late-onset disease: from genetic susceptibility to disease mechanisms for age-related macular degeneration. Annu Rev Genom Human Genet. 2009; 10: 19–43.
2. Congdon N, O'Colmain B, Klaver CC, et al. Eye Diseases Prevalence Research Group. Causes and prevalence of visual impairment among adults in the United States. Arch Ophthalmol. 2004; 122: 477–485.
3. Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study (AREDS): design implications. AREDS report no. 1. Control Clin Trials. 1999; 20: 573–600.
4. Fisher SA, Abecasis GR, Yashar BM, et al. Meta-analysis of genome scans of age-related macular degeneration. Hum Mol Genet. 2005; 14: 2257–2264.
5. Fritsche LG, Igl W, Bailey JN, et al. A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat Genet. 2016; 48: 134–143.
6. Kondo N, Bessho H, Honda S, Negi A. Complement factor H Y402H variant and risk of age-related macular degeneration in Asians: a systematic review and meta-analysis. Ophthalmology. 2011; 118: 339–344.
7. Restrepo NA, Spencer KL, Goodloe R, et al. Genetic determinants of age-related macular degeneration in diverse populations from the PAGE study. Invest Ophthalmol Vis Sci. 2014; 55: 6839–6850.
8. Spencer KL, Glenn K, Brown-Gentry K, Haines JL, Crawford DC. Population differences in genetic risk for age-related macular degeneration and implications for genetic testing. Arch Ophthalmol. 2012; 130: 116–117.
9. Sudlow C, Gallacher J, Allen N, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015; 12: e1001779.
10. Chollet F. Keras. GitHub repository, 2015. Available at: https://github.com/keras-team/keras.
11. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; 33: 1–22.
12. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008; 28(5): 1–26, https://www.jstatsoft.org/rt/captureCite/v028i05/0/ApaCitationPlugin.
13. Ding Y, Liu Y, Yan Q, et al. Bivariate Analysis of age-related macular degeneration progression using genetic risk scores. Genetics. 2017; 206: 119–133.
14. Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999; 18: 2529–2545.
15. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988; 44: 837–845.