Assessing the Efficacy of Synthetic Optic Disc Images for Detecting Glaucomatous Optic Neuropathy Using Deep Learning
Author Affiliations & Notes
  • Abadh K. Chaurasia
    Menzies Institute for Medical Research, University of Tasmania, Tasmania, Australia
  • Stuart MacGregor
    QIMR Berghofer Medical Research Institute, Brisbane, Australia
    School of Medicine, University of Queensland, Brisbane, Australia
  • Jamie E. Craig
    Department of Ophthalmology, Flinders University, Flinders Medical Centre, Bedford Park, Australia
  • David A. Mackey
    Lions Eye Institute, Centre for Ophthalmology and Visual Science, University of Western Australia, Perth, Australia
  • Alex W. Hewitt
    Menzies Institute for Medical Research, University of Tasmania, Tasmania, Australia
    Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Australia
  • Correspondence: Abadh K. Chaurasia, Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool St., Hobart, Tasmania 7005, Australia. e-mail: abadh.chaurasia@utas.edu.au 
Translational Vision Science & Technology June 2024, Vol. 13, 1. doi: https://doi.org/10.1167/tvst.13.6.1
Abstract

Purpose: Deep learning architectures can automatically learn complex features and patterns associated with glaucomatous optic neuropathy (GON). However, developing robust algorithms requires large, accurately labeled data sets. We sought to train an adversarial model to generate high-quality optic disc images from a large, diverse data set and then assessed the performance of models trained on the generated synthetic images for detecting GON.

Methods: A total of 17,060 fundus images (6874 glaucomatous and 10,186 healthy) were used to train deep convolutional generative adversarial networks (DCGANs) to synthesize disc images for both classes. We then trained two models to detect GON, one solely on these synthetic images and another on a mixed data set (synthetic and real clinical images). Both models were externally validated on a data set not used for training. Multiple classification metrics were evaluated with 95% confidence intervals, and the models’ decision-making processes were assessed using gradient-weighted class activation mapping (Grad-CAM).

Results: Following receiver operating characteristic curve analysis, an optimal cup-to-disc ratio threshold for detecting GON from the training data was found to be 0.619. DCGANs generated high-quality synthetic disc images for healthy and glaucomatous eyes. When trained on a mixed data set, the model's area under the receiver operating characteristic curve attained 99.85% on internal validation and 86.45% on external validation. Grad-CAM saliency maps were primarily centered on the optic nerve head, indicating a more precise and clinically relevant attention area of the fundus image.

Conclusions: Although our model performed well on synthetic data, training on a mixed data set demonstrated better performance and generalization. Integrating synthetic and real clinical images can optimize the performance of a deep learning model in glaucoma detection.

Translational Relevance: Integrating DCGAN-generated synthetic images with real-world clinical data can optimize deep learning models for glaucoma detection and improve their generalization to clinical practice.

Introduction
The timely detection and effective management of glaucomatous optic neuropathy (GON), which is characterized by the progressive degradation of the optic nerve head (ONH), are essential to prevent irreversible blindness and visual impairment.1,2 Deterioration of the ONH is one of the distinguishing features of glaucoma and can be documented through a fundus camera, the instrument used most frequently in glaucoma practice.3 However, substantial variability in individuals’ ONH morphology and in clinicians’ ability to detect GON from the fundus image makes it challenging to discriminate between people with and without GON.4,5 Inconsistent glaucoma diagnosis may result from variations in fundus image quality, field of view, and examiner experience.6–8 Detecting and managing the disease early can significantly reduce its socioeconomic burden.9 These challenges highlight the critical need for cutting-edge techniques, including artificial intelligence (AI) and deep learning (DL), to improve decision-making and detect glaucoma early and accurately.
Advancements in DL technology, particularly convolutional neural networks (CNNs), have revolutionized the field of medical imaging analysis.10,11 CNN-based algorithms can automatically learn complex features and patterns associated with GON from fundus images: increased cup-to-disc ratio (CDR), neuroretinal rim thinning, disc hemorrhages, and retinal nerve fiber layer defects. The diagnostic performance of these algorithms is highly dependent on the quality and quantity of the training data.12 However, obtaining a diverse data set with accurate ground truth (clinician-assigned labels of healthy or glaucoma) is difficult and limited by disease prevalence, privacy concerns, and the resource-intensive nature of manually grading fundus images with or without glaucoma. CNN models tend to overfit when trained on insufficient or inaccurately annotated data, limiting their generalizability.13,14 Generative techniques for producing labeled data, such as generative adversarial networks (GANs),15 can address these challenges by generating high-quality color fundus images for healthy and glaucomatous eyes.16,17
Deep convolutional generative adversarial networks (DCGANs), a direct extension of the original GAN, are efficient and stable architectures for generating synthetic images.18,19 DCGANs can generate high-quality synthetic images in the medical domain, including ophthalmology.18,20–22 DCGANs can efficiently generate synthetic fundus images with and without glaucoma, even when only limited labeled data are available to train the generative models.22–24 The challenges related to the limited availability of glaucomatous fundus data sets could potentially be overcome by utilizing DCGANs, yielding a robust CNN-based architecture for improved glaucoma detection. In this study, we aimed to train DCGANs to generate high-quality optic disc images from a large, diverse data set. We then assessed the performance of DL models trained on the generated synthetic disc images and on mixed images (synthetic and real clinical) for GON detection.
Methods
Publicly Accessible Data Sets
This study utilized fundus images of people with and without glaucoma from 20 different databases worldwide.25,26 The images were captured using various fundus cameras with different resolutions, as outlined in Supplementary Table S1. All of these fundus images (6874 glaucomatous and 10,186 healthy discs) had been preprocessed and reviewed in our previous study27 to ensure the quality and accuracy of the labels for both classes, and they were then used to train our generative model. This study adhered to the principles of the Declaration of Helsinki, ensuring that all data were handled responsibly and with integrity and that the terms and conditions set by the data providers were respected.
Determining CDR Threshold for Glaucoma Using a Pretrained AI Model
The CDR was quantified for all 17,060 preprocessed fundus images using a previously developed pretrained regression model.28 We employed receiver operating characteristic (ROC) curve analysis to assess the model's diagnostic ability.29 The true-positive and false-positive rates were calculated for a range of CDR thresholds, and the optimal CDR threshold for glaucoma detection was determined with Youden's index, which maximizes the sum of sensitivity and specificity.30 We also performed statistical comparisons between the healthy and glaucoma groups on the training and testing data sets using Gardner–Altman plots.31
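For illustration, this threshold search can be reproduced in a few lines with scikit-learn; the array names below (labels, cdr_values) are placeholders for the clinician-assigned labels and the regression model's outputs, not variables from the study's code.

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def optimal_cdr_threshold(labels: np.ndarray, cdr_values: np.ndarray) -> float:
    # Youden's index J = sensitivity + specificity - 1 = TPR - FPR.
    fpr, tpr, thresholds = roc_curve(labels, cdr_values)  # labels: 1 = glaucoma, 0 = healthy
    best = int(np.argmax(tpr - fpr))
    print(f"AUC = {roc_auc_score(labels, cdr_values):.2f}, "
          f"optimal CDR threshold = {thresholds[best]:.3f}")
    return float(thresholds[best])

On the study's training data, this procedure corresponds to the 0.619 cutoff reported in the Results.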
DCGAN Architecture
This study applied DCGANs to generate synthetic disc images for healthy and glaucomatous eyes using the PyTorch framework.32 A DCGAN consists of two main components, a generator and a discriminator, trained with architectural constraints that support unsupervised representation learning while mitigating training instability and GAN mode collapse.18 The generator uses a series of transposed convolutional layers (ConvTranspose2d) to upscale a random noise vector (latent space) into a synthetic image. Each ConvTranspose2d layer was followed by a batch normalization layer to stabilize learning and a rectified linear unit (ReLU) activation function, except for the last layer, which uses a Tanh activation function.18
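A minimal PyTorch sketch of such a generator is shown below; the latent dimension (100), feature-map widths, and 64 × 64 output resolution are illustrative defaults from the original DCGAN paper,18 not settings reported in this study.

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            # project the noise vector to a (ngf*8) x 4 x 4 feature map
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            # last layer: no batch normalization, Tanh activation, as described above
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):  # z: (N, nz, 1, 1) -> synthetic disc image (N, 3, 64, 64)
        return self.net(z)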
The discriminator uses a series of convolutional layers (Conv2d) to downscale an input image into a single output prediction of whether the image is real or fake. Each Conv2d layer is followed by a batch normalization layer and a LeakyReLU activation function, except for the last layer, which uses a Sigmoid activation function; dropout layers with a rate of 0.6 were placed after the first, second, and third convolutional layers to prevent overfitting. The weights of both the generator and the discriminator were initialized from a zero-centered normal distribution with a standard deviation of 0.02.
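A matching discriminator sketch, with the dropout placement and weight initialization described above, follows; the feature-map widths and 64 × 64 input resolution are again assumptions.

import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, ndf=64, nc=3, p_drop=0.6):
        super().__init__()
        self.net = nn.Sequential(
            # dropout after the first three convolutions, as described above
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.Dropout2d(p_drop), nn.BatchNorm2d(ndf), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.Dropout2d(p_drop), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.Dropout2d(p_drop), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),
            # last layer: single real/fake probability via Sigmoid
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1)

def weights_init(m):
    # zero-centered normal initialization, standard deviation 0.02
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, 0.0, 0.02)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, 1.0, 0.02)
        nn.init.zeros_(m.bias)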
DCGAN Training
The DCGANs were implemented using PyTorch,33 and the models were trained on the publicly accessible data sets of 17,060 fundus images. Hyperparameters such as the learning rate (0.0002), number of epochs (2000), batch size (64), β1 (0.05), and β2 (0.999) were selected to optimize the performance of the DCGANs.18 Binary cross-entropy loss and the Adam optimizer were applied.
The DCGANs were trained using an iterative process in which the discriminator and generator were trained alternately. For each batch of real fundus images, the discriminator was trained to correctly classify real and fake (generator-produced) images. The generator was then trained to fool the discriminator by generating images that the discriminator would classify as real. The generator and discriminator losses were recorded to measure the performance of the GANs. In addition, synthetic fundus images were periodically generated from a fixed batch of random noise vectors, providing a visual indication of the generator's improvement over time. The model's performance was evaluated, and a checkpoint saved, every 500 iterations. Our DCGANs generated 25,000 synthetic disc images each for healthy and glaucomatous eyes, which were used to train CNN models for detecting GON, because training on vast synthetic data can enhance a model's generalizability.34
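A condensed sketch of this alternating update, reusing the Generator and Discriminator classes sketched above and the stated hyperparameters (learning rate 0.0002, batch size 64, BCE loss, Adam), is shown below; the data folder path and the 64 × 64 image size are placeholders.

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# one DCGAN is trained per class, so the folder holds a single class's discs (hypothetical path)
tfm = transforms.Compose([transforms.Resize(64), transforms.CenterCrop(64),
                          transforms.ToTensor(), transforms.Normalize((0.5,) * 3, (0.5,) * 3)])
loader = DataLoader(datasets.ImageFolder("data/healthy_discs", tfm), batch_size=64, shuffle=True)

netG, netD = Generator().to(device), Discriminator().to(device)
netG.apply(weights_init); netD.apply(weights_init)
criterion = torch.nn.BCELoss()
optD = torch.optim.Adam(netD.parameters(), lr=0.0002, betas=(0.05, 0.999))
optG = torch.optim.Adam(netG.parameters(), lr=0.0002, betas=(0.05, 0.999))
fixed_noise = torch.randn(64, 100, 1, 1, device=device)  # for periodic visual checks

for epoch in range(2000):
    for real, _ in loader:
        real = real.to(device)
        b = real.size(0)
        # (1) discriminator step: classify real images as real, generated images as fake
        netD.zero_grad()
        loss_real = criterion(netD(real), torch.ones(b, device=device))
        fake = netG(torch.randn(b, 100, 1, 1, device=device))
        loss_fake = criterion(netD(fake.detach()), torch.zeros(b, device=device))
        (loss_real + loss_fake).backward()
        optD.step()
        # (2) generator step: push D(G(z)) toward "real"
        netG.zero_grad()
        criterion(netD(fake), torch.ones(b, device=device)).backward()
        optG.step()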
Synthetic and Mixed Data Set Regimen
Our synthetic data set comprised 50,000 images, equally divided into “glaucoma” and “healthy” classes. For the train-synthetic regimen, we used 80% of these images (20,000 “glaucoma,” 20,000 “healthy”) for training. The remaining 20% (5041 “glaucoma,” 4959 “healthy”) constituted the validation set. This regimen was designed to assess the efficacy of training the model for glaucoma detection on entirely synthetic data. In the train-mixed regimen, we combined synthetic and real images, producing a more diverse data set: the entire set of 50,000 synthesized images plus 6874 real “glaucoma” and 10,186 real “healthy” images. We then split this combined data set into 80% for training and 20% for validation. The training subset included 8149 real images of healthy subjects and 5499 real images of glaucomatous subjects. The validation set consisted of 13,411 images: 1374 real glaucomatous images, 2037 real healthy images, and 5000 synthetic images per class. The train-mixed regimen aimed to investigate the model's performance on a data set drawing on both image sources. Class weights were used to offset the imbalance in the number of images per class between the synthetic and mixed training regimens.
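As one way to implement the class weighting mentioned above, inverse-frequency weights can be passed to the loss function; the counts below follow the mixed training split described in this section (20,000 synthetic plus 5499 real glaucomatous images versus 20,000 synthetic plus 8149 real healthy images), and the inverse-frequency scheme itself is our assumption.

import torch

# class order assumed alphabetical, as with torchvision's ImageFolder: [glaucoma, healthy]
counts = torch.tensor([20_000 + 5_499, 20_000 + 8_149], dtype=torch.float)
weights = counts.sum() / (2 * counts)  # inverse-frequency weights, mean approximately 1
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)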
We split the real and synthetic data sets into an 80/20 ratio separately and then combined them to form the mixed data set. To guard against any overlap between the training and testing sets, we visually inspected batches from both sets before training the model, using the show_batch function from the Fastai library.
CNN Model for GON Detection
We selected the vgg19 architecture with batch normalization for detecting GON, motivated by the performance of vgg19_bn in our previous studies.27 This architecture comprises a sequence of convolutional layers, ReLU activation layers, pooling layers, and fully connected layers. The vgg19_bn model was pretrained on ImageNet, a large-scale image database, to initialize the weights.35 This allowed the model to learn generic image features; fine-tuning with the 1-cycle policy was then used to train the vgg19_bn model for detecting GON using the PyTorch framework, complemented by the Fastai library.36,37
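A brief sketch of this transfer-learning setup with Fastai is given below; the data path, batch size, and epoch count are illustrative, not the study's settings.

from fastai.vision.all import *
from torchvision.models import vgg19_bn

dls = ImageDataLoaders.from_folder(
    "data/mixed",            # hypothetical folder with glaucoma/ and healthy/ subfolders
    valid_pct=0.2, seed=42,  # 80/20 train/validation split, as in the study
    item_tfms=Resize(224),   # images downsized to 224 x 224 pixels
    bs=64,
)
learn = vision_learner(dls, vgg19_bn, metrics=[accuracy, RocAucBinary()])  # ImageNet weights
learn.fine_tune(10)          # fine_tune trains with the 1-cycle policy

Fastai's fine_tune first fits the new classification head with the pretrained body frozen, then unfreezes the whole network, scheduling learning rates with fit_one_cycle at each stage.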
We trained two separate vgg19_bn models on two data sets: one with only synthetic images and one with a mix of synthetic and real clinical images. All data were divided into training and validation sets in an 80/20 ratio. We validated the models’ robustness and generalizability for detecting GON on external real clinical data from Drishti_GS, which was not used at any point to train the DCGAN or CNN models.
Statistical Analysis
The publicly accessible preprocessed data were used for ROC curve analysis to determine the optimal CDR threshold for diagnosing glaucoma.30 The performance of the DCGANs was evaluated by visual inspection of the generated disc images for both classes and by t-distributed stochastic neighbor embedding (t-SNE). Subsequently, the vgg19_bn models’ performance was assessed on 20% of the total data set. To determine the diagnostic ability for each discrete class, we adopted a one-versus-all approach for each classification metric. This enabled us to report the accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUROC), with corresponding 95% confidence intervals, for identifying a glaucomatous image versus all other images and a healthy image versus all other images. Additionally, the models were validated on the real-world Drishti_GS data set to confirm their generalizability.
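The paper does not state how its 95% confidence intervals were computed; a common choice, sketched below, is a nonparametric bootstrap over the validation predictions.

import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_with_ci(y_true: np.ndarray, y_score: np.ndarray, n_boot=1000, seed=0):
    # point estimate plus percentile-bootstrap 95% CI
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    boot, n = [], len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)           # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue                          # skip resamples containing only one class
        boot.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return point, (lo, hi)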
The experiment was conducted on a virtual Ubuntu desktop (version 22.04) using NVIDIA A100 with 40 GB of GPU RAM at Nectar Research Cloud.38 The following libraries were implemented: Python (version 3.10.6), PyTorch (version 2.0.0+cu117), Fastai (version 2.7.12), TorchVision (version 0.15.1+cu117), Matplotlib (version 3.5.1), and Scikit-learn (version 1.2.2).37,39,40 
Results
The “healthy” group had a lower CDR distribution, whereas the “glaucoma” group had higher ratios (P < 0.001) in both the training (global data) and testing (Drishti_GS) sets. The mean differences between groups are depicted through an estimation plot in Supplementary Figure S1.
Optimal CDR Threshold for Glaucoma
To investigate the glaucoma case detection performance for CDR, we undertook ROC curve analysis. In the training data set, the ROC curve demonstrated an area under the curve (AUC) of 0.86, indicating a high degree of discriminative ability between healthy individuals and those with glaucoma (Fig. 1a). The optimal CDR threshold, determined by the point on the curve that maximized the true-positive rate while minimizing the false-positive rate, was found to be 0.619. The distribution of healthy and glaucomatous disc images used to train the DCGANs for both classes, along with the optimal threshold for glaucoma, is displayed in Figure 1b, which was similar to the testing data set (Supplementary Fig. S2). 
Figure 1.
 
(a) The true-positive rate versus false-positive rate with different thresholds in the training data set. (b) CDR probability density for two groups, with a red dashed line marking the optimal CDR threshold for detecting glaucoma using ROC analysis.
Performance of DCGANs
Disc Qualitative Analysis
The DCGANs demonstrated a unique ability to generate synthetic disc images accurately for the healthy and glaucoma classes, as shown in Figure 2. Synthetic images of healthy discs reflected the features present in real clinical images of healthy eyes, while the images generated for the glaucomatous class captured the distinguishing sign of GON, an increased cup-to-disc ratio. The quality of the images was evaluated by expert ophthalmologists, who ensured the images exhibited the characteristic features of healthy and glaucomatous discs, respectively. Additionally, we compared real and synthetic disc images for both classes using t-SNE plots to analyze high-dimensional properties and feature similarities,41 as shown in Figure 3. These results highlight the DCGANs’ effectiveness in producing high-quality synthetic images that can augment real-world data sets, thereby enriching the diversity of data for training our CNN model for detecting GON.
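For reference, a t-SNE comparison of this kind can be sketched as below, assuming image features have already been extracted (e.g., flattened pixels or CNN embeddings) into two NumPy arrays; the perplexity and array names are illustrative choices, not reported settings.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def tsne_compare(real_feats: np.ndarray, synth_feats: np.ndarray):
    # embed real and synthetic features into a shared 2-D space
    X = np.vstack([real_feats, synth_feats])
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    n = len(real_feats)
    plt.scatter(emb[:n, 0], emb[:n, 1], s=4, label="real")
    plt.scatter(emb[n:, 0], emb[n:, 1], s=4, label="synthetic")
    plt.legend(); plt.title("t-SNE: real vs. synthetic discs"); plt.show()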
Figure 2.
 
Comparative visualization of nonglaucomatous class (a) and glaucomatous class (b): real versus DCGAN-synthetic disc images.
Figure 3.
 
t-SNE plot displaying the disparity between real (healthy and glaucomatous: 10,186 and 6874) and synthetic images (25,000 of healthy and 25,000 of glaucomatous) of the ONH. In the four subplots, we contrast real and synthetic images, both healthy and with glaucoma. The bottom left and right subplots illustrate that some data points revealing the ONH clinical features in the real and synthetic disc images overlap.
Stability and Convergence of the DCGANs
Our DCGANs exhibited remarkable stability and reliable convergence over 2000 epochs for both classes; a balanced performance between the discriminator and the generator was achieved, with discriminator losses of 0.6171 and 0.4231 and generator losses of 2.2825 and 3.6831 for healthy and glaucomatous discs, respectively. The training dynamics of our DCGANs for both healthy and glaucomatous disc images, depicting the evolution of the generator (G) and discriminator (D) losses over iterations, are shown in Supplementary Figure S3. Moreover, the discriminator's scoring metrics D(x) and D(G(z)) at the final epochs (healthy: D(x) = 0.6994, D(G(z)) = 0.2005; glaucomatous: D(x) = 0.8277, D(G(z)) = 0.0610) highlight the model's ability to correctly classify real and generated discs, with a low rate of false positives. These outcomes emphasize the potential of DCGANs to enhance the training data set for our glaucoma detection model.
Performance of CNN Models for Detecting GON
Performance on Synthetic Disc Images
Our vgg19_bn model exhibited excellent adaptability and robustness when trained exclusively on generated synthetic disc images for the healthy (20,000) and glaucomatous (20,000) classes. The model achieved 99.98% accuracy across both classes on the validation set of 10,000 disc images; the classification metrics are shown in Table 1.
Table 1.
 
Classification Performance Metrics for Detecting GON on Synthetic and Mixed Disc Images
Performance on Mixed Synthetic and Real Disc Images
The vgg19_bn model trained on a mixed data set of synthetic (40,000) and real disc images (13,648) exhibited remarkable performance on the validation set of 13,411 images. All classification metrics exceeded 0.9793 across both classes (Table 1). Training the model on a diverse data set combining synthetic and real clinical disc images establishes the effectiveness of our model for detecting GON accurately. ROC analysis yielded an AUC of 0.9985, indicating strong discrimination between healthy and glaucomatous discs. However, the confusion matrix revealed a small number of false negatives and false positives, as illustrated in Supplementary Figure S4.
External Validation
Our model trained exclusively on synthetic data demonstrated an exceptional AUROC, achieving 100% during internal validation. However, its AUROC dropped by 27.05 percentage points on unseen real clinical images; these data had not been used to train either the DCGAN or the CNN models. This indicates the model learned highly specialized features from the synthetic images, potentially leading to overfitting. Alternatively, when trained on the mixed data set, the model's AUROC reached 99.85% on internal validation, and on external validation its AUROC was 13.5 percentage points higher than that of the synthetic-only model (Table 2), reflecting improved generalizability and performance on real-world clinical data.
Table 2.
 
Models’ Diagnostic Performance on a Clinical Data Set from Drishti_GS to Detect GON during External Validation
Gradient-Weighted Class Activation Mapping
We evaluated our vgg19_bn models’ decision-making processes using gradient-weighted class activation mapping (Grad-CAM) to illustrate differences, based on the nature of the training data, in classifying healthy and glaucomatous discs.42 Notably, when the model was trained purely on synthetic images, the resulting Grad-CAM saliency maps emphasized extensive regions across the entire fundus (diffuse saliency), suggesting a less focused pattern of attention (Supplementary Fig. S5). In contrast, training on a mix of synthetic and real disc images sharpened the model's attention: the saliency maps became concentrated around the ONH, indicating a more precise and clinically relevant attention area of the fundus image (Supplementary Fig. S6). This highlights the importance of real training data in guiding the model to detect GON from fundus images by concentrating on diagnostically meaningful locations.
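A minimal Grad-CAM implementation over the last convolutional layer of a trained torchvision vgg19_bn, consistent with the setup described above, might look as follows; the trained model and the normalized (1, 3, 224, 224) input tensor are assumed inputs.

import torch
import torch.nn.functional as F

def grad_cam(model, img, target_class):
    model.eval()
    feats, grads = {}, {}
    # hook the last Conv2d layer of the feature extractor
    conv = [m for m in model.features if isinstance(m, torch.nn.Conv2d)][-1]
    h1 = conv.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    logits = model(img)
    logits[0, target_class].backward()
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)            # global-average-pooled gradients
    cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))  # weighted sum of activations
    cam = F.interpolate(cam, size=img.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()     # saliency map scaled to [0, 1]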
Discussion
In this study, we aimed to train DCGANs to generate high-quality optic disc images and subsequently assessed the performance of CNN-based models trained separately on synthetic images and on mixed data (synthetic and real clinical images) for detecting GON. DCGANs have previously been utilized to synthesize fundus images with and without glaucoma from limited labeled data.22–24 However, training these models can be challenging with a small number of images, potentially impacting image quality.43 To the best of our knowledge, this is the first study to use multiethnic data (19 publicly accessible data sets) to train DCGANs and generate a large and diverse set of synthetic disc images of healthy and glaucomatous eyes. We found that training DCGANs on a diverse data set allowed us to overcome the problem of limited data availability and to improve the CNN model's performance and applicability.
Our DCGANs demonstrated remarkable efficacy, producing essentially unlimited synthetic disc images of high quality that effectively capture the clinical features of healthy and glaucomatous discs (Fig. 2). We also used a conditional GAN for disc synthesis, but the generated image quality was inferior to that of the DCGANs (Supplementary Fig. S7). Our research confirmed the findings of Diaz-Pinto et al.,22 which verified the DCGAN's ability to generate high-quality disc images, further reinforcing its potential superiority to Costa's method. We used t-SNE plots (Fig. 3) to visually evaluate the performance of our DCGANs in distinguishing between real and synthetic images based on clinical features. The high quality of the generated discs demonstrates that DCGANs can learn GON features from the original data sets and generate new, realistic disc images for training a CNN model.
Our CNN model achieved an accuracy of 99.98% in differentiating between healthy and glaucomatous discs when trained exclusively on synthetic images. This may suggest an exceptional training data set, but we noticed some inconsistencies in image quality (inferior quality and artifacts). The model may have learned specific patterns or artifacts inherent to the DCGAN-generated images rather than GON clinical features; as shown by the t-SNE plot in Figure 3, the synthetic healthy and glaucomatous disc features overlap despite the model's generally outstanding performance. Consequently, the model's accuracy dropped considerably to 71.19% on the external validation set, emphasizing the potential overfitting of our model to the synthetic data set. Nevertheless, the model demonstrated a sensitivity of 80% for GON detection, indicating a robust capability to identify positive cases. Similarly, the model by Zheng et al.,44 trained solely on synthetic optical coherence tomography (OCT) data, achieved a sensitivity of 67% when detecting retinal diseases on real data. The decline in our model's performance on external data also aligns with the findings of Kumar et al.34 using OCT images for glaucoma detection. Although a CNN model can achieve exceptional results when trained on synthetic images, these results may not necessarily translate to real clinical circumstances, and relying solely on synthetic data can be misleading.
The CNN model trained on mixed (real and DCGAN-generated synthetic) disc images achieved remarkable performance metrics across both classes, exceeding 0.9793. This robust performance indicates that including synthetic data in the training set did not compromise the model's ability to identify images with or without glaucoma, highlighting the value of DCGAN-based synthetic images for augmenting training data sets, particularly in domains with limited access to real-world data. Moreover, the model demonstrated an impressive 86.14% accuracy on completely unseen data. When we trained and validated the model on the same set of real clinical data from our previous study,27 it exhibited a comparable accuracy of 87.13%; the difference between the mixed and real training data sets was not statistically significant (Kruskal–Wallis test; P = 0.32). These results collectively signify the value of real images in model training; they consistently enhance the model's accuracy and generalizability for detecting GON. When the AUROCs of these models (trained on synthetic, mixed, and real data and tested on unseen data) were compared, there was no statistically significant difference (Kruskal–Wallis test; P = 0.37). This implies that the DCGAN can generate optic disc images that could be beneficial in developing AI-based models for glaucoma care. DCGAN-based synthetic images thus offer a viable supplement, but real clinical images remain paramount in optimizing the model's performance for diagnosing glaucoma.
We utilized Grad-CAM to visualize and interpret the image regions driving our models’ decisions. When we trained our model on synthetic data, the Grad-CAM heatmaps reflected a diffuse pattern, indicating a response to nonspecific features that might not align with clinically significant regions (Supplementary Fig. S5). The Grad-CAM visualizations also included color bars showing the intensity of saliency, with higher pixel values for more critical areas. Conversely, when the model was trained on a mixed data set, the Grad-CAM visualizations prominently highlighted the ONH, a pivotal region in the clinical diagnosis and management of glaucoma (Supplementary Fig. S6). Although synthetic data can enhance data sets and improve model robustness, real-world clinical images are still required to ensure the model's diagnostic performance.
Our study offers valuable insights, but it also has some limitations. The DCGANs were trained on publicly accessible glaucomatous data sets, which might have inconsistencies in labeling between healthy and glaucomatous fundus images, potentially impacting the model's performance for detecting GON. Most data sets did not include information on glaucoma severity, making it difficult to develop a model that accounts for different stages of the disease. We exclusively utilized ONH images, excluding valuable information available in full fundus images, such as retinal nerve fiber layer defects. Additionally, our models were trained on images downsized to 224 × 224 pixels, which could obscure important clinical information regarding GON. We also used two separate DCGANs to generate healthy and glaucomatous discs, which may have unintentionally created distinct patterns for each class, making classification artificially easy for the CNN model. Moreover, we externally validated our model only on the Drishti_GS data set. Lastly, our model performed excellently on synthetic data, but such data may not capture all the clinical features of healthy and glaucomatous eyes, as evident in our diffuse Grad-CAM heatmaps. Grad-CAM heatmaps also depend on the model's architecture and the selected layer (the last convolutional layer), limiting their ability to provide a full understanding of the decision-making process. In future studies, a robust GON detection model could be developed by using higher-resolution full fundus images from diverse populations and by establishing gold-standard ground truth based on multiple clinical tests by experienced clinicians.
In conclusion, DCGANs can generate a virtually unlimited supply of labeled, high-quality synthetic disc images of healthy and glaucomatous eyes. Our DL model performed exceptionally well when trained and validated solely on synthetic data but struggled to attain acceptable accuracy on real clinical data; the two separate generative models might have unintentionally created distinct patterns for each class, making classification artificially easy for the DL model. Training on the mixed data set, however, demonstrated better performance and generalization, confirming the value of combining synthetic and real clinical images. Although synthetic images can augment data sets, encourage data sharing, and help prevent model overfitting, real clinical images remain essential for developing a robust GON detection model; integrating synthetic and real clinical images can optimize the performance of a DL model in GON detection.
Acknowledgments
Supported by a program grant from the National Health and Medical Research Council (NHMRC; GNT1150144), NHMRC fellowships (SM, JEC, DAM, and AWH), and a Research Training Program Scholarship from the University of Tasmania (AKC). 
Disclosure: A.K. Chaurasia, None; S. MacGregor, None; J.E. Craig, None; D.A. Mackey, None; A.W. Hewitt, None 
References
1. Weinreb RN, Khaw PT. Primary open-angle glaucoma. Lancet. 2004; 363: 1711–1720.
2. Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. Lancet Glob Health. 2021; 9: e144–e160.
3. Medeiros FA, Zangwill LM, Bowd C, Sample PA, Weinreb RN. Use of progressive glaucomatous optic disk change as the reference standard for evaluation of diagnostic tests in glaucoma. Am J Ophthalmol. 2005; 139: 1010–1018.
4. Varma R, Tielsch JM, Quigley HA, et al. Race-, age-, gender-, and refractive error-related differences in the normal optic disc. Arch Ophthalmol. 1994; 112: 1068–1076.
5. Tsai CS, Zangwill L, Gonzalez C, et al. Ethnic differences in optic nerve head topography. J Glaucoma. 1995; 4: 248–257.
6. Gaasterland DE, Blackwell B, Dally LG, Caprioli J, Katz LJ, Ederer F; Advanced Glaucoma Intervention Study Investigators. The Advanced Glaucoma Intervention Study (AGIS): 10. Variability among academic glaucoma subspecialists in assessing optic disc notching. Trans Am Ophthalmol Soc. 2001; 99: 177–184.
7. Chen J. Comparison of the performance of four fundus cameras in clinical practice. Invest Ophthalmol Vis Sci. 2019; 60: 6121.
8. Panwar N, Huang P, Lee J, et al. Fundus photography in the 21st century—a review of recent technological advances and their implications for worldwide healthcare. Telemed J E Health. 2016; 22: 198.
9. Bramley T, Peeples P, Walt JG, Juhasz M, Hansen JE. Impact of vision loss on costs and outcomes in Medicare beneficiaries with glaucoma. Arch Ophthalmol. 2008; 126: 849–856.
10. Chaurasia AK, Greatbatch CJ, Hewitt AW. Diagnostic accuracy of artificial intelligence in glaucoma screening and clinical practice. J Glaucoma. 2022; 31: 285–299.
11. Sarvamangala DR, Kulkarni RV. Convolutional neural networks in medical image understanding: a survey. Evol Intell. 2022; 15: 1.
12. Luca AR, Ursuleanu TF, Gheorghe L, et al. Impact of quality, type and volume of data used by deep learning models in the analysis of medical images. Inform Med Unlocked. 2022; 29: 100911.
13. Webb S. Deep learning for biology. Nature. 2018; 554: 555–557.
14. Munappy AR, Bosch J, Olsson HH, Arpteg A, Brinne B. Data management for production quality deep learning models: challenges and solutions. J Syst Softw. 2022; 191: 111359.
15. Goodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks. Adv Neural Inf Process Syst. 2014; 27.
16. You A, Kim JK, Ryu IH, Yoo TK. Application of generative adversarial networks (GAN) for ophthalmology image domains: a survey. Eye Vis. 2022; 9(1): 6.
17. Saeed AQ, Sheikh Abdullah SNH, Che-Hamzah J, Abdul Ghani AT. Accuracy of using generative adversarial networks for glaucoma detection: systematic review and bibliometric analysis. J Med Internet Res. 2021; 23(9): e27414.
18. Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. 2015, arXiv:1511.06434.
19. Abry P, Mauduit V, Quemener E, Roux S. Multivariate multifractal texture DCGAN synthesis: how well does it work? How does one know? J Signal Process Syst. 2022; 94: 179–195.
20. Srivastav D, Bajpai A, Srivastava P. Improved classification for pneumonia detection using transfer learning with GAN based synthetic image augmentation. In 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE; 2021: 433–437, https://ieeexplore.ieee.org/abstract/document/9377062.
21. Agarwal N, Singh V, Singh P. Semi-supervised learning with GANs for melanoma detection. In 2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE; 2022: 141–147, https://ieeexplore.ieee.org/abstract/document/9787990.
22. Diaz-Pinto A, Colomer A, Naranjo V, Morales S, Xu Y, Frangi AF. Retinal image synthesis and semi-supervised learning for glaucoma assessment. IEEE Trans Med Imaging. 2019; 38(9): 2211–2218.
23. Chourasia S, Bhojane R, Patil R, Kotambkar DM. Domain adaptation using DCGAN for glaucoma diagnosis. In 2023 IEEE 8th International Conference for Convergence in Technology (I2CT). IEEE; 2023: 1–7, https://ieeexplore.ieee.org/abstract/document/10126413.
24. Sun Y, Yang G, Ding D, Cheng G, Xu J, Li X. A GAN-based domain adaptation method for glaucoma diagnosis. In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE; 2020: 1–8, https://ieeexplore.ieee.org/abstract/document/9207358.
25. Khan SM, Liu X, Nath S, et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit Health. 2021; 3: e51–e66.
26. Kiefer R, Abid M, Steen J, Ardali MR, Amjadian E. A catalog of public glaucoma datasets for machine learning applications: a detailed description and analysis of public glaucoma datasets available to machine learning engineers tackling glaucoma-related problems using retinal fundus images and OCT images. In Proceedings of the 2023 7th International Conference on Information System and Data Mining. 2023: 24–31.
27. Chaurasia AK, Liu G-S, Greatbatch J, et al. A generalised computer vision model for improved glaucoma screening using fundus images. Preprint, Research Square, 2023, doi:10.21203/rs.3.rs-3364615/v1.
28. Chaurasia A, Greatbatch CJ, Han X, et al. Highly accurate and precise automated cup-to-disc ratio quantification for glaucoma screening. medRxiv, 2024, doi:10.1101/2024.01.10.24301093.
29. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med. 2013; 4: 627.
30. Schisterman EF, Faraggi D, Reiser B, Hu J. Youden index and the optimal threshold for markers with mass at zero. Stat Med. 2008; 27: 297.
31. Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A. Moving beyond P values: data analysis with estimation graphics. Nat Methods. 2019; 16: 565–566.
32. Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst. 2019; 32.
33. Inkawhich N. DCGAN Tutorial—PyTorch Tutorials 2.0.1+cu117 documentation, https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html. Accessed May 21, 2023.
34. Kumar AJS, Chong RS, Crowston JG, et al. Evaluation of generative adversarial networks for high-resolution synthetic image generation of circumpapillary optical coherence tomography images for glaucoma. JAMA Ophthalmol. 2022; 140: 974–981.
35. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014, arXiv:1409.1556.
36. Howard J, Gugger S. fastai: a layered API for deep learning. Information. 2020; 11(2): 108, doi:10.3390/info11020108.
37. PyTorch 2.0. https://pytorch.org/get-started/pytorch-2.0/. Accessed May 21, 2023.
38. Nectar Research Cloud Dashboard. https://dashboard.rc.nectar.org.au/dashboard_home/. Accessed May 21, 2023.
39. torchvision. PyPI. https://pypi.org/project/torchvision/. Accessed May 21, 2023.
40. Installing scikit-learn. https://scikit-learn.org/stable/install.html. Accessed May 21, 2023.
41. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008; 9: 2579–2605.
42. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 2017: 618–626, doi:10.1007/s11263-019-01228-7.
43. Saad MM, O'Reilly R, Rehmani MH. A survey on training challenges in generative adversarial networks for biomedical image analysis. arXiv e-prints, 2022.
44. Zheng C, Xie X, Zhou K, et al. Assessment of generative adversarial networks model for synthetic optical coherence tomography images of retinal disorders. Transl Vis Sci Technol. 2020; 9(2): 29.