Our synthetic data set comprised 50,000 images, equally divided into “glaucoma” and “healthy” classes. For the train-synthetic regimen, we used 80% of these images (20,000 “glaucoma,” 20,000 “healthy”) for training. The remaining 20% (5041 “glaucoma,” 4959 “healthy”) constituted the validation set. This split was designed to assess the efficacy of training the glaucoma detection model on entirely synthetic data.
For the train-mixed regimen, we combined synthetic and real images, resulting in a more diverse data set. It included the entire set of 50,000 synthesized images plus 6874 real “glaucoma” and 10,186 real “healthy” images. We then split this combined data set into 80% for training and 20% for validation. The real portion of the training subset comprised 8149 images of healthy subjects and 5499 images of glaucomatous subjects. The validation set consisted of 13,411 images: 1374 real images of glaucomatous eyes, 2037 real images of healthy eyes, and 5000 synthetic images for each class. The train-mixed regimen aimed to investigate the model's performance when trained on a data set combining synthetic and real images. Class weights were used to compensate for the imbalance in the number of images in the synthetic and mixed training groups.
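As an illustration of how class weights can be derived from image counts, the following minimal sketch applies a common inverse-frequency heuristic to the train-mixed training counts reported above (20,000 synthetic images per class plus the real training images). The weighting formula and the function name `inverse_frequency_weights` are assumptions made for illustration; the exact weighting scheme used in the study is not specified here.

```python
def inverse_frequency_weights(counts):
    """Return one weight per class, inversely proportional to its image count.

    Assumed heuristic: weight = total / (number_of_classes * class_count),
    so a perfectly balanced data set yields a weight of 1.0 for every class.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Training counts for the train-mixed regimen: synthetic + real images per class.
train_mixed_counts = {
    "glaucoma": 20_000 + 5_499,
    "healthy": 20_000 + 8_149,
}

weights = inverse_frequency_weights(train_mixed_counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
```

Under this heuristic the less frequent class (“glaucoma”) receives the larger weight, so that each class contributes equally to the weighted loss in aggregate.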
We split the real and synthetic data sets separately, each in an 80/20 ratio, and then combined the resulting subsets to form the mixed data set. To guard against overlap between the training and testing sets, we visually inspected both sets before training the model, using the show_batch function from the fastai library.
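The split-then-combine procedure can be sketched as follows. This is a minimal illustration using only the Python standard library. The file names, the helper `split_80_20`, and the fixed random seed are hypothetical. The programmatic set-intersection check shown here is a complement to, not a description of, the visual inspection with show_batch; it detects only identical file identifiers, not near-duplicate images.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle items reproducibly and return (train, valid) in an 80/20 ratio."""
    items = sorted(items)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)
    cut = round(0.8 * len(items))
    return items[:cut], items[cut:]


# Hypothetical file identifiers standing in for the real and synthetic images.
real = [f"real_{i:05d}.png" for i in range(100)]
synthetic = [f"synth_{i:05d}.png" for i in range(300)]

# Split each source separately, then combine the corresponding subsets.
real_train, real_valid = split_80_20(real)
synth_train, synth_valid = split_80_20(synthetic)
train = real_train + synth_train
valid = real_valid + synth_valid

# An empty intersection means no file appears in both subsets.
overlap = set(train) & set(valid)
print(len(train), len(valid), len(overlap))
```

Splitting each source before combining keeps the proportion of real to synthetic images the same in the training and validation subsets.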