Glaucoma, a group of progressive optic neuropathies, stands as a leading cause of irreversible blindness worldwide.1 The disease is characterized by the degeneration of retinal ganglion cells and their axons, resulting in distinctive optic disc changes and specific patterns of vision loss. Visual field (VF) testing, which identifies subtle areas of vision loss, is considered the gold standard for glaucoma monitoring.2 However, standard automated perimetry (SAP) faces challenges, including subjectivity, variability, a time-consuming protocol, and frequent failure to detect early damage.3–5 In contrast, optical coherence tomography (OCT) offers an objective measure of retinal nerve fiber layer (RNFL) thickness, providing a more reliable indicator of glaucoma progression.6 In glaucoma management, deep learning (DL) approaches have been proposed to explore the relationship between OCT-derived structural changes and visual function.
The advent of artificial intelligence (AI), particularly DL, has revolutionized healthcare by automating the analysis of medical images. Convolutional neural networks (CNNs), a type of DL architecture, have been widely used in this domain because they automatically learn hierarchical features from raw images. CNNs excel at tasks such as image segmentation, feature extraction, and classification, facilitating accurate disease detection, localization, and diagnosis.7,8 More recently, however, transformer-based models, initially developed for natural language processing, have shown promising results in computer vision. Transformers rely on self-attention mechanisms to capture long-range dependencies and global context, enabling them to learn more robust and expressive feature representations than CNNs.9
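To make this mechanism concrete, the following is a minimal sketch of scaled dot-product self-attention in PyTorch; all tensor names and sizes are illustrative and are not taken from any model evaluated in this work.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Minimal scaled dot-product self-attention.

    x: (batch, tokens, dim) -- e.g., flattened image patches.
    w_q, w_k, w_v: (dim, dim) projection matrices.
    Every token attends to every other token, which is how transformers
    capture long-range dependencies and global context in a single step.
    """
    q = x @ w_q                      # queries
    k = x @ w_k                      # keys
    v = x @ w_v                      # values
    scale = q.shape[-1] ** 0.5
    attn = F.softmax(q @ k.transpose(-2, -1) / scale, dim=-1)
    return attn @ v                  # context-mixed token features

# Toy usage: 4 "patches" of a scan, 8-dimensional embeddings.
x = torch.randn(1, 4, 8)
w = [torch.randn(8, 8) for _ in range(3)]
out = self_attention(x, *w)          # shape (1, 4, 8)
```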
Among the transformer-based models, self-distillation with no labels (DINO)10 has been successfully applied across various medical domains, owing to its ability to handle diverse types of medical imaging data.11,12 Medical applications include regression tasks on hematoxylin and eosin (H&E)-stained histopathological images,13 disease detection in chest X-rays,14,15 and classification tasks involving brain magnetic resonance images (MRIs).11 Despite this potential, the application of transformer-based models to medical image analysis, particularly glaucoma assessment from OCT scans, remains largely unexplored. DINO could potentially learn robust and interpretable features from OCT scans.
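For orientation, the following is a minimal sketch of the DINO self-distillation objective, assuming `student` and `teacher` are identical backbone-plus-projection-head networks; the temperatures and momentum are illustrative defaults, not tuned values.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the DINO self-distillation objective (Caron et al., 2021).
# Two augmented "views" of the same unlabeled image are encoded by a student
# and a teacher with identical architectures; only the student is trained by
# backpropagation, while the teacher tracks it as an exponential moving average.

def dino_loss(student_out, teacher_out, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between the sharpened, centered teacher distribution
    and the student distribution (the teacher side receives no gradient)."""
    t_probs = F.softmax((teacher_out.detach() - center) / t_teacher, dim=-1)
    s_logp = F.log_softmax(student_out / t_student, dim=-1)
    return -(t_probs * s_logp).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(student, teacher, momentum=0.996):
    """Teacher parameters follow the student via an EMA instead of backprop."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
```

In the full method, the student sees both global and local crops while the teacher sees only global crops, and `center` is itself an EMA of teacher outputs; this sketch omits those details.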
Previous studies have focused on processing OCT scans, varied by layer (e.g., ganglion cell-inner plexiform layer [GCIPL] versus RNFL), location (macula versus optic nerve head), and instrument type (spectral-domain OCT [SD-OCT] versus swept-source OCT [SS-OCT]), to predict global metrics and/or pointwise visual field sensitivities assessed by the 24-2 Humphrey Visual Field (HVF) test.16–20 These studies, primarily using CNNs, have demonstrated remarkable accuracy in predicting VF defects from RNFL thickness maps. However, they have not fully addressed the negative impact of RNFL artifacts on predictive models.16,21 Furthermore, the potential of transformer-based models in this context remains largely unexplored. The ability of transformers to capture long-range dependencies and global context could yield more accurate and robust predictions of VF defects from OCT scans, especially in the presence of artifacts or other confounding factors. Additionally, the self-attention mechanisms of transformers could improve interpretability by highlighting the regions of an OCT scan most relevant to the predicted VF defects, which is crucial for clinical adoption of, and trust in, AI-based systems.
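As an example of the interpretability such attention affords, the sketch below extracts the last-layer [CLS] attention of a vision transformer and reshapes it into a coarse saliency map. It assumes a ViT backbone from the timm library; the module path `blocks[-1].attn.attn_drop` follows timm's layout and should be treated as an assumption, as other libraries differ.

```python
import torch
import timm  # assumption: a ViT backbone from the timm library

# Build an untrained ViT purely for illustration; in practice this would be
# a model trained on RNFL thickness maps.
model = timm.create_model("vit_small_patch16_224", pretrained=False).eval()
if hasattr(model.blocks[-1].attn, "fused_attn"):
    # Disable fused attention so the attention matrix is materialized
    # and visible to the hook (newer timm versions fuse it away).
    model.blocks[-1].attn.fused_attn = False

attn_maps = []
hook = model.blocks[-1].attn.attn_drop.register_forward_hook(
    lambda mod, inp, out: attn_maps.append(out.detach())  # (B, heads, N, N)
)
scan = torch.randn(1, 3, 224, 224)   # placeholder for an RNFL thickness map
with torch.no_grad():
    model(scan)
hook.remove()

attn = attn_maps[-1].mean(dim=1)     # average over heads -> (B, N, N)
cls_to_patch = attn[0, 0, 1:]        # [CLS] attention over image patches
saliency = cls_to_patch.reshape(14, 14)  # 224 / 16 = 14 patches per side
```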
In this study, we introduce a novel DL strategy to predict the 24-2 HVF from OCT-derived RNFL thickness maps. We hypothesize that preprocessing OCT scans with artifact correction can improve VF prediction. To test this, we sequentially developed and validated an artifact correction model and a VF prediction model: we first built the artifact correction model to restore artifact-laden scans and then evaluated the improvement in VF prediction after incorporating this correction step. Finally, we compared how much CNNs and transformers benefit from artifact correction on the same VF prediction task, as sketched below.
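The overall design can be summarized as a two-stage pipeline. In the sketch below, `ArtifactCorrector`-style and `VFPredictor`-style modules are hypothetical stand-ins for the models developed in this study, and the 52-point output assumes the common convention of excluding the two blind-spot locations of the 24-2 grid.

```python
import torch
import torch.nn as nn

# Schematic of the two-stage strategy: stage 1 restores artifact-laden RNFL
# thickness maps; stage 2 predicts pointwise 24-2 HVF sensitivities from the
# corrected map. Both module arguments are hypothetical placeholders.

def predict_vf(corrector: nn.Module, predictor: nn.Module,
               rnfl_map: torch.Tensor) -> torch.Tensor:
    """rnfl_map: (B, 1, H, W) thickness map -> (B, 52) sensitivities (dB)."""
    with torch.no_grad():
        corrected = corrector(rnfl_map)   # stage 1: artifact correction
        return predictor(corrected)       # stage 2: VF prediction

# The same interface accepts either a CNN or a transformer as `predictor`,
# which is how the two architecture families are compared on one task.
```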