In this study, we developed and evaluated models that predict whether patients with glaucoma will progress to requiring surgery, fusing EHR and RNFL OCT imaging features and comparing XGBoost and TabNet model architectures. Models trained on a single modality of data, either EHR or RNFL, were compared against those trained on both modalities as inputs. We found that performance improved when the RNFL and EHR data were integrated into the TabNet and XGBoost models, compared with single-modality models, highlighting the value of integrating multimodal data into prediction models for glaucoma. Moreover, the TabNet fusion model outperformed the conventional tree-based XGBoost fusion model, underscoring the promise of TabNet as a flexible deep learning architecture suitable for multiple modalities of healthcare data.
This study expands upon prior efforts in predicting progression to surgery in patients with glaucoma. Our previous models, which leveraged structured and free-text EHR data, achieved AUROC values ranging from approximately 0.70 to 0.90.8–10 However, these models lacked integration of baseline optic nerve imaging data, which can provide crucial information on glaucoma severity and influence surgical decisions. Wang et al. attempted to bridge this gap by incorporating RNFL data alongside visual field and EHR data to forecast surgeries among patients with glaucoma over multiple time horizons, achieving AUROCs ranging from 0.77 for long-term prediction to 0.85 for predictions within the 0.5 to 1 year timeframe.21
Their tri-modality fusion approach involved a custom deep learning architecture combining a vision transformer and a fully connected neural network. This method required complex and idiosyncratic preprocessing steps to convert the numerical results from imaging and visual field tests into resized, color-coded pixel arrays for input into the vision transformer. In contrast, TabNet offers a distinct advantage in its simplicity, as it can be applied directly and intuitively to diverse tabular datasets without extensive customization or preprocessing. This enabled robust performance in predicting glaucoma progression to surgery, comparable to more complex fusion architectures, making TabNet a compelling choice for modeling with various healthcare data types structured in tabular formats.
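The simplicity of this early-fusion strategy can be illustrated with a minimal sketch. All feature names, the `eye_id` key, and the values below are hypothetical and do not reflect our actual data schema; the point is only that tabular EHR and RNFL features can be joined into a single row per eye before training either TabNet or XGBoost, with no image-style preprocessing:

```python
def fuse_rows(ehr_rows, rnfl_rows):
    """Early fusion: merge per-eye EHR and RNFL feature dicts,
    joined on a shared eye identifier (hypothetical schema)."""
    rnfl_by_id = {r["eye_id"]: r for r in rnfl_rows}
    fused = []
    for e in ehr_rows:
        r = rnfl_by_id.get(e["eye_id"])
        if r is None:
            continue  # cohort restricted to eyes with RNFL OCT scans
        fused.append({**e, **r})  # one tabular row for TabNet or XGBoost
    return fused

# Hypothetical example: one eye with both EHR and RNFL OCT features
ehr = [{"eye_id": 1, "age": 67, "iop": 24.0, "visual_acuity": 0.3}]
rnfl = [{"eye_id": 1, "cup_to_disc": 0.7, "rim_area_mm2": 0.9,
         "mean_rnfl_thickness_um": 78.0}]
print(fuse_rows(ehr, rnfl))
```

The fused rows can then be passed to any tabular learner; in our setting, eyes without RNFL scans drop out of the fused cohort, which is one reason the sample size is limited to scanned patients.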
In ophthalmology and across the broader healthcare domain, relatively few prior studies have used TabNet, although results have been promising. TabNet was among the architectures used to predict stroke mortality from EHR data in Hong Kong, achieving an AUROC of 0.840 for predicting death from ischemic stroke.22 Additionally, fusion models combining EHR data with features extracted from computed tomography (CT) scans have been found to outperform single-modality models in predicting pulmonary embolism mortality, demonstrating the potential of using multiple modalities of data with TabNet.23
Our study is one of the first to explore the applicability of TabNet for developing prediction models in ophthalmology, and the first in glaucoma. A previous ophthalmic prediction model used TabNet to predict which patients may benefit from a corneal topographic scan based on ophthalmic examination information, demonstrating superior performance compared with XGBoost and a fully connected neural network in a Korean population.24 Another study, which predicted the presence of sarcopenia from eye examination information, showed no substantial differences among TabNet, XGBoost, and logistic regression models.25 Taken together, these studies suggest a promising role for TabNet in ophthalmology, while also highlighting the need for ongoing investigation into how best to incorporate the diversity of medical data types into prediction models using TabNet. Our study focuses on this question by using TabNet to develop fusion models that integrate EHR data with results from RNFL imaging studies, which are important for assessing the health of the optic nerve.
A strength of this study was our investigation into model explainability, using both model-agnostic approaches to compare TabNet and XGBoost and TabNet-specific approaches that give further insight into TabNet's attention-based feature importance. In general, many features that were important for our models, such as IOP, age, and visual acuity, were clinically reasonable features that would influence clinicians' patient care decisions for glaucoma. In addition, many features from RNFL scans were among the most important features for model prediction, including global structural metrics of the nerve such as cup-to-disc ratio, cup volume, rim area, and disc area, as well as individual quadrant thicknesses. These features are fairly consistent with explainability results from previous work, where IOP, visual acuity, rim area, and cup volume were highly important.9,12,24
Shapley values offer a convenient, model-agnostic way to ascertain feature importance across different model architectures. The relative Shapley importance of RNFL features differed between the XGBoost and TabNet models, but this may be expected, as two independent models would not necessarily emphasize the same feature inputs to produce their predictions. Some prior studies have also suggested that Shapley explainability can sometimes be inaccurate or misleading, as it does not directly rely on information encoded in the model structure itself but merely computes explanations from observed patterns of model inputs and outputs.26,27
In our study, the Shapley feature importance results for TabNet did not exactly mirror the TabNet model-specific feature importance results; race/ethnicity was comparatively de-emphasized, whereas visual acuity and certain RNFL features were more important in the model-specific analysis. This capacity for direct, model-specific interpretability sets TabNet apart from many other deep learning models. Moreover, TabNet's instance-wise feature selection aids efficient learning by devoting model capacity to the most salient features, leading to a readily explainable decision-making process.
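To make the model-agnostic character of Shapley attribution concrete, the toy sketch below computes exact Shapley values for an arbitrary black-box predictor by enumerating feature coalitions, with absent features set to a baseline. The two-feature predictor at the bottom is purely hypothetical; exact enumeration is only tractable for a handful of features, and practical tools for EHR/RNFL-scale models rely on approximations such as TreeSHAP:

```python
import math
from itertools import combinations

def shapley_values(f, x, baseline):
    """Exact Shapley values for predictor f at point x.
    Features outside a coalition are replaced by their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 over the other features
            for S in combinations(others, size):
                # Shapley kernel weight |S|! (n - |S| - 1)! / n!
                w = (math.factorial(size) * math.factorial(n - size - 1)
                     / math.factorial(n))
                with_i = [x[j] if (j in S or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (f(with_i) - f(without_i))
    return phi

# Hypothetical toy risk model over two standardized features
# (e.g., an IOP-like and a cup-to-disc-like input) with an interaction term
f = lambda v: 2 * v[0] + 3 * v[1] + v[0] * v[1]
print(shapley_values(f, [1.0, 1.0], [0.0, 0.0]))  # → [2.5, 3.5]
```

By the efficiency property, the attributions sum to f(x) minus f(baseline); note that nothing in the computation inspects the model's internals, which is both the appeal of the method and the source of the concerns about faithfulness cited above.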
We acknowledge that this study has several limitations. The models developed and validated in this investigation are based on a dataset from patients receiving care at a single academic center, which may reduce generalizability. Furthermore, the cohort was limited to patients who underwent RNFL OCT scans during their care, limiting the sample size. The limited cohort size also precluded subgroup analyses of model performance, such as by glaucoma subtype, which would provide valuable information, as well as prediction over longer time horizons, which would require larger numbers of patients with longer follow-up. Future studies could consider modeling using multi-institutional registries, such as the newly established Sight OUtcomes Research Collaborative (sourcecollaborative.org), once imaging results become integrated into that registry. Additionally, we recognize that the criteria for performing glaucoma surgery can vary among physicians due to differences in practice patterns, with some opting for earlier intervention and others delaying until later stages. This variability reflects the lack of universal standards and the personalized nature of glaucoma care. Incorporating larger and more diverse datasets in future work could help address this limitation by capturing wider variation in surgical practice patterns. Future work could also directly predict glaucoma-related outcomes, such as future RNFL or visual field progression, which are less dependent on surgical practice patterns. Another potential limitation is that the present models included only demographic and eye examination features from the EHR, and did not include medication or diagnosis data. In doing so, this study more heavily emphasizes the clinical measurements obtained from ophthalmic examinations and the structural features of the eye.
Future work could explore incorporating other elements from the EHR, although medication and diagnosis features carry considerably more noise than documented eye examination measurements. Future work could also incorporate results from visual field testing into TabNet fusion models. Although our approach does not include raw image data from the OCT scans, such data are often proprietary and difficult to obtain, store, and analyze, and their incorporation limits the ability to deploy models because data ingestion requirements become more complex. We have demonstrated that a simpler approach using OCT imaging results stored in tabular form is still highly effective. Future research could explore different methods of image representation to better encapsulate the spatial information inherent in imaging scans and potentially augment performance.
In conclusion, we developed models that predict progression to surgery among patients with glaucoma using data from the EHR and RNFL OCT scans, comparing TabNet and XGBoost modeling techniques. We found that models incorporating both EHR and RNFL data outperformed single-modality models. In addition, TabNet outperformed XGBoost, achieving the highest AUROC of 0.832. Our research highlights the simplicity and versatility of TabNet for data fusion models in healthcare, which may have broad applicability for researchers in the healthcare domain. Future research could investigate incorporating additional modalities, such as visual field test results. Such endeavors hold promise for enhancing predictive modeling and augmenting decision-making for patients with glaucoma.