RetOCTNet is a powerful tool for segmenting the RNFL and total retina and measuring their thickness from OCT images of healthy and RGC-injured eyes in rats. It segments the RNFL and total retina with overall F1 scores of 0.88 and 0.98, respectively. The F1 score of 0.78 for the ONC 12-week test set was the lowest across data sets, reflecting the difficulty of segmenting advanced damage near the ONH. Notably, our human annotators encountered similar challenges in segmenting the ONH after OHT, which likely propagated uncertainty into, and potentially lowered the quality of, their annotations. Nevertheless, an F1 of 0.78 for the 12-week ONC test set still denotes substantial overlap, confirming the algorithm's efficacy in capturing relevant structural information even at advanced injury stages.
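For readers less familiar with the overlap metric, the F1 score for binary segmentation masks is equivalent to the Dice coefficient, 2TP/(2TP + FP + FN). A minimal Python sketch (the array names are illustrative, not taken from the RetOCTNet code base):

```python
import numpy as np

def f1_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """F1 (Dice) overlap between two binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # true positives
    fp = np.logical_and(pred, ~truth).sum()   # false positives
    fn = np.logical_and(~pred, truth).sum()   # false negatives
    return 2 * tp / (2 * tp + fp + fn)
```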
The challenge of assessing retinal thickness after injury was consistent with the uncertainty maps, which showed that uncertainty was largest around the ONH and highest after RGC injury. To ensure that the good F1, precision, and recall metrics reflected accurate thickness measurements, we compared ground-truth and RetOCTNet thickness measurements and found a good overall correlation between the two. Bland–Altman plots showed that the mean absolute offset between ground-truth and RetOCTNet thickness measurements (bias ≤1.7 µm) was smaller than the axial resolution of our images (2.75 µm), indicating that the thickness measurements agreed with ground truth within the resolution of the OCT measurements. In addition, the difference measurements showed no trend toward underestimating or overestimating thickness, and there was no significant difference in the final RNFL and retinal thickness measurements between RetOCTNet and ground truth. We also examined segmentations where RetOCTNet did not perform well. Typically, these images contained regions of low contrast (Supplementary Figs. S3 and S4), which RetOCTNet labeled as background although human annotators had segmented them, or complex morphology from obliquely sliced blood vessels after inducing OHT (Supplementary Fig. S5). These outliers were also partially related to human annotators attempting to segment unclear regions caused by shadows, which RetOCTNet avoided segmenting (Supplementary Figs. S3 and S4). This may account for the large thickness differences between human annotators and RetOCTNet and indicates a potential need for visual inspection or postprocessing. Postprocessing can help remove areas of discontinuity or account for the ONH, where the RNFL and retinal layers are no longer present. Overall, however, RetOCTNet accurately segmented the RNFL and total retina.
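For context, the Bland–Altman comparison above reduces to a bias (the mean paired difference) and 95% limits of agreement. A minimal Python sketch, assuming paired per-scan thickness arrays in micrometers (variable names are illustrative):

```python
import numpy as np

def bland_altman(gt_um: np.ndarray, net_um: np.ndarray):
    """Bias and 95% limits of agreement between paired thickness
    measurements (both arrays in micrometers)."""
    diff = net_um - gt_um                 # per-scan differences
    bias = diff.mean()                    # mean offset (Bland-Altman bias)
    half_width = 1.96 * diff.std(ddof=1)  # half-width of the 95% limits
    return bias, bias - half_width, bias + half_width
```

Comparing the returned bias against the 2.75-µm axial resolution reproduces the agreement check described above.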
In this work, we used uncertainty as a visualization tool to understand which areas were more difficult for RetOCTNet to segment. We found more uncertainty in the presence of advanced damage, suggesting that larger data sets may be needed to improve the DL approach when advanced damage is included. Interestingly, expert human annotators also found RGC-injured eyes with advanced damage more challenging to segment. By contrasting uncertainty between the training and test data sets, we observed similar uncertainty values across data sets and injury levels. This served as a quality control: test-set uncertainty values similar in magnitude to those in the training set demonstrated a lack of overfitting. In future work, we plan to use uncertainty to guide the segmentations and decrease the number of scans required for training.14
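The uncertainty estimator itself is described in the Methods rather than here; as one common approach for segmentation networks, Monte Carlo dropout averages repeated stochastic forward passes and uses the per-pixel standard deviation as an uncertainty map. A minimal PyTorch-style sketch under that assumption (the model and its output format are hypothetical):

```python
import torch

def mc_dropout_uncertainty(model: torch.nn.Module, scan: torch.Tensor,
                           n_passes: int = 20):
    """Mean prediction and per-pixel uncertainty (std) from repeated
    stochastic forward passes with dropout left active at inference."""
    model.train()  # keep dropout layers stochastic during inference
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(scan))
                             for _ in range(n_passes)])
    model.eval()
    return probs.mean(dim=0), probs.std(dim=0)  # segmentation, uncertainty
```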
In addition, most of the uncertainty was concentrated around the ONH. While this region is relevant for remodeling after RGC injury and was used for assessing the quality of RetOCTNet segmentations, it was not used for assessing RNFL or retinal thickness. Therefore, future implementations of RetOCTNet that focus on the loss or thinning of these layers may wish to use postprocessing to omit this region. This would remove the region with the most uncertainty and reduce variability in the thickness maps caused by the complex geometry around the ONH (crowding of the blood vessels, thickening of the layers, and loss of retinal layers as the RGC axons exit the posterior eye).
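Such postprocessing could be as simple as excluding a fixed radius around the ONH before computing thickness statistics. A minimal Python sketch, assuming the ONH center and exclusion radius are supplied by the user (both are hypothetical parameters, not part of RetOCTNet):

```python
import numpy as np

def mask_onh(thickness_map: np.ndarray, onh_center: tuple,
             radius_px: float) -> np.ndarray:
    """Exclude a circular region around the ONH from a 2-D thickness map
    by setting it to NaN so it drops out of downstream statistics."""
    rows, cols = np.indices(thickness_map.shape)
    dist = np.hypot(rows - onh_center[0], cols - onh_center[1])
    masked = thickness_map.astype(float).copy()
    masked[dist < radius_px] = np.nan  # excluded region
    return masked
```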
To further test RetOCTNet, we compared total retinal thickness measurements from RetOCTNet with those from human annotators on a mouse data set without RGC injury, detailed in Supplementary Section 1. These OCT scans differed from RetOCTNet's original training data in species (mouse vs. rat), scan type (annular vs. radial), scan size (1, 1.2, or 1.4 mm vs. 3 mm), and frames per B-scan (48 vs. 20). RetOCTNet performed well on 95.7% of these scans, with F1, precision, and recall of 0.97, 0.99, and 0.96, respectively, showing excellent agreement. The remaining scans were poorly segmented (>25 µm difference in thickness), which suggests that human inspection is required when using RetOCTNet outside the scope of its original training.
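Such inspection can be partially automated by converting each column's segmented pixel count to micrometers and flagging scans whose mean thickness differs from the human reference by more than 25 µm. A minimal Python sketch, assuming binary B-scan masks and that the pixel spacing equals the 2.75-µm axial resolution quoted above (names are illustrative):

```python
import numpy as np

AXIAL_UM_PER_PX = 2.75  # assumes pixel spacing equals the axial resolution

def mean_thickness_um(mask: np.ndarray) -> float:
    """Mean thickness of a binary B-scan mask (depth x width), converting
    per-column pixel counts to micrometers."""
    return float(mask.sum(axis=0).mean() * AXIAL_UM_PER_PX)

def needs_review(net_mask: np.ndarray, human_mask: np.ndarray,
                 tol_um: float = 25.0) -> bool:
    """Flag a scan when model and annotator thickness differ by > tol_um."""
    return abs(mean_thickness_um(net_mask)
               - mean_thickness_um(human_mask)) > tol_um
```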
A potential limitation of RetOCTNet is the impact of uncertainty in the human annotations. Most deep learning approaches require human annotations or input for training and evaluation; therefore, any uncertainty in an annotator's segmentations propagates into the labels. However, using expert annotators and larger data sets helps minimize the potential impact of small uncertainties in the human segmentations used to train and evaluate RetOCTNet.
To ensure that RetOCTNet can evaluate RNFL and retinal thickness outside the ONH, we also evaluated it on volume scans away from the ONH. RetOCTNet accurately segmented OCT volume scans and B-scans with or without the ONH region. This extends the applicability of RetOCTNet to other data sets and resolutions, and to groups who want to determine RNFL and retinal thickness away from the ONH.
A key aspect of RetOCTNet is its ability to generalize to OCT scans from both healthy and RGC-injured eyes. Further, we developed this approach for both OHT and ONC models of RGC injury; to the best of our knowledge, no other deep learning–based algorithm was available to do this in rats. While tuning the hyperparameters of RetOCTNet and training the network is computationally expensive, its use is not: the algorithm can process an OCT scan in a matter of seconds.
Despite RetOCTNet's ability to quickly analyze RNFL and total retinal thickness from OCT images, there are some limitations. One of the main challenges to address was the imbalanced data set. To avoid information leakage, scans from each eye were present in only one data set (training, validation, or testing); a sketch of this split appears after this paragraph. We also ensured that the training data set included OCT scans from each time point and condition: control eyes, OHT 4 weeks postinjury, OHT 8 weeks postinjury, and optic nerve crush. Although the overall data set did not have the same number of controls and injury types, this comprehensive distribution in the training set exposed RetOCTNet to the various types and degrees of injury within our data set. We recognize the absence of OHT samples at the 8-week time point within the test set. Despite this limitation, the model performed well in segmenting OHT 8-week samples in the validation set, indicating that the OHT 8-week scans in the training set were sufficient for the model to learn this condition. A similar statement can be made about the lack of ONC 12-week data in the validation set: because the model performs adequately on this condition in the test set, the learning process was evidently successful. Further, a separate data set composed of volumes with less frame averaging, including ONC-injured eyes at 0, 4, 8, and 12 weeks, was also evaluated, and RetOCTNet performed well at each time point. The predominance of injured eyes in these sets highlights the algorithm's proficiency in handling complex segmentation tasks. The test and validation sets, despite their imbalance, fulfill their roles by providing a diverse range of cases, affirming the model's adaptability and robustness across varying degrees of retinal injury.
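The leakage-avoidance rule described above amounts to a group-wise split keyed on eye identity. A minimal Python sketch using scikit-learn's GroupShuffleSplit (the data layout and names are hypothetical):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def split_by_eye(n_scans: int, eye_ids, test_frac: float = 0.2,
                 seed: int = 0):
    """Split scan indices so every scan from a given eye falls in exactly
    one subset, preventing leakage between training and evaluation."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_frac,
                                 random_state=seed)
    train_idx, test_idx = next(splitter.split(np.arange(n_scans),
                                              groups=eye_ids))
    return train_idx, test_idx
```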
Another limitation is that the model was trained exclusively on patches from nonoverlapping image regions. While patch-based training may introduce segmentation discontinuities, especially near the ONH, the model's high F1 scores suggest this is not a significant concern. Furthermore, as ONH thickness measurements are typically excluded from glaucoma injury assessments, any minor inconsistencies in this region do not detract from the model's overall utility. We also acknowledge that the model was trained exclusively on scans from a Bioptigen OCT machine. Future work will include scans acquired with other OCT machines (e.g., Spectralis OCT, Heidelberg Engineering GmbH, Heidelberg, Germany; Phoenix Micron IV system, Bend, Oregon, USA) to account for interinstrument variability and differences in image characteristics.
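For illustration, nonoverlapping patch extraction of the kind described can be implemented by tiling each B-scan on a fixed grid; the patch size below is a hypothetical parameter, not necessarily the one used to train RetOCTNet. A minimal Python sketch:

```python
import numpy as np

def nonoverlapping_patches(bscan: np.ndarray, patch: int = 128):
    """Tile a 2-D B-scan into nonoverlapping patch x patch tiles,
    discarding partial tiles at the borders."""
    h, w = bscan.shape
    return [bscan[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]
```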