Open Access
Artificial Intelligence  |   June 2023
Segmentation-Free OCT-Volume-Based Deep Learning Model Improves Pointwise Visual Field Sensitivity Estimation
Author Affiliations & Notes
  • Zhiqi Chen
    Department of Electrical and Computer Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
  • Eitan Shemuelian
    Department of Ophthalmology, NYU Langone Health, NYU Grossman School of Medicine, New York, NY, USA
  • Gadi Wollstein
    Department of Ophthalmology, NYU Langone Health, NYU Grossman School of Medicine, New York, NY, USA
    Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
    Center for Neural Science, NYU College of Arts and Sciences, New York, NY, USA
  • Yao Wang
    Department of Electrical and Computer Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
    Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
  • Hiroshi Ishikawa
    Department of Electrical and Computer Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
    Department of Ophthalmology, Casey Eye Institute, Oregon Health and Science University, Portland, OR, USA
    Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
  • Joel S. Schuman
    Department of Electrical and Computer Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
    Department of Ophthalmology, NYU Langone Health, NYU Grossman School of Medicine, New York, NY, USA
    Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
    Center for Neural Science, NYU College of Arts and Sciences, New York, NY, USA
    Wills Eye Hospital, Philadelphia, PA, USA
  • Correspondence: Joel S. Schuman, Wills Eye Hospital, Philadelphia, PA, USA. e-mail: jschuman@willseye.org 
Translational Vision Science & Technology June 2023, Vol. 12, 28. https://doi.org/10.1167/tvst.12.6.28
Abstract

Purpose: The structural changes measured by optical coherence tomography (OCT) are related to functional changes in visual fields (VFs). This study aims to accurately assess the structure-function relationship and to overcome the challenge posed by the minimal measurable level (floor effect) of the segmentation-dependent OCT measurements commonly used in prior studies.

Methods: We developed a deep learning model to estimate the functional performance directly from three-dimensional (3D) OCT volumes and compared it to the model trained with segmentation-dependent two-dimensional (2D) OCT thickness maps. Moreover, we proposed a gradient loss to utilize the spatial information of VFs.

Results: Our 3D model was significantly better than the 2D model both globally and pointwise in terms of both mean absolute error (MAE = 3.11 ± 3.54 vs. 3.47 ± 3.75 dB, P < 0.001) and Pearson's correlation coefficient (0.80 vs. 0.75, P < 0.001). On a subset of test data with floor effects, the 3D model was less affected by floor effects than the 2D model (MAE = 5.24 ± 3.99 vs. 6.34 ± 4.58 dB, P < 0.001; correlation, 0.83 vs. 0.74, P < 0.001). The gradient loss improved the estimation error for low-sensitivity values. Furthermore, our 3D model outperformed all prior studies.

Conclusions: By providing a better quantitative model that encapsulates the structure-function relationship more accurately, our method may help derive VF test surrogates.

Translational Relevance: Deep learning (DL)-based VF surrogates not only benefit patients by reducing the testing time of VFs but also allow clinicians to make clinical judgments without the inherent limitations of VFs.

Introduction
Glaucoma is the second leading cause of blindness worldwide, characterized by slow progression and loss of retinal ganglion cells and their axons, ultimately leading to functional defects that impair quality of life.1–4 Optical coherence tomography (OCT) is a commonly used noninvasive imaging technology for quantitative glaucomatous structural assessment,5,6 serving as a biomarker for glaucoma diagnosis and monitoring.7 In practice, a visual field (VF) test is required in order to make the diagnosis of glaucoma. VF testing is essential to identify and monitor functional abnormalities, but it is highly subjective and susceptible to fluctuations due to various factors, particularly in patients with glaucoma.8–12 On the other hand, commercially available spectral-domain (SD)-OCT has very good reproducibility in both healthy and glaucomatous subjects.13–15 Previous studies have shown that the structural changes measured by OCT are related to the functional changes measured in VF tests.16–20 Thus, surrogates of VF test outcomes could be derived from OCT retinal scans via accurate quantitative models encapsulating structure-function relationships. Such surrogates could not only benefit patients by reducing the long VF testing time but also allow clinicians to make clinical judgments without the inherent limitations of VF tests, such as their subjective nature and high test-to-test variability. 
Prior attempts to characterize structure-function relationships have focused on correlating VF outcomes with structural measurements,19,21–23 including the widely used Garway-Heath map, which mapped localized retinal nerve fiber layer (RNFL) defects measured by red-free RNFL photographs to the locations of points on standard automated perimetry (SAP).19 The increasing popularity of OCT and the improved ability to assess the RNFL led subsequent research to relate OCT measurements, such as peripapillary RNFL thickness, to SAP using statistical tools.21–23 However, these studies relied on small samples and summarized thickness measurements. 
Recent developments in artificial intelligence have shown the potential of deep learning algorithms to model complex nonlinear relationships and to learn task-specific features automatically from high-dimensional data in various medical sectors.24–27 Attempts have been made to use deep learning to estimate VFs from higher-dimensional SD-OCT images28,29 and measurements such as 2D SD-OCT thickness maps.30–35 However, 2D thickness maps are prone to segmentation errors introduced by the adopted segmentation algorithms, leading to inaccurate VF estimation. In addition to segmentation errors, segmentation-based OCT measurements have floor effects.36 The floor effect is the point at which no further structural loss can be detected by segmentation-based OCT measurements; the RNFL dynamic measurement range differs among devices. Previous research reported that RNFL measurements reach the floor at 57 µm for Cirrus OCT devices.36 The OCT floor effect hinders learning of the structure-function relationship in patients with advanced disease, whose VFs continue to change. On the other hand, segmentation-free 3D OCT volumes provide more information and are not subject to segmentation errors. Maetschke et al.28 proposed a 3D deep learning model to directly infer global VF measurements, such as visual field index (VFI) and mean deviation (MD), from unsegmented 3D OCT volumes and achieved Pearson's correlations of 0.88 ± 0.035 and 0.88 ± 0.023 for VFI and MD, respectively. However, global VF measurements do not reveal subtle functional abnormalities and/or specific damage patterns that help with phenotyping various glaucoma subtypes. 
To overcome the above limitations, we developed a deep learning model to estimate pointwise functional outcome directly from segmentation-free 3D OCT volumes and compared the performance with the model trained with segmentation-dependent 2D OCT thickness maps. We also proposed a gradient loss term to utilize spatial information in VFs by reshaping VFs into 2D arrays and calculating gradients between adjacent VF points. 
Methods
Dataset Preparation
This retrospective study was performed in accordance with the tenets of the Declaration of Helsinki. The study was approved by the Institutional Review Board of New York University Langone Health Center. 
Subjects who had at least one VF test and one SD-OCT visit within 90 days of each other were included in the study. VF tests were performed using the Humphrey Field Analyzer (Zeiss, Dublin, CA, USA) with the 24-2 Swedish Interactive Threshold Algorithm (SITA) protocol. Tests with more than 33% fixation losses, 15% false positive errors, or 15% false negative errors were excluded. SD-OCT scans were acquired using the Cirrus HD-OCT instrument (Zeiss). RNFL thickness maps were obtained from the 6 mm × 6 mm, 200 × 200 optic nerve head (ONH) cube scan protocol. Scans with signal strength less than 6 were excluded. 
The final dataset comprised 8387 VF tests and 15,026 ONH OCT scans from 1129 subjects, spanning multiple visits. Table 1 summarizes the demographic characteristics of the dataset, and Figure 1 depicts the histogram of VF MD. To create training and testing datasets, we randomly split the data at a ratio of 9:1 by subject (a sketch of this subject-level split follows this paragraph). Consequently, the training set contained 7303 VF tests and 10,711 ONH scans from 999 subjects. During training, for each minibatch we randomly selected one ONH scan acquired within 90 days of the associated VF visit to pair with that VF. On average, every VF test had 1.54 ± 0.76 corresponding OCT visits. The test set included 996 VF-OCT pairs from 130 subjects. To further investigate model performance when the RNFL reaches the measurement floor, we split the test set into two subsets: one with and one without floor effects. Following a previous publication,36 the subset with floor effects included visits with an average RNFL thickness of 57 µm or less; it comprised 117 VF-OCT pairs from 20 subjects, whereas the other subset had 879 pairs from 127 subjects. 
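As an illustration of the subject-level 9:1 split described above (this is a minimal sketch, not the authors' code; the variable names are our own assumptions), splitting by subject rather than by scan ensures that no subject contributes data to both the training and test sets:

```python
# Sketch of a subject-level 9:1 train/test split.
import random

def split_by_subject(pairs_by_subject, train_ratio=0.9, seed=0):
    """pairs_by_subject: dict mapping subject ID -> list of VF-OCT pairs."""
    subjects = sorted(pairs_by_subject)
    random.Random(seed).shuffle(subjects)
    n_train = int(train_ratio * len(subjects))
    train_subjects, test_subjects = subjects[:n_train], subjects[n_train:]
    # Flatten per-subject pairs so each split contains whole subjects only.
    train = [p for s in train_subjects for p in pairs_by_subject[s]]
    test = [p for s in test_subjects for p in pairs_by_subject[s]]
    return train, test
```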
Table 1. Demographic Characteristics of the Dataset
Figure 1. Distribution of VF MD in our dataset.
Figure 2. Model details. (A) Three-dimensional model taking a 3D OCT volume as input (1 × 144 × 72 × 72, channel × depth × width × height). (B) Two-dimensional model taking a 2D RNFL thickness map as input (3 × 200 × 200, channel × width × height).
Following a previous study,37 we detected the ONH region, segmented the Bruch's membrane opening (BMO) surface, and used two-stage thin-plate splines to estimate and correct the distinct axial artifacts in the BMO surface. The 3D scans were then flattened by shifting each A-scan along the z direction so that the BMO surface became flat, reducing variance across OCT volumes. A region of 144 × 144 × 576 voxels centered on the ONH was then cropped and downsampled to 72 × 72 × 144 voxels with Gaussian antialiasing filtering to reduce memory consumption during model training. Of the 54 points of the 24-2 VF test, the 2 blind spot points were excluded. The sensitivity values of the remaining 52 test points were temporally smoothed over 5 consecutive VF visits of the same eye using pointwise linear regression to reduce random fluctuations (a sketch of this smoothing follows this paragraph). The average time span for longitudinal smoothing was 1166.04 ± 598.91 days. For eyes with fewer than 5 VF visits, we used the original VFs. All left eye visits were flipped horizontally to match the right eye format for both OCTs and VFs. 
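The pointwise linear-regression smoothing can be sketched as follows (our illustration, not the authors' code; the exact 5-visit windowing near the ends of a series is our assumption): each of the 52 VF points is fit with a line over 5 consecutive visits, and measured sensitivities are replaced by their fitted values.

```python
# Sketch of pointwise linear-regression smoothing over 5 consecutive visits.
import numpy as np

def smooth_vf_series(vf, days):
    """vf: (n_visits, 52) sensitivities; days: (n_visits,) visit times in days."""
    n = len(days)
    if n < 5:                      # fewer than 5 visits: keep the original VFs
        return vf.copy()
    smoothed = vf.astype(float).copy()
    for c in range(n):             # 5-visit window containing visit c
        lo = min(max(c - 2, 0), n - 5)
        w = slice(lo, lo + 5)
        t = days[w]
        for i in range(52):        # independent linear fit per VF point
            slope, intercept = np.polyfit(t, vf[w, i], 1)
            smoothed[c, i] = slope * days[c] + intercept
    return smoothed
```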
Model Architecture and Training
Model
A convolutional neural network was developed to take one ONH OCT volume as input and predict a 52-dimensional VF vector. We adopted the 3D version of ResNet18 as the backbone of the feature extractor and replaced the last fully connected layer with 2 convolutional layers that output the 52-point VF sensitivities. We also implemented a 2D ResNet18 model to predict the VF from the ONH thickness map. Details of the 3D and 2D model architectures are shown in Figure 2; a minimal sketch of the 3D model follows. 
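The sketch below uses torchvision's video ResNet18 (r3d_18) as a stand-in for the 3D ResNet18 backbone; the head hyperparameters (channel widths, kernel sizes) and the single-channel stem are our assumptions, not the authors' exact configuration.

```python
# Sketch: 3D ResNet18 backbone with the fully connected layer replaced
# by two convolutional layers that output 52 VF sensitivities.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

class VFNet3D(nn.Module):
    def __init__(self, n_points=52):
        super().__init__()
        backbone = r3d_18(weights=None)
        # Rebuild the stem for single-channel OCT input (instead of RGB video).
        backbone.stem[0] = nn.Conv3d(1, 64, kernel_size=(3, 7, 7),
                                     stride=(1, 2, 2), padding=(1, 3, 3),
                                     bias=False)
        # Keep everything up to (but excluding) the avgpool and fc layers.
        self.features = nn.Sequential(backbone.stem, backbone.layer1,
                                      backbone.layer2, backbone.layer3,
                                      backbone.layer4)
        # Two conv layers replace the original fully connected head.
        self.head = nn.Sequential(nn.Conv3d(512, 256, kernel_size=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv3d(256, n_points, kernel_size=1))

    def forward(self, x):                     # x: (B, 1, 144, 72, 72)
        f = self.features(x)                  # (B, 512, D', H', W')
        out = self.head(f)                    # (B, 52, D', H', W')
        return out.mean(dim=(2, 3, 4))        # global average pool -> (B, 52)
```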
Loss Function
To train the network, we used the mean square error as the reconstruction loss: 
\begin{eqnarray} L_{reconstruction} = \frac{1}{52N} \sum_{n=1}^{N} \sum_{i=1}^{52} \left( y_i^n - \hat{y}_i^n \right)^2 \end{eqnarray}
(1)
where \(y_i^n\) and \(\hat y_i^n\) are the ground-truth and estimated values, respectively, of the ith component of the 52-point VF vector for the nth sample. We also experimented with a mean absolute error loss and obtained similar results; therefore, we report only the results trained with the mean square error loss. 
Typical glaucomatous VF loss is characterized by arcuate defects, nasal steps, and other patterns on rectangular grids.38–40 Therefore, to better utilize the spatial correlation between nearby VF points, we rearranged the output VF vector into an 8 × 9 2D array and zero-filled the boundary, as demonstrated in Figure 3. A gradient loss term was then proposed to minimize the differences in the horizontal and vertical gradients between the estimated and ground truth VF arrays, as follows:  
\begin{eqnarray} L_{horizontal\;gradient} = \left\| M_h \left( \nabla_h y - \nabla_h \hat{y} \right) \right\|_1 \end{eqnarray}
(2)
\begin{eqnarray} L_{vertical\;gradient} = \left\| M_v \left( \nabla_v y - \nabla_v \hat{y} \right) \right\|_1 \end{eqnarray}
(3)
where ∇h and ∇v denote the horizontal and vertical gradient operators, respectively; y and \(\hat y\) denote the ground truth and estimated 2D VF arrays, respectively; and Mh and Mv are binary masks that exclude the gradients at the blind spot and boundary points, as shown in Figures 3B and 3D. With the gradient loss, the 52 points of the VF vector are no longer independent of one another: the model is encouraged not only to reconstruct the individual points faithfully but also to match the change pattern of the ground truth visual field defects. Thus, the gradient loss emphasizes learning of the spatial changes between adjacent VF points, which is essentially the spatial pattern of VF defects. A minimal sketch of this computation follows the figure. 
Figure 3. An example of rearranged 2D VF and gradients.
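The following sketch illustrates Equations 2 and 3 (our illustration, with assumed names): GRID_IDX is a hypothetical precomputed index map placing each of the 52 VF points into the 8 × 9 grid, and mask_h and mask_v are the binary masks of Figure 3; neither the map nor the masks are shown here. Whether the ||·||1 norm is averaged or summed is also our assumption.

```python
# Sketch of the gradient loss (Eqs. 2-3) on the rearranged 8 x 9 VF grid.
import torch

def to_grid(vf52, grid_idx):
    """Scatter a (B, 52) VF vector into a zero-padded (B, 8, 9) array."""
    grid = vf52.new_zeros(vf52.shape[0], 8 * 9)
    grid[:, grid_idx] = vf52
    return grid.view(-1, 8, 9)

def gradient_loss(y, y_hat, grid_idx, mask_h, mask_v):
    """y, y_hat: (B, 52); mask_h: (8, 8); mask_v: (7, 9) binary masks."""
    gy, gy_hat = to_grid(y, grid_idx), to_grid(y_hat, grid_idx)
    # Horizontal gradients: differences between horizontally adjacent cells.
    dh = (gy[:, :, 1:] - gy[:, :, :-1]) - (gy_hat[:, :, 1:] - gy_hat[:, :, :-1])
    # Vertical gradients: differences between vertically adjacent cells.
    dv = (gy[:, 1:, :] - gy[:, :-1, :]) - (gy_hat[:, 1:, :] - gy_hat[:, :-1, :])
    # Masked L1 norms, excluding blind spot and boundary gradients.
    return (mask_h * dh).abs().mean(), (mask_v * dv).abs().mean()
```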
Finally, we set the training loss as: 
\begin{eqnarray} L_{total} = L_{reconstruction} + \lambda L_{horizontal\;gradient} + \lambda L_{vertical\;gradient} \end{eqnarray}
(4)
where λ was set to 10. 
The model was trained by stochastic gradient descent using the Adam optimizer41 with β1 = 0.9, β2 = 0.999, and ε = 10−8. The initial learning rate was 2 × 10−4 and was decayed by a factor of 10 every 100 epochs. We trained the model for 200 epochs; a sketch of this setup follows. 
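The optimization setup above translates directly into standard PyTorch calls (a sketch under the stated hyperparameters; `model`, `train_loader`, and `total_loss` are assumed to be defined elsewhere):

```python
# Sketch: Adam with lr 2e-4, decayed 10x every 100 epochs, 200 epochs total.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4,
                             betas=(0.9, 0.999), eps=1e-8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

for epoch in range(200):
    for oct_volume, vf_target in train_loader:
        optimizer.zero_grad()
        vf_pred = model(oct_volume)
        loss = total_loss(vf_pred, vf_target)   # Eq. (4)
        loss.backward()
        optimizer.step()
    scheduler.step()
```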
Statistical Analysis
We used the mean absolute error (MAE) and Pearson's correlation coefficient between the measured and estimated VFs to evaluate model performance. Pearson's correlation coefficients were compared with the Williams test for equality of dependent correlations, and the MAEs were compared with the Wilcoxon signed-rank test; a sketch of both computations follows. 
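The sketch below shows how these metrics and tests could be computed. SciPy provides the Wilcoxon signed-rank test and Pearson correlation directly; the Williams test for two dependent correlations sharing a common variable is not in SciPy, so it is written out following Steiger's (1980) formulation (our implementation, not the authors').

```python
# Sketch of evaluation metrics and the two significance tests.
import numpy as np
from scipy import stats

def evaluate(measured, estimated):
    """measured, estimated: 1D arrays of pooled VF sensitivities."""
    mae = np.abs(measured - estimated).mean()
    r, _ = stats.pearsonr(measured, estimated)
    return mae, r

def williams_test(r12, r13, r23, n):
    """Williams' t for corr(X1,X2) vs corr(X1,X3) on the same n samples,
    e.g., X1 = measured VF, X2/X3 = the two models' estimates."""
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    rbar = (r12 + r13) / 2
    t = (r12 - r13) * np.sqrt(
        (n - 1) * (1 + r23)
        / (2 * det * (n - 1) / (n - 3) + rbar**2 * (1 - r23)**3))
    return 2 * stats.t.sf(abs(t), df=n - 3)   # two-sided p value

# Wilcoxon signed-rank test on paired absolute errors of the two models:
# stats.wilcoxon(abs_err_3d, abs_err_2d)
```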
Results
3D Model Versus 2D Model
Table 2 summarizes the global performance comparison between the model trained with 3D OCT volumes and the model trained with 2D thickness maps. The MAE of the 3D model was significantly lower than that of the 2D model (3.11 vs. 3.47 dB, P < 0.001, Wilcoxon signed-rank test). Pearson's correlation coefficient of the 3D model was also significantly better than that of the 2D model (0.80 vs. 0.75, P < 0.001, Williams test for equality of correlations). Both metrics demonstrate that the overall performance of the 3D model was significantly better than that of the 2D model. 
Table 2. Comparison Between 2D-Thickness-Map-Based Model and 3D-Volume-Based Model
For the subset with floor effects, both the MAE and the correlation coefficient of the 3D model were significantly better than those of the 2D model. For the subset without floor effects, the reduction in MAE by the 3D model was significant, whereas the gain in Pearson's correlation coefficient was marginal. Note that the MAE was larger for both models on the subset with floor effects, possibly because fewer training samples came from patients with floor effects. Table 3 summarizes the sectoral results of the 3D model, using the sectors defined in the Garway-Heath map.19 
Table 3. Performance of the 3D Model in Different Retinal Sectors
Figure 4 shows the error trends of both models at every VF sensitivity level. The error trends of the 3D and 2D models did not differ much for data without floor effects (Figure 4B). Conversely, the MAE of the 3D model showed a clearly better trend than that of the 2D model for data with floor effects (Figure 4A). Regardless of floor effects, both models performed better for VF sensitivities between 20 and 35 dB, which appear more often in our dataset, than for values under 20 dB. As a result, a plateau effect is evident in Figure 5 when the measured sensitivity is less than 20 dB; a similar pattern was reported by Mariottoni et al.33 In addition, the high test-to-test variability of VF sensitivity values below 20 dB may also contribute to the large estimation error at the low-sensitivity end.42,43 
Figure 4. Error trend comparison on test data with and without floor effects. (A) With floor effects. (B) Without floor effects. Each panel shows the MAE at each VF sensitivity level as curves, together with a histogram of the ground truth VF values as a bar plot.
Figure 5. Box plot of the 3D-volume-based model estimations.
In Figure 6, we plot the difference between the evaluation metrics of the 3D and 2D models at each VF point. Figure 6A shows the pointwise mean absolute error difference map (i.e., \(MAE_{2D}^i - MAE_{3D}^i\), where i indexes the 52 points). Red indicates that the 3D model had a lower (better) MAE than the 2D model at that point, and blue indicates the opposite. Figure 6B shows the P values of the MAE differences: white cells have P values ≥ 0.05, and black and grayish cells have P values < 0.05. These plots show that the 3D model was significantly better than the 2D model at most VF positions in terms of MAE. Similarly, the pointwise Pearson's correlation coefficients of the 3D model were significantly better than those of the 2D model at most VF positions (a sketch of this pointwise computation follows the figure). This pointwise analysis again demonstrates the superiority of 3D OCT volumes over 2D thickness maps for predicting the VF. 
Figure 6. Pointwise analysis on floored test data. (A) MAE difference map: a red cell means the 3D model has a lower MAE than the 2D model at that point, and a blue cell means the opposite. (B) Significance map of the MAE difference. (C) Correlation difference map: a red cell means the 3D model has a higher Pearson's correlation coefficient than the 2D model at that point, and a blue cell means the opposite. (D) Significance map of the correlation difference. In the significance maps, white cells have P values ≥ 0.05, and black and grayish cells have P values < 0.05.
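The pointwise comparison behind Figure 6 reduces to a per-point MAE difference and a per-point Wilcoxon signed-rank test (a sketch; `err_2d` and `err_3d` are assumed (n_tests, 52) arrays of absolute errors):

```python
# Sketch of the pointwise MAE-difference and significance maps.
import numpy as np
from scipy import stats

mae_diff = err_2d.mean(axis=0) - err_3d.mean(axis=0)   # > 0 favors the 3D model
p_values = np.array([stats.wilcoxon(err_3d[:, i], err_2d[:, i]).pvalue
                     for i in range(52)])
significant = p_values < 0.05                          # per-point significance
```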
Figure 7 shows the pointwise performance of the 3D model for data with and without floor effects. For data without floor effects, the 3D model performed better at central locations than at boundary locations, probably because of VF variability and trial lens rim artifacts, which lead to inaccurate VF measurements at boundary points when the trial lens is decentered. However, for data with floor effects, performance at central locations was worse than at boundary locations. 
Figure 7. Pointwise results of the 3D model for data with and without floor effects. (A) Pointwise MAE for data with floor effects. (B) Pointwise correlation for data with floor effects. (C) Pointwise MAE for data without floor effects. (D) Pointwise correlation for data without floor effects.
Figure 8 visualizes which parts of the retinal OCT volume the 3D model focused on when predicting VF outcomes, derived using the Grad-CAM technique44 (a minimal sketch follows the figure). Because of the low-resolution limitation of Grad-CAM, no notable difference was observed among the Grad-CAM maps of different VF points, so we averaged the maps across the 52 VF points. As shown in Figure 8, the model automatically learned to focus on clinically relevant regions. In the XY projection plane of the 3D OCT volume, the highlighted 7 o'clock and 11 o'clock regions correspond to where the retina has the thickest RNFL and is therefore most sensitive to damage. In the XZ and YZ projection planes, the model paid the most attention to the optic disc rim, again demonstrating that the model correctly learned to use information from clinically relevant regions. 
Figure 8. Grad-CAM visualization. (A) XY-plane projection of the Grad-CAM map overlaid on an en face OCT image; the heatmap is generated by averaging across the z direction. (B) Grad-CAM heatmap overlaid on the OCT B-scan corresponding to the horizontal green line in the en face image. (C) Grad-CAM heatmap overlaid on the OCT B-scan corresponding to the vertical green line in the en face image.
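A minimal sketch of Grad-CAM applied to the 3D model is given below (our illustration, assuming PyTorch hooks on a chosen `target_layer`, e.g., the last convolutional block): one map is computed per VF output and the 52 maps are averaged, as done for Figure 8.

```python
# Sketch: Grad-CAM for the 3D model, averaged over the 52 VF outputs.
import torch

def grad_cam_3d(model, volume, target_layer, n_points=52):
    """volume: (1, 1, 144, 72, 72) OCT tensor; target_layer: a conv block."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, inp, out: feats.update(a=out))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gin, gout: grads.update(g=gout[0]))
    cams = []
    for i in range(n_points):                  # one Grad-CAM map per VF point
        model.zero_grad()
        out = model(volume)                    # (1, 52)
        out[0, i].backward()
        w = grads["g"].mean(dim=(2, 3, 4), keepdim=True)        # channel weights
        cam = torch.relu((w * feats["a"].detach()).sum(dim=1))  # (1, D', H', W')
        cams.append(cam / (cam.max() + 1e-8))  # normalize each map
    h1.remove(); h2.remove()
    return torch.stack(cams).mean(dim=0)       # average across the 52 points
```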
Gradient Loss
Figure 9 shows the performance corresponding to different settings of λ in the training loss function (Equation 4). Introducing the gradient loss clearly boosted the performance at the low VF sensitivity end. We chose λ = 10 in our experiments because it provided the lowest estimation errors in low-sensitivity regions of the training set. 
Figure 9. Effectiveness of the gradient loss.
Discussion
In this study, we developed a deep learning model capable of inferring pointwise VF sensitivities directly from segmentation-free 3D OCT volumes. Previous studies have also used deep learning to learn structure-function relationships, commonly using segmentation-based thickness measurements from OCT devices as inputs to predict VF outcomes. Shin et al.34 predicted 24-2 VF outcomes from 2D RNFL and ganglion cell with inner plexiform layer (GCIPL) thickness maps measured by SD-OCT and by swept-source OCT (SS-OCT). They showed that their model estimated VFs better with SS-OCT (root mean square error [RMSE] = 4.51 ± 2.54 dB) than with SD-OCT (RMSE = 5.29 ± 2.68 dB). Although we cannot directly compare with their results, we achieved a similar RMSE (4.22 ± 2.88 dB) for our 2D model, which also used the SD-OCT RNFL map, and a better RMSE (3.83 ± 2.74 dB) for our 3D model. Park et al.31 developed an InceptionV3-based model to predict 24-2 VFs from combined GCIPL and RNFL thickness maps and achieved an RMSE of 4.79 ± 2.56 dB, also similar to our 2D model (4.22 ± 2.88 dB). Mariottoni et al.33 used a convolutional neural network (CNN) to predict 24-2 VFs from 768 peripapillary RNFL thickness points in SD-OCT and reported an average correlation coefficient of 0.60 and an MAE of 4.25 dB. In our case, the correlation coefficients were 0.75 and 0.80 for the 2D and 3D models, respectively, and the MAEs were 3.47 ± 3.75 dB and 3.11 ± 3.54 dB, respectively. Overall, our 2D model performed similarly to previous segmentation-dependent methods, whereas our 3D model significantly outperformed them. The comparisons are summarized in Table 4. 
Table 4. Comparison With Prior Studies Using SD-OCT Data
Our 3D model significantly outperformed our 2D model both globally and locally in terms of both MAE and Pearson's correlation on the test subset with floor effects, suggesting that the 3D model was less affected by floor effects. The MAE gain of the 3D model on the floored test data (6.34 − 5.24 = 1.10 dB) was more than double that on the test data without floor effects (3.29 − 2.82 = 0.47 dB), as shown in Table 2. Similarly, the correlation gain on the floored test data (0.83 − 0.74 = 0.09) was three times that on the data without floor effects (0.70 − 0.67 = 0.03). The large performance gap shown in Figure 4 further demonstrates the 3D model's advantage when floor effects are present. 
Nonetheless, the sectoral observation that the superior sectors had smaller MAEs than the corresponding inferior sectors (3.14 vs. 3.67 dB temporally, 2.75 vs. 3.23 dB nasally, as shown in Table 3) coincides with the findings of Guo et al.30 and Park et al.31 Guo et al.30 attributed this to the superior retina having a higher structure-function correlation than the inferior retina. Park et al.31 suggested another explanation: glaucomatous damage may occur sequentially from the inferotemporal to the superotemporal sectors,45 so the inferotemporal ONH sector could have a larger error because it progressed more than other sectors. Our results may support their hypothesis: the superior-inferior MAE gaps narrowed as glaucoma progressed, from 0.49 dB (|2.96 − 3.47| dB) temporally and 0.43 dB (|2.47 − 2.90| dB) nasally in data without floor effects to 0.12 dB (|5.06 − 5.18| dB) temporally and 0.09 dB (|5.26 − 5.35| dB) nasally in data with floor effects. Nevertheless, the pattern of Pearson's correlation coefficients did not agree with that of the MAE: the superior correlations were not better than the inferior ones. 
Despite the improvement introduced by segmentation-free OCT volumes, this study has limitations. First, the dataset is imbalanced in terms of VF sensitivity values, and relatively large errors occur at under-represented low and very high sensitivities. Although we demonstrated that predicting VFs directly from 3D OCT volumes and using the gradient loss alleviates the issue for low sensitivity values, the problem persists; further investigation with additional low and very high sensitivity data is needed. Second, VF tests are prone to errors and variability, complicating model training and evaluation and imposing a lower bound on the achievable estimation error. The test-retest variability of VFs is even higher for sensitivity values under 19 dB,43 further limiting the model's predictive performance at low VF sensitivities. Repeated tests may help suppress noise in VFs and yield a cleaner dataset. Finally, despite the advantage of feature agnosticism, using unsegmented OCT volumes is inefficient in terms of memory and computation because the volumes contain substantial regions without tissue information. Although flattening, cropping, and downsampling were applied during preprocessing to improve memory and computational efficiency, more advanced methods that incorporate segmentation masks could be explored in future work. 
Conclusion
In conclusion, we investigated a deep learning model that estimates pointwise VF sensitivities directly from segmentation-free 3D OCT volumes, overcoming the floor effects of segmentation-dependent 2D OCT measurements. We compared its performance with a model trained on segmentation-dependent 2D OCT thickness maps in a large clinical dataset and showed that the 3D model is significantly better both globally and pointwise. Further analysis on the test subset with floor effects demonstrated that the 3D model was less affected by floor effects and thus produced more accurate results than the 2D model. Moreover, we proposed a gradient loss, combined with the mean square error loss, to utilize the spatial information of VFs; the proposed loss improved the estimation error for low sensitivity values. Our study provides a better quantitative model of the structure-function relationship and offers new insights into developing surrogates of VF test outcomes from OCT retinal scans, which may help patients who are unable to undergo actual VF examinations and circumvent the unreliability of VF testing. 
Acknowledgments
Supported by National Institutes of Health (NIH) R01-EY013178, R01-EY030929, and P30-EY013079, and an unrestricted grant from Research to Prevent Blindness. 
Disclosure: Z. Chen, None; E. Shemuelian, None; G. Wollstein, None; Y. Wang, None; H. Ishikawa, None; J.S. Schuman, AEYE, Inc. (C, I), Carl Zeiss Meditec (C, P, R), Ocugenix (I, P, R), Ocular Therapeutix, Inc. (C, I), Opticient (C, I), Perfuse, Inc. (C) 
References
1. Resnikoff S, Pascolini D, Etya'ale D, et al. Global data on visual impairment in the year 2002. Bull World Health Organ. 2004; 82(11): 844–851.
2. Tham YC, Li X, Wong TY, Quigley HA, Aung T, Cheng CY. Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology. 2014; 121(11): 2081–2090.
3. Gutierrez P, Wilson MR, Johnson C, et al. Influence of glaucomatous visual field loss on health-related quality of life. Arch Ophthalmol. 1997; 115(6): 777–784.
4. Nelson P, Aspinall P, Papasouliotis O, Worton B, O'Brien C. Quality of life in glaucoma and its relationship with visual function. J Glaucoma. 2003; 12(2): 139–150.
5. Shin JW, Sung KR, Park SW. Patterns of progressive ganglion cell-inner plexiform layer thinning in glaucoma detected by OCT. Ophthalmology. 2018; 125(10): 1515–1525.
6. Bowd C, Weinreb RN, Williams JM, Zangwill LM. The retinal nerve fiber layer thickness in ocular hypertensive, normal, and glaucomatous eyes with optical coherence tomography. Arch Ophthalmol. 2000; 118: 22–26.
7. Huang D, Swanson EA, Lin CP, et al. Optical coherence tomography. Science. 1991; 254(5035): 1178–1181.
8. Fogagnolo P, Sangermani C, Oddone F, et al. Long-term perimetric fluctuation in patients with different stages of glaucoma. Br J Ophthalmol. 2011; 95(2): 189–193.
9. Wild J, Dengler-Harles M, Searle A, O'Neill E, Crews S. The influence of the learning effect on automated perimetry in patients with suspected glaucoma. Acta Ophthalmol. 1989; 67(5): 537–545.
10. Marra G, Flammer J. The learning and fatigue effect in automated perimetry. Graefes Arch Clin Exp Ophthalmol. 1991; 229(6): 501–504.
11. Langerhorst C, Van den Berg T, Spronsen RV, Greve E. Results of a fluctuation analysis and defect volume program for automated static threshold perimetry with the scoperimeter. In: Sixth International Visual Field Symposium. New York, NY: Springer; 1985: 1–6.
12. Brenton R, Argus WA. Fluctuations on the Humphrey and Octopus perimeters. Invest Ophthalmol Vis Sci. 1987; 28(5): 767–771.
13. Budenz DL, Fredette MJ, Feuer WJ, Anderson DR. Reproducibility of peripapillary retinal nerve fiber thickness measurements with stratus OCT in glaucomatous eyes. Ophthalmology. 2008; 115(4): 661–666.
14. Hong S, Kim CY, Lee WS, Seong GJ. Reproducibility of peripapillary retinal nerve fiber layer thickness with spectral domain cirrus high-definition optical coherence tomography in normal eyes. Jpn J Ophthalmol. 2010; 54(1): 43–47.
15. Garcia-Martin E, Pinilla I, Idoipe M, Fuertes I, Pueyo V. Intra and interoperator reproducibility of retinal nerve fibre and macular thickness measurements using Cirrus Fourier-domain OCT. Acta Ophthalmol. 2011; 89(1): e23–e29.
16. Wollstein G, Schuman JS, Price LL, et al. Optical coherence tomography (OCT) macular and peripapillary retinal nerve fiber layer measurements and automated visual fields. Am J Ophthalmol. 2004; 138(2): 218–225.
17. Sato S, Hirooka K, Baba T, Tenkumo K, Nitta E, Shiraga F. Correlation between the ganglion cell-inner plexiform layer thickness measured with cirrus HD-OCT and macular visual field sensitivity measured with microperimetry. Invest Ophthalmol Vis Sci. 2013; 54(4): 3046–3051.
18. Raza AS, Cho J, de Moraes CG, et al. Retinal ganglion cell layer thickness and local visual field sensitivity in glaucoma. Arch Ophthalmol. 2011; 129(12): 1529–1536.
19. Garway-Heath DF, Poinoosawmy D, Fitzke FW, Hitchings RA. Mapping the visual field to the optic disc in normal tension glaucoma eyes. Ophthalmology. 2000; 107(10): 1809–1815.
20. Lee JW, Morales E, Sharifipour F, et al. The relationship between central visual field sensitivity and macular ganglion cell/inner plexiform layer thickness in glaucoma. Br J Ophthalmol. 2017; 101(8): 1052–1058.
21. Gardiner SK, Johnson CA, Cioffi GA. Evaluation of the structure-function relationship in glaucoma. Invest Ophthalmol Vis Sci. 2005; 46(10): 3712–3717.
22. Ferreras A, Pablo LE, Garway-Heath DF, Fogagnolo P, Garcia-Feijoo J. Mapping standard automated perimetry to the peripapillary retinal nerve fiber layer in glaucoma. Invest Ophthalmol Vis Sci. 2008; 49(7): 3018–3025.
23. Fujino Y, Murata H, Matsuura M, et al. Mapping the central 10° visual field to the optic nerve head using the structure-function relationship. Invest Ophthalmol Vis Sci. 2018; 59(7): 2801–2807.
24. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016; 316(22): 2402–2410.
25. Shen D, Wu G, Suk HI. Deep learning in medical image analysis. Annu Rev Biomed Eng. 2017; 19: 221–248.
26. Maetschke S, Antony B, Ishikawa H, Wollstein G, Schuman J, Garnavi R. A feature agnostic approach for glaucoma detection in OCT volumes. PLoS One. 2019; 14(7): e0219126.
27. Chen Z, Wang Y, Wollstein G, et al. Macular GCIPL thickness map prediction via time-aware convolutional LSTM. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE; 2020: 1–5.
28. Maetschke S, Antony B, Ishikawa H, Wollstein G, Schuman J, Garnavi R. Inference of visual field test performance from OCT volumes using deep learning. arXiv preprint arXiv:1908.01428. 2019.
29. Kihara Y, Montesano G, Chen A, et al. Policy-driven, multimodal deep learning for predicting visual fields from the optic disc and OCT imaging. Ophthalmology. 2022; 129(7): 781–791.
30. Guo Z, Kwon YH, Lee K, Wang K, Wahle A, Alward WL, et al. Optical coherence tomography analysis based prediction of Humphrey 24-2 visual field thresholds in patients with glaucoma. Invest Ophthalmol Vis Sci. 2017; 58(10): 3975–3985.
31. Park K, Kim J, Lee J. A deep learning approach to predict visual field using optical coherence tomography. PLoS One. 2020; 15(7): e0234902.
32. Christopher M, Bowd C, Proudfoot JA, et al. Deep learning estimation of 10-2 and 24-2 visual field metrics based on thickness maps from macula OCT. Ophthalmology. 2021; 128(11): 1534–1548.
33. Mariottoni EB, Datta S, Dov D, et al. Artificial intelligence mapping of structure to function in glaucoma. Transl Vis Sci Technol. 2020; 9(2): 19.
34. Shin J, Kim S, Kim J, Park K. Visual field inference from optical coherence tomography using deep learning algorithms: a comparison between devices. Transl Vis Sci Technol. 2021; 10(7): 4.
35. Kamalipour A, Moghimi S, Khosravi P, et al. Deep learning estimation of 10-2 visual field map based on circumpapillary retinal nerve fiber layer thickness measurements. Am J Ophthalmol. 2023; 246: 163–173.
36. Mwanza JC, Kim HY, Budenz DL, et al. Residual and dynamic range of retinal nerve fiber layer thickness in glaucoma: comparison of three OCT platforms. Invest Ophthalmol Vis Sci. 2015; 56(11): 6344–6351.
37. Antony B, Abramoff MD, Tang L, et al. Automated 3-D method for the correction of axial artifacts in spectral-domain optical coherence tomography images. Biomed Opt Express. 2011; 2(8): 2403–2416.
38. Sihota R, Gupta V, Tuli D, Sharma A, Sony P, Srinivasan G. Classifying patterns of localized glaucomatous visual field defects on automated perimetry. J Glaucoma. 2007; 16(1): 146–152.
39. Lau LI, Liu CJL, Chou JCK, Hsu WM, Liu JH. Patterns of visual field defects in chronic angle-closure glaucoma with different disease severity. Ophthalmology. 2003; 110(10): 1890–1894.
40. Hoffmann EM, Boden C, Zangwill LM, Bourne RR, Weinreb RN, Sample PA. Inter-eye comparison of patterns of visual field loss in patients with glaucomatous optic neuropathy. Am J Ophthalmol. 2006; 141(4): 703.
41. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
42. Wall M, Woodward KR, Doyle CK, Zamba G. The effective dynamic ranges of standard automated perimetry sizes III and V and motion and matrix perimetry. Arch Ophthalmol. 2010; 128(5): 570–576.
43. Gardiner SK, Demirel S, Goren D, Mansberger SL, Swanson WH. The effect of stimulus size on the reliable stimulus range of perimetry. Transl Vis Sci Technol. 2015; 4(2): 10.
44. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017: 618–626.
45. Savini G, Barboni P, Parisi V, Carbonelli M. The influence of axial length on retinal nerve fibre layer thickness and optic-disc size measurements by spectral-domain OCT. Br J Ophthalmol. 2012; 96(1): 57–61.