**Purpose**:
The structural changes measured by optical coherence tomography (OCT) are related to functional changes in visual fields (VFs). This study aims to accurately assess the structure-function relationship and to overcome the challenges posed by the minimal measurable level (floor effect) of the segmentation-dependent OCT measurements commonly used in prior studies.

**Methods**:
We developed a deep learning model to estimate functional performance directly from three-dimensional (3D) OCT volumes and compared it with a model trained on segmentation-dependent two-dimensional (2D) OCT thickness maps. Moreover, we proposed a gradient loss to utilize the spatial information of VFs.

**Results**:
Our 3D model was significantly better than the 2D model both globally and pointwise in terms of both mean absolute error (MAE = 3.11 ± 3.54 vs. 3.47 ± 3.75 dB, *P* < 0.001) and Pearson's correlation coefficient (0.80 vs. 0.75, *P* < 0.001). On a subset of test data with floor effects, the 3D model showed less influence from floor effects than the 2D model (MAE = 5.24 ± 3.99 vs. 6.34 ± 4.58 dB, *P* < 0.001, and correlation 0.83 vs. 0.74, *P* < 0.001). The gradient loss improved the estimation error for low-sensitivity values. Furthermore, our 3D model outperformed all prior studies.

**Conclusions**:
By providing a better quantitative model to encapsulate the structure-function relationship more accurately, our method may help derive VF test surrogates.

**Translational Relevance**:
Deep learning (DL)-based VF surrogates not only benefit patients by reducing the testing time of VFs but also allow clinicians to make clinical judgments without the inherent limitations of VFs.

^{1–4}Optical coherence tomography (OCT) is a commonly used noninvasive imaging technology for quantitative glaucomatous structural assessment,^{5,6} serving as a biomarker for glaucoma diagnosis and monitoring.^{7} In practice, a visual field (VF) test is required to make the diagnosis of glaucoma. VF testing is essential for identifying and monitoring functional abnormalities, but it is highly subjective and susceptible to fluctuations due to various factors, particularly in patients with glaucoma.^{8–12} On the other hand, commercially available spectral-domain (SD)-OCT has very good reproducibility in both healthy and glaucomatous subjects.^{13–15} Previous studies have shown that the structural changes measured by OCT are related to the functional changes measured in VF tests.^{16–20} Thus, surrogates of VF test outcomes could be derived from OCT retinal scans via accurate quantitative models encapsulating structure-function relationships. Such surrogates could not only benefit patients by reducing the long VF testing time but also allow clinicians to make clinical judgments without the inherent limitations of VF tests, such as their subjective nature and high test-to-test variability.

^{19,21–23}including the widely used Garway-Heath map, which mapped localized retinal nerve fiber layer (RNFL) defects measured on red-free RNFL photographs to the locations of points on standard automated perimetry (SAP).^{19} The increasing popularity of OCT and the improved ability to assess the RNFL led previous research to model the structure-function relationship between OCT measurements, such as peripapillary RNFL thickness, and SAP using statistical tools.^{21–23} However, these studies relied on small samples and summarized thickness measurements.

^{24–27}Attempts have been made to use deep learning to estimate VFs from higher-dimensional SD-OCT images^{28,29} and from measurements such as 2D SD-OCT thickness maps.^{30–35} However, 2D thickness maps are prone to segmentation errors introduced by the adopted segmentation algorithms, leading to inaccurate VF estimation. In addition to segmentation errors, segmentation-based OCT measurements have floor effects.^{36} The floor effect is the point at which no further structural loss can be detected by segmentation-based OCT measurements. The RNFL dynamic measurement range differs among devices; previous research has reported that RNFL measurements reach the floor at 57 µm for Cirrus OCT devices.^{36} The OCT floor effect impairs the learning of the structure-function relationship for patients with advanced disease, whose VFs continue to change. On the other hand, segmentation-free 3D OCT volumes provide more information and are not subject to segmentation errors. Maetschke et al.^{28} proposed a 3D deep learning model to directly infer global VF measurements, such as visual field index (VFI) and mean deviation (MD), from unsegmented 3D OCT volumes and achieved Pearson's correlations of 0.88 ± 0.035 and 0.88 ± 0.023 for VFI and MD, respectively. However, global VF measurements do not reveal subtle functional abnormalities and/or specific damage patterns that help with phenotyping various glaucoma subtypes.

^{36}The subset with floor effects comprised 117 VF-OCT pairs from 20 subjects, whereas the other subset had 879 pairs from 127 subjects.

^{37}we detected the ONH region, segmented the Bruch's membrane opening (BMO) surface, and used two-stage thin-plate splines to estimate and correct the distinct axial artifacts in the BMO surface. The 3D scans were then flattened by moving each A-scan along the z direction so that the BMO surface became flat, reducing variance across OCT volumes. The region containing 144 × 144 × 576 voxels centered on the ONH was then cropped and downsampled to 72 × 72 × 144 voxels with Gaussian antialiasing filtering to reduce memory consumption during model training. Of the 54 points of the 24-2 VF test, the 2 blind spot points were excluded. The sensitivity values of the remaining 52 test points were temporally smoothed over 5 consecutive VF visits of the same eye using pointwise linear regression to reduce random fluctuations. The average time span for longitudinal smoothing was 1166.04 ± 598.91 days. For eyes that had fewer than 5 VF visits, we used the original VFs. All left-eye visits were flipped horizontally to match the right-eye format for both OCTs and VFs.
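The flattening and downsampling steps above can be sketched as follows. This is an illustrative NumPy/SciPy sketch, not the authors' implementation: the function name `flatten_and_downsample`, the array shapes, the choice of smoothing sigma, and the use of `np.roll` (which wraps around rather than zero-pads) are all assumptions, and the ONH-centered cropping step is assumed to have already been applied.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def flatten_and_downsample(volume, bmo_depth, out_shape=(72, 72, 144)):
    """Flatten a cropped OCT volume along z, then antialias and downsample.

    volume:    3D OCT scan of shape (X, Y, Z), already cropped around the ONH
    bmo_depth: estimated BMO surface depth for each A-scan, shape (X, Y)
    """
    x, y, z = volume.shape
    flat = np.zeros_like(volume)
    ref = int(np.median(bmo_depth))  # common reference depth for the BMO surface
    for i in range(x):
        for j in range(y):
            # Shift each A-scan along z so the BMO surface becomes flat.
            # np.roll wraps around; real code would pad instead (simplification).
            flat[i, j] = np.roll(volume[i, j], ref - int(bmo_depth[i, j]))
    # Gaussian antialiasing before downsampling, sigma ~ half the reduction factor.
    factors = (out_shape[0] / x, out_shape[1] / y, out_shape[2] / z)
    sigma = [max(1.0 / f, 1.0) / 2.0 for f in factors]
    smoothed = gaussian_filter(flat, sigma=sigma)
    # Linear interpolation down to the target shape (e.g., 72 x 72 x 144).
    return zoom(smoothed, zoom=factors, order=1)
```

With the paper's shapes, a (144, 144, 576) volume would be reduced by factors of 2, 2, and 4 to (72, 72, 144).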

*i*-th component of the 52-point VF vector for the *n*-th sample. We also experimented with a mean absolute error loss and obtained results similar to those with the mean square error loss; therefore, we report only the results trained with the mean square error loss.
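Under this notation, the mean square error loss over a batch of *N* samples has the standard form; the following is a plausible rendering, and the exact normalization used in the paper may differ:

```latex
\mathcal{L}_{\mathrm{mse}}
  \;=\;
  \frac{1}{52N}\sum_{n=1}^{N}\sum_{i=1}^{52}
  \left(y_{n}^{(i)}-\hat{y}_{n}^{(i)}\right)^{2}
```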

^{38–40}Therefore, to better utilize the spatial correlation among nearby VF points, we rearranged the output VF vector into an 8 × 9 2D array and filled the boundary with zeros, as demonstrated in Figure 3. Then, a gradient loss term was proposed to minimize the differences in the horizontal and vertical gradients, respectively, between the estimated and ground truth VF arrays as follows:

where ∇_{h} and ∇_{v} denote the horizontal and vertical gradient operators, respectively, and y and \(\hat y\) denote the ground truth and estimated 2D VF arrays. *M*_{h} and *M*_{v} are the binary masks that exclude the gradients at blind spot and boundary points for the horizontal and vertical directions, as shown in Figures 3B and 3D. With the gradient loss, the 52 points of the VF vectors were no longer independent of one another. The model was forced not only to reconstruct the individual points faithfully but also to match the change pattern of the ground truth visual field defects. Thus, the gradient loss emphasized learning the spatial changes between adjacent VF points, which are essentially the spatial patterns of VF defects.

The optimizer hyperparameters were β_{1} = 0.9, β_{2} = 0.999, and ε = 10^{−8}.^{41} The initial learning rate was 2 × 10^{−4}, which was then decayed every 100 epochs by a factor of 10^{−1}. We trained the model for 200 epochs.
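The masked gradient loss and its combination with the reconstruction term can be sketched in NumPy. This is an illustrative sketch, not the authors' code: the helper names (`to_vf_array`, `gradient_loss`, `total_loss`), the placeholder weight `lam`, and the exact (row, col) layout of the 52 points are assumptions; the true 24-2 arrangement follows Figure 3.

```python
import numpy as np

def to_vf_array(vf_points, index_map):
    """Place a VF vector into an 8 x 9 array, zeros elsewhere (boundary padding).
    index_map lists the (row, col) cell of each point; the layout here is
    illustrative, not the exact 24-2 grid of Figure 3."""
    arr = np.zeros((8, 9))
    for k, (r, c) in enumerate(index_map):
        arr[r, c] = vf_points[k]
    return arr

def gradient_loss(y, y_hat, mask_h, mask_v):
    """Masked horizontal/vertical gradient loss between two 8 x 9 VF arrays.
    mask_h / mask_v zero out gradients at blind-spot and boundary cells."""
    grad_h = lambda a: np.diff(a, axis=1)  # horizontal differences, shape (8, 8)
    grad_v = lambda a: np.diff(a, axis=0)  # vertical differences, shape (7, 9)
    loss_h = np.mean((mask_h * (grad_h(y) - grad_h(y_hat))) ** 2)
    loss_v = np.mean((mask_v * (grad_v(y) - grad_v(y_hat))) ** 2)
    return loss_h + loss_v

def total_loss(y, y_hat, mask_h, mask_v, lam=1.0):
    """Mean square error on the arrays plus the weighted gradient term.
    The weight `lam` is a placeholder; the paper's value is not given here."""
    return np.mean((y - y_hat) ** 2) + lam * gradient_loss(y, y_hat, mask_h, mask_v)
```

Note that a constant offset between estimate and ground truth leaves the gradient term at zero, which is exactly why the mean square error term is still needed: the gradient loss constrains only the spatial change pattern.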

*P* < 0.001, Wilcoxon signed-rank test). Pearson's correlation coefficient of the 3D model was also significantly better than that of the 2D model (0.80 vs. 0.75, *P* < 0.001, Williams' test for equality of correlations). Both metrics demonstrated that the overall performance of the 3D model was significantly better than that of the 2D model.
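The paired comparison of the two models' errors can be sketched with SciPy's Wilcoxon signed-rank test. The error values below are synthetic, not the study's data, and the variable names are illustrative; the Williams' test for dependent correlations is not shown.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical per-sample absolute errors (dB) of the two models on the
# same test samples; the 3D model is simulated as consistently lower.
err_2d = np.abs(rng.normal(3.5, 1.0, size=1000))
err_3d = err_2d - np.abs(rng.normal(0.3, 0.1, size=1000))

# Paired, non-parametric comparison: is the 3D model's error smaller?
stat, p = wilcoxon(err_3d, err_2d, alternative="less")
```

The test is paired because both models are evaluated on the same VF-OCT pairs, and non-parametric because the error distributions are skewed.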

^{19}

^{33}In addition, the high test-to-test variability of VF sensitivity values below 20 dB may also contribute to the large estimation error at the low-sensitivity end.^{42,43}

*i* represented one of the 52 points). Red cells in the map indicated that the 3D model had a lower (better) MAE than the 2D model, and blue indicated the opposite. Figure 6B showed the *P* values of the MAE differences: white cells had *P* values ≥ 0.05, and black and grayish cells had *P* values < 0.05. These plots showed that the 3D model was significantly better than the 2D model at most VF positions in terms of MAE. Similarly, the pointwise Pearson's correlation coefficients of the 3D model were significantly better than those of the 2D model at most VF positions. This pointwise analysis again demonstrated the superiority of using 3D OCT volumes over 2D thickness maps for predicting the VF.

^{44}No notable difference among the Grad-CAM maps of different VF points was observed, owing to the low-resolution limitation of Grad-CAM, so we averaged the Grad-CAM maps across the 52 VF points. As shown in Figure 6, the model automatically learned to focus on clinically relevant regions. In the XY projection plane of the 3D OCT volume, the highlighted 7-o'clock and 11-o'clock regions corresponded to the regions where the retina has the thickest RNFL and is therefore most sensitive to damage. In the XZ and YZ projection planes, the model paid the most attention to the optic disc rim, which again demonstrated that the model correctly learned to use information from clinically relevant regions.

^{34}compared 24-2 VF outcomes estimated from 2D RNFL and ganglion cell with inner plexiform layer (GCIPL) thickness maps measured by SD-OCT and by swept-source OCT (SS-OCT). They showed that their model estimated VFs better with SS-OCT (root mean square error [RMSE] = 4.51 ± 2.54 dB) than with SD-OCT (RMSE = 5.29 ± 2.68 dB). Although we cannot directly compare with their results, we achieved a similar RMSE (4.22 ± 2.88 dB) for our 2D model, which also utilized the RNFL map of SD-OCT, and a better RMSE (3.83 ± 2.74 dB) for our 3D model. Park et al.^{31} developed an InceptionV3-based model to predict 24-2 VFs from combined GCIPL and RNFL thickness maps and achieved an RMSE of 4.79 ± 2.56 dB, which was also similar to our 2D model (4.22 ± 2.88 dB). Mariottoni et al.^{33} used a convolutional neural network (CNN) to predict 24-2 VFs from 768 peripapillary RNFL thickness points in SD-OCT. They reported an average correlation coefficient of 0.60 and an MAE of 4.25 dB. In our case, the correlation coefficients were 0.75 and 0.80 for the 2D and 3D models, respectively, and the MAEs were 3.47 ± 3.75 dB and 3.11 ± 3.54 dB. Overall, our 2D model had performance similar to previous segmentation-dependent methods, whereas our 3D model significantly outperformed them. Comparisons are summarized in Table 4.

^{30,31}Guo^{30} claimed that this was due to the superior retina having a higher structure-function correlation than the inferior retina. Park^{31} suggested another reason for the observation: glaucomatous damage may occur sequentially from the inferotemporal sector to the superotemporal sectors.^{45} As a result, the inferotemporal ONH sector could have a larger error because it progressed more than the other sectors. Our results may support their hypothesis, as the superior-inferior MAE gaps narrowed while glaucoma progressed. For example, the gaps narrowed from 0.51 dB (|2.96 − 3.47| dB) temporally and 0.43 dB (|2.47 − 2.90| dB) nasally in data without floor effects to 0.12 dB (|5.06 − 5.18| dB) temporally and 0.09 dB (|5.26 − 5.35| dB) nasally in data with floor effects. Nevertheless, the pattern of Pearson's correlation coefficients did not agree with the pattern of MAE: the superior correlations were not better than the inferior correlations.

^{43}further limiting the model's predictive performance for low VF sensitivity values. Repeated tests may help suppress noise in the VFs to construct a cleaner dataset. Finally, despite the advantage of feature agnosticism, using non-segmented OCT volumes is inefficient in terms of memory and computation because the volumes contain substantial areas without tissue information. Although flattening, cropping, and downsampling were applied during preprocessing to improve memory and computation efficiency, more advanced methods incorporating segmentation masks can be explored in future work.

**Z. Chen**, None;

**E. Shemuelian**, None;

**G. Wollstein**, None;

**Y. Wang**, None;

**H. Ishikawa**, None;

**J.S. Schuman**, AEYE, Inc. (C, I), Carl Zeiss Meditec (C, P, R), Ocugenix (I, P, R), Ocular Therapeutix, Inc. (C, I), Opticient (C, I), Perfuse, Inc. (C)

*Bull World Health Organ*. 2004; 82(11): 844–851.

*Ophthalmology*. 2014; 121(11): 2081–2090.

*Arch Ophthalmol*. 1997; 115(6): 777–784.

*J Glaucoma*. 2003; 12(2): 139–150.

*Ophthalmology*. 2018; 125(10): 1515–1525.

*Arch Ophthalmol*. 2000; 118: 22–26.

*Science*. 1991; 254(5035): 1178–1181.

*Br J Ophthalmol*. 2011; 95(2): 189–193.

*Acta Ophthalmol*. 1989; 67(5): 537–545.

*Graefes Arch Clin Exp Ophthalmol*. 1991; 229(6): 501–504.

*Sixth International Visual Field Symposium*. New York, NY: Springer; 1985: 1–6.

*Invest Ophthalmol Vis Sci*. 1987; 28(5): 767–771.

*Ophthalmology*. 2008; 115(4): 661–666.

*Jpn J Ophthalmol*. 2010; 54(1): 43–47.

*Acta Ophthalmol*. 2011; 89(1): e23–e29.

*Am J Ophthalmol*. 2004; 138(2): 218–225.

*Invest Ophthalmol Vis Sci*. 2013; 54(4): 3046–3051.

*Arch Ophthalmol*. 2011; 129(12): 1529–1536.

*Ophthalmology*. 2000; 107(10): 1809–1815.

*Br J Ophthalmol*. 2017; 101(8): 1052–1058.

*Invest Ophthalmol Vis Sci*. 2005; 46(10): 3712–3717.

*Invest Ophthalmol Vis Sci*. 2008; 49(7): 3018–3025.

*Invest Ophthalmol Vis Sci*. 2018; 59(7): 2801–2807.

*JAMA*. 2016; 316(22): 2402–2410.

*Annu Rev Biomed Eng*. 2017; 19: 221.

*PLoS One*. 2019; 14(7): e0219126.

*2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI)*. IEEE; 2020: 1–5.

*Ophthalmology*. 2022; 129(7): 781–791.

*Invest Ophthalmol Vis Sci*. 2017; 58(10): 3975–3985.

*PLoS One*. 2020; 15(7): e0234902.

*Ophthalmology*. 2021; 128(11): 1534–1548.

*Transl Vis Sci Technol*. 2020; 9(2): 19.

*Transl Vis Sci Technol*. 2021; 10(7): 4.

*Am J Ophthalmol*. 2023; 246: 163–173.

*Invest Ophthalmol Vis Sci*. 2015; 56(11): 6344–6351.

*Biomed Opt Express*. 2011; 2(8): 2403–2416.

*J Glaucoma*. 2007; 16(1): 146–152.

*Ophthalmology*. 2003; 110(10): 1890–1894.

*Am J Ophthalmol*. 2006; 141(4): 703.

*Arch Ophthalmol*. 2010; 128(5): 570–576.

*Transl Vis Sci Technol*. 2015; 4(2): 10.

*Proceedings of the IEEE International Conference on Computer Vision*; 2017: 618–626.

*Br J Ophthalmol*. 2012; 96(1): 57–61, doi:10.1136/bjo.2010.196782.