Raw image data were imported into the SS-OCT Viewer software (version 3.0, Tomey Corporation). One human expert grader (grader 1, A.A.P.), masked to the identities and examination results of the participants, marked the scleral spurs in four images per eye; these scleral spur locations were considered the reference standard. Before the current study, grader 1 underwent extensive training in scleral spur detection, including manual analysis of approximately 500 AS-OCT images (not included in the study) under the supervision of at least one of two glaucoma specialists (B.Y.X. or R.V.). Four images per eye were analyzed and exported in JPEG format: the first image was oriented along the horizontal (temporal–nasal) meridian, and the remaining three images were evenly spaced 45° apart. Owing to a limitation in the SS-OCT Viewer software, scleral spur locations could be exported only when at least six of the eight possible scleral spurs were marked. Thus, corrupt images and images with significant artifacts (for example, from the eyelids or arcus senilis) that precluded manual detection of the scleral spur by grader 1 were excluded. This step helped to minimize noise during CNN model training and testing. Each image was divided in two along the vertical midline, and the right-sided half was flipped about the vertical axis so that every half-image showed the ACA on the left and the corneal apex on the right. No adjustments were made to image brightness or contrast. Image manipulations were performed in MATLAB (MathWorks, Natick, MA).
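The split-and-flip step can be illustrated with a minimal sketch. The study performed this step in MATLAB; the Python version below is only an illustration, and the function names `split_and_standardize` and `mirror_x`, the use of NumPy arrays, and the handling of an odd image width (dropping the central column) are assumptions, not details taken from the study.

```python
import numpy as np


def split_and_standardize(image):
    """Split an image at the vertical midline and mirror the right half.

    `image` is a 2-D (or H x W x C) array. Both returned halves have the
    same width; if the width is odd, the central column is dropped
    (an assumption made for this sketch).
    """
    width = image.shape[1]
    half = width // 2
    left_half = image[:, :half]
    # Mirror the right half left-to-right so that it has the same
    # orientation as the left half.
    right_half = np.fliplr(image[:, width - half:])
    return left_half, right_half


def mirror_x(x, width):
    """Map a column index from the full image's right half to its
    column index in the mirrored right half-image."""
    return width - 1 - x


# Tiny worked example on a 2 x 6 "image".
demo = np.arange(12).reshape(2, 6)
demo_left, demo_right = split_and_standardize(demo)
```

A scleral spur marked in the right half of the original image needs the same transformation as the pixels, which is what `mirror_x` sketches: the outermost column of the full image becomes column 0 of the mirrored half-image.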
Before model training, images from 95% of participants were assigned to a training dataset, and images from the remaining 5% of participants were assigned to an independent test dataset. To prevent data leakage between the training and test datasets (e.g., through intereye and intraeye correlations), all images acquired from a single participant appeared together in either the training or the test dataset and were never split across both. Data manipulations were performed in the Python programming language.
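A participant-level split of this kind can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `split_by_participant`, the `(participant_id, image_path)` record layout, the fixed random seed, and the rounding rule for the number of test participants are all assumptions.

```python
import random


def split_by_participant(image_records, test_fraction=0.05, seed=0):
    """Split image records into training and test sets by participant.

    `image_records` is a list of (participant_id, image_path) tuples
    (a hypothetical layout). Every image from a given participant ends
    up in exactly one of the two sets.
    """
    participant_ids = sorted({pid for pid, _ in image_records})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(participant_ids)

    # Assumed rounding rule: nearest integer, with at least one participant.
    n_test = max(1, round(len(participant_ids) * test_fraction))
    test_ids = set(participant_ids[:n_test])

    train = [rec for rec in image_records if rec[0] not in test_ids]
    test = [rec for rec in image_records if rec[0] in test_ids]
    return train, test


# Example: 100 hypothetical participants with 8 images each.
records = [(pid, f"p{pid}_img{i}.jpg") for pid in range(100) for i in range(8)]
train_records, test_records = split_by_participant(records)
```

Splitting on participant identifiers rather than on individual images is what keeps the two eyes of one person, and the multiple scans of one eye, on the same side of the split.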
The reference grader (grader 1) and a second, glaucoma fellowship-trained human grader (grader 2, B.Y.X.), both masked to participant identities, examination results, and the original scleral spur locations, independently marked the scleral spur in all test dataset images. These locations were used to calculate metrics of intragrader and intergrader variability. Images reinspected as part of the test dataset could be further excluded if noise or artifacts prevented one or both graders from marking the scleral spur.
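This section does not state which variability metrics were used. Purely as an illustration, the sketch below assumes one plausible choice, the Euclidean distance in pixels between two marks of the same scleral spur; the function names and the `(x, y)` pixel-coordinate layout are hypothetical.

```python
import math


def mark_distances(marks_a, marks_b):
    """Euclidean distance between paired (x, y) marks, one pair per image.

    For intragrader variability, `marks_a` and `marks_b` would be the
    same grader's first and repeat marks; for intergrader variability,
    the marks of two different graders. (Assumed metric, not taken from
    the study.)
    """
    if len(marks_a) != len(marks_b):
        raise ValueError("Both lists must contain one mark per image.")
    return [
        math.hypot(ax - bx, ay - by)
        for (ax, ay), (bx, by) in zip(marks_a, marks_b)
    ]


def mean_distance(marks_a, marks_b):
    """Mean of the paired distances across all images."""
    distances = mark_distances(marks_a, marks_b)
    return sum(distances) / len(distances)


# Hypothetical marks on two images.
first_pass = [(0.0, 0.0), (3.0, 4.0)]
second_pass = [(3.0, 4.0), (3.0, 4.0)]
example_distances = mark_distances(first_pass, second_pass)
example_mean = mean_distance(first_pass, second_pass)
```

Converting pixel distances to physical units would require the image scale, which is not given in this section.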