Our cohort included both prospective and retrospective subjects who underwent UBM imaging at participating institutions between December 2014 and December 2019. Prospective subjects had previously consented to and enrolled in a multicenter image database, the Pediatric Anterior Segment Imaging and Innovation Study (PASIIS; Baltimore, MD). PASIIS is a collaborative program between the University of Maryland and Children's National Medical Center designed to apply advances in technology and image analysis to the clinical evaluation and management of pediatric anterior segment disease. Retrospective subjects were included after review of the image database and patient charts.
Subject age at the time of examination ranged from 3 weeks to 89 years (median, 4.6 years) (Table 1). UBM images were obtained using the Aviso Ultrasound Platform A/B UBM with a 50 MHz linear transducer (Quantel Medical, Bozeman, MT) or the Accutome UBM Plus Platform with a 48 MHz linear transducer (Keeler Accutome, Inc., Malvern, PA). Forty-six of the 285 images were collected on the Accutome platform and the remaining 239 on the Aviso platform. Lens status composition and resolutions, as well as representative images for each device, can be found in Supplementary Tables S1 and S2 and Supplementary Figures S1 and S2.
Complete adult and pediatric UBM image databases from participating institutions were reviewed for retrospective inclusion. Inclusion required availability of clinical history and central axial UBM images. The only image-quality requirement was that the lens (or, in the case of aphakia, the area where the lens would typically be situated) be visible in the frame. UBM was performed by various operators, including ophthalmic photographers, trained technicians, attending physicians, and trainees. Subjects had undergone UBM imaging for a variety of clinical indications, ranging from voluntary participation as a control subject, to clinical evaluation of lens position after cataract surgery, to unrelated evaluation of anterior segment pathology. Most subjects had both eyes imaged. Several adult subjects were enrolled as controls after consent and were compensated for time and travel. Images were captured in still-image and video-clip formats; for video clips, image stacks were exported, reviewed, and appropriate frames included.
Most young children and some older subjects were imaged under general anesthesia concurrent with a planned surgical procedure. Subjects imaged under general anesthesia were in the supine position. An Alfonso eyelid speculum was used for eyelid opening and stabilization, and cotton-tip applicators were used to position the globe when needed. Children and adults imaged while awake in an outpatient clinical setting received proparacaine anesthetic drops prior to imaging. Outpatients were imaged in a supine or reclined position without an eyelid speculum; eyelid opening and stabilization were achieved using cotton-tip applicators, and fixation targets and/or verbal instruction were used to position the globe when needed. Prior to imaging, a viscous ocular lubricant gel was applied to the ocular surface, and the transducer probe was covered with a water-filled, single-use ClearScan probe cover.
Eligible images were de-identified and reviewed by the principal investigator (J.L.A.) and trained clinical research coordinators (M.B. and A.V.). The probe location was confirmed from the image to be at or near the center of the cornea, with the pupil landmark in view. The orientation of the probe marker and the quality of the image were not considered for inclusion, provided the pupil could be identified. Lens status was ascertained from chart review of clinical and surgical history.
We cropped the 285 raw images to exclude any text or labeling generated by the native UBM software while maximizing the ocular anatomy in the frame. We then partitioned our dataset into a training dataset that our model would learn from, a validation dataset used to iteratively score our model and prevent overfitting, and a testing dataset that would remain unseen during training and serve as an independent evaluation of the model's classification. We used random sampling without replacement to partition subjects, placing 20% of the total subjects in an independent test dataset, 20% of the remaining subjects in a validation dataset, and the rest in a training dataset. The overall proportion of pseudophakic, aphakic, and phakic subjects was maintained while partitioning, so that every testing fold contained 1 to 2 aphakic subjects, 7 to 8 phakic subjects, and 4 to 5 pseudophakic subjects. We then balanced our training dataset by randomly oversampling the under-represented classes until there were an equal number of images for each lens status. The entire training set, including the oversampled images, underwent random augmentation simulating real-world variance: horizontal flipping, a modest affine transformation, and contrast and brightness jittering. These transformations encourage the model to learn features independent of operator-related variance. They also mitigated the risk that oversampling would lead to overfitting, as there was only a 6.25 × 10⁻⁶ chance that the same transformation would be applied to any image. Images in all datasets were uniformly resized to 108 pixels in height by 262 pixels in width, and pixel values were normalized to the range of −1 to 1.
This final resolution was selected to approximate the smallest image dimensions in height and width, preventing up-sampling while maintaining an aspect ratio representative of most images. These transformations were randomly re-applied to untransformed images every epoch.
Rather than build a model from scratch that might be prone to overfitting given our limited sample size, we used a pretrained model, DenseNet-121, fine-tuned the final layers' parameters, and customized a classifier for our images. DenseNet, a convolutional neural network architecture described by Huang et al.,16 has the advantage of efficient accuracy for fewer parameters and less memory allocation,17 making it an ideal starting point for our model. The DenseNet-121 model had been pretrained on ImageNet, a benchmark dataset containing over 14 million images across 1000 classes. Because the target task of classifying the lens in UBM images differs from the source task of classifying ImageNet color images, we unfroze the weight parameters of the furthest downstream dense block (dense block 4) while freezing all earlier layers' parameters. This allowed the earlier frozen layers to retain DenseNet's general image feature recognition while the deeper layers learned to classify lens status from these extracted features.18 The final, fully connected linear classification layer used a dropout rate of 0.6 before applying a log softmax function to generate the final likelihood of each lens class. The model's unfrozen parameters and classification layer were trained using stochastic gradient descent with a learning rate of 0.0006 and a momentum of 0.9. We trained the model using a negative log-likelihood loss function for 60 epochs with a batch size of 32, and used early stopping with a patience of 10 epochs to further mitigate overfitting.
We performed fivefold cross-validation to evaluate the performance of our model. The original dataset was randomly partitioned into five mutually exclusive testing datasets, and for each fold a model was trained on the remaining data. The composition of images by lens status within each fold is included in Supplementary Figure S3. Each model's predicted labels for its testing dataset were aggregated and used to generate a confusion matrix. From these values, we calculated precision, recall, F1 score, and false positive rate for each lens status using the formulas described in Figure 1. Additionally, a receiver operating characteristic (ROC) curve was plotted for each fold, and the mean of each fold's true and false positive rates was used to calculate the area under the curve (AUC) and SD. We calculated weighted-average precision, recall, F1 score, false positive rate, and AUC to describe the model's overall performance. Finally, we created heatmap visualizations using gradient-weighted class activation mapping to localize regions with high activation. These heatmaps were qualitatively analyzed to assess whether the classification model was activating image regions relevant to lens status evaluation.
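The per-class metrics derived from the aggregated confusion matrix can be computed one-vs-rest, as sketched below. This is an illustrative helper (the function name and output layout are assumptions, not the study's code): for each class, every other class is pooled into the negative set before computing precision, recall, F1 score, and false positive rate.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest precision, recall, F1, and false positive rate.

    `confusion[i][j]` counts samples whose true label is labels[i]
    and whose predicted label is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                      # missed positives
        fp = sum(confusion[i][k] for i in range(n)) - tp  # false alarms
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        fpr = fp / (fp + tn) if fp + tn else 0.0
        metrics[label] = {"precision": precision, "recall": recall,
                          "f1": f1, "fpr": fpr}
    return metrics
```

Weighted averages across classes can then be obtained by weighting each class's metric by its support (the number of true samples in that class).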
To evaluate whether performance metrics were affected by patient-related factors within the pediatric subset, two experiments were performed. First, a model was trained and evaluated using the subgroup of all patients < 10 years old at the time of the examination (181 total images; 67 fewer phakic images, 37 fewer pseudophakic images, and the same number of aphakic images as the all-ages group). The performance metrics and heatmaps of this under-10 subgroup model were compared with those of the model trained on images from patients of all ages. In the second experiment, two models were trained using 20 subjects (11 phakic and 9 pseudophakic) under two conditions: patients < 10 years old at the time of the examination and patients > 10 years old at the time of the examination. These models were then evaluated on a test set of 8 subjects under age 10 (5 phakic and 3 pseudophakic) that had not been included in model training. Aphakic subjects were excluded from this modeling, as there were no aphakic subjects over the age of 10 years and the goal was to compare the model's performance when the training set was restricted to patients over 10 years of age but tested on images from subjects under 10 years. Other than the parameters involving the number of classes and the removal of cross-validation for evaluation, the modeling hyperparameters remained as described above.
This study adhered to the ethical principles outlined in the Declaration of Helsinki as amended in 2013. The Institutional Review Board approved the study protocol. Collection and evaluation of protected health information were compliant with the Health Insurance Portability and Accountability Act of 1996.