September 2024 | Volume 13, Issue 9 | Open Access
Artificial Intelligence
Beyond PhacoTrainer: Deep Learning for Enhanced Trabecular Meshwork Detection in MIGS Videos
Author Affiliations & Notes
  • Su Kara
    Department of Ophthalmology, Stanford University, Palo Alto, CA, USA
  • Michael Yang
    Department of Ophthalmology, Stanford University, Palo Alto, CA, USA
  • Hsu-Hang Yeh
    Department of Ophthalmology, National Taiwan University, Taipei, Taiwan, Republic of China
  • Simmi Sen
    Department of Ophthalmology, Stanford University, Palo Alto, CA, USA
  • Hannah H. Hwang
    Weill Cornell School of Medicine, Cornell University, New York, NY, USA
  • Sophia Y. Wang
    Department of Ophthalmology, Stanford University, Palo Alto, CA, USA
  • Correspondence: Sophia Y. Wang, 2370 Watson Court, Palo Alto, CA 94303, USA. e-mail: sywang@stanford.edu 
Translational Vision Science & Technology September 2024, Vol.13, 5. doi:https://doi.org/10.1167/tvst.13.9.5
Abstract

Purpose: The purpose of this study was to develop deep learning models for surgical video analysis, capable of identifying minimally invasive glaucoma surgery (MIGS) and locating the trabecular meshwork (TM).

Methods: For classification of surgical steps, we had 313 video files (265 for cataract surgery and 48 for MIGS procedures), and for TM segmentation, we had 1743 frames (1110 for TM and 633 for no TM). We used transfer learning to update a classification model pretrained to recognize standard cataract surgical steps, enabling it to also identify MIGS procedures. For TM localization, we developed three different models: U-Net, Y-Net, and Cascaded. Segmentation accuracy for TM was measured by calculating the average pixel error between the predicted and ground truth TM locations.

Results: Using transfer learning, we developed a model which achieved 87% accuracy for MIGS frame classification, with an area under the receiver operating characteristic curve (AUROC) of 0.99. This model maintained a 79% accuracy for identifying 14 standard cataract surgery steps. The overall micro-averaged AUROC was 0.98. The U-Net model excelled in TM segmentation, with an intersection over union (IoU) score of 0.9988 and an average pixel error of 1.47.

Conclusions: Building on prior work developing computer vision models for cataract surgical video, we developed models that recognize MIGS procedures and precisely localize the TM with superior performance. Our work demonstrates the potential of transfer learning for extending our computer vision models to new surgeries without the need for extensive additional data collection.

Translational Relevance: Computer vision models in surgical videos can underpin the development of systems offering automated feedback for trainees, improving surgical training and patient care.

Introduction
Minimally invasive glaucoma surgery (MIGS) refers to a group of procedures that treat glaucoma by reducing intraocular pressure (IOP) using less invasive techniques than traditional glaucoma surgery. MIGS procedures can be divided into three categories depending on their anatomic site of implantation: angle-based, subconjunctival, or suprachoroidal. Angle-based MIGS procedures often augment the pre-existing conventional outflow pathway of the eye by targeting specific iridocorneal angle structures, such as the trabecular meshwork (TM), Schlemm's canal, and collector channels. Instead of creating additional sclerostomy wounds, angle-based MIGS procedures are often performed through the clear corneal incisions created during standard phacoemulsification, resulting in a faster recovery time and a lower risk of complications.1 In recent years, the popularity of MIGS procedures has surged, with many surgeons opting to perform MIGS procedures in conjunction with cataract surgery.2 In 2017, of the 174,788 glaucoma surgeries performed in the United States, 75.5% were MIGS procedures.3 
The iridocorneal structures cannot be directly visualized due to the total internal reflection of light that occurs at the air-tear film interface. To observe these structures clinically and intraoperatively, a gonioprism must be utilized. When ophthalmology residents and private practitioners were asked to rate their comfort level with 4-mirror gonioscopy, they rated it as the second most challenging examination skill, with an average score of 0.83 out of 4 (with a score of zero being the most challenging).4 Operating safely in the iridocorneal angle requires the surgeon to clearly distinguish between structures that are often highly variable in appearance and master the dexterity involved in delicate bimanual surgery. Misidentification of iridocorneal structures could lead to surgical complications like misplaced stents, cyclodialysis clefts, hyphema, and hypotony.5 Microsurgical training in the United States often begins in residency with a combination of didactic and hands-on experiences. However, despite the incorporation of MIGS procedures into the surgical curricula, a significant discrepancy remains between the training provided and the confidence in residents’ ability to perform these procedures independently: in a 2020 survey, 37% of program directors in the study expressed concerns regarding their residents’ MIGS experience, citing it as inadequate for independent MIGS procedures after graduation. Additionally, only 3% of the program directors were highly confident in their residents’ proficiency in performing MIGS procedures independently.6 These findings underscore the pressing need for improvements in MIGS training within ophthalmology residency programs. 
In our previous work, we developed the “PhacoTrainer” deep learning models, which were capable of identifying cataract surgical steps (create wound, injection into the eye, capsulorhexis, hydrodissection, phacoemulsification, irrigation/aspiration, place lens, remove viscoelastic, close wound, advanced technique/other, stain with trypan blue, manipulating iris, and subconjunctival injection), as well as important surgical instruments and eye anatomic landmarks, from entire surgical videos. These models can be deployed on large collections of surgical videos, such as those collected by surgeons in training, thus forming the backbone of a system to develop automated surgical performance metrics through analysis of cataract surgical videos. Trainees could monitor their progress over time in a highly granular fashion, with statistics related to time spent on each step and tool motion metrics. As MIGS is an increasingly prevalent adjunct to cataract surgery, this study sought to extend the capabilities of our previous deep learning models for cataract surgical videos using transfer learning to integrate recognition of MIGS as a surgical step. MIGS procedures could thus be automatically detected from a large corpus of videos, eliminating the need for manual surgical logging, and metrics such as “time spent on MIGS” could be automatically captured and provided to the surgeon as a performance metric. In addition, we developed segmentation models which can precisely locate the TM in MIGS videos. Ultimately, these models can form the backbone of computer-assisted surgery or serve as valuable training aids for surgical education, allowing surgeons to practice and refine their techniques in locating and working with the TM during MIGS procedures. Through this enhanced proficiency, residents can become more skilled and confident in performing MIGS procedures, ultimately benefiting patients with glaucoma through improved surgical outcomes and care. 
Methods
Data Source and Annotation
Minimally Invasive Glaucoma Surgery Videos
The corpus of cataract surgical videos upon which the original PhacoTrainer7,8 models were trained was augmented with an additional 48 surgical videos of MIGS procedures, 20 of which were collected in a de-identified manner at the Stanford Department of Ophthalmology, with the remainder obtained from publicly available videos (YouTube). MIGS procedures represented in these videos included Hydrus, Omni, iStent, and Kahook Dual Blade. A glaucoma surgeon labeled the start and end times for the MIGS procedures to identify the sections of video where MIGS procedures were occurring. As only de-identified data were used, this study was deemed exempt from review by the Stanford Institutional Review Board. 
MIGS Sampled Frames for TM Segmentation
From the surgical videos, a total of 1743 video frames were sampled from the MIGS sections of surgical video. The location of the TM in these frames was manually annotated by a glaucoma specialist on the Labelbox platform9 using line segments specified by three to eight individual points. The frames were annotated at 854 × 480 resolution to allow for clear human visualization of the TM location while still reducing the size of the input image and limiting the computational resources needed to train subsequent segmentation models. Of these frames, 1110 contained a clear image of the TM, whereas 633 did not, for example, when the gonioscopic lens was in motion or the TM was out of frame. Frames which did not include clear images of the TM were included in the training process to enhance the model's utility when deployed on a whole surgical video, where many video frames would not contain clear images of the TM. Inclusion of “no TM” images, akin to “ungradable images” in other computer vision algorithms, ensures that these images do not lead to algorithmic failures during model deployment, preventing inappropriate segmentation of the TM when it is obscured or out of frame. 
Classification of MIGS: PhacoTrainer Transfer Learning Model
We investigated whether transfer learning could be leveraged from a model previously trained to recognize cataract surgical steps in order to further identify video frames where MIGS was occurring. The augmented corpus of videos was partitioned into a training dataset consisting of 234 video files (209 cataract surgeries and 25 MIGS procedures), with the remainder reserved for validation and test sets. For input to the PhacoTrainer architecture, videos were downsampled to one frame per second, with the resolution downsampled to 456 × 256 and cropped to the central 256 × 256 portion. In total, there were 6096 video frames from MIGS procedures in the training corpus, with 430 and 501 in the validation and test sets, respectively. Because it was important for the model to learn to classify MIGS procedures effectively, a maximum of 6096 frames from each surgical step was randomly included in the training corpus to balance the data. 
For the transfer learning process, the first 18 layers of the PhacoTrainer VGG16 model were frozen, leaving 21,767,694 trainable parameters. The model was trained on MIGS video frames with a learning rate of 10⁻⁵, followed by training on MIGS frames again with a smaller learning rate of 10⁻⁶. We then trained with all frames in the training dataset at a learning rate of 10⁻⁶. Finally, all layers of the model were unfrozen and the model was trained on all frames with a learning rate of 10⁻⁶. We repeated this final step three times with successively smaller learning rates and concluded training with a learning rate of 10⁻⁷. The gradual decrease in learning rate during training (learning rate scheduling) and the process of freezing/unfreezing specific layers enabled us to preserve the model's prior knowledge from the original PhacoTrainer data while adapting it to correctly label the new MIGS data. 
The code was executed with TensorFlow version 2.4.0 and Python version 3.7.8. For loss computation, sparse categorical cross-entropy was used, and the output consisted of 15 softmax activation scores. The class with the highest score was designated as the final prediction. 
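A minimal sketch of this staged fine-tuning schedule, written against the Keras API, is shown below. The saved-model path, dataset objects, epoch counts, optimizer choice, and the intermediate learning rate in the final stage are illustrative assumptions; only the 18 frozen layers, the sparse categorical cross-entropy loss, the 15-way softmax output, and the overall learning-rate sequence follow the text.

```python
import tensorflow as tf

# Illustrative sketch of the staged fine-tuning schedule described above.
# "phacotrainer_vgg16.h5" and the dataset objects (migs_ds, all_frames_ds,
# val_ds) are hypothetical; epoch counts and the Adam optimizer are assumptions.
model = tf.keras.models.load_model("phacotrainer_vgg16.h5")

def compile_with_lr(m, lr):
    m.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="sparse_categorical_crossentropy",  # 15-class softmax output
        metrics=["accuracy"],
    )

# Stage 1: freeze the first 18 layers, train on MIGS frames at 1e-5.
for layer in model.layers[:18]:
    layer.trainable = False
compile_with_lr(model, 1e-5)
model.fit(migs_ds, validation_data=val_ds, epochs=5)

# Stage 2: MIGS frames again at the smaller rate of 1e-6.
compile_with_lr(model, 1e-6)
model.fit(migs_ds, validation_data=val_ds, epochs=5)

# Stage 3: all frames (cataract + MIGS) at 1e-6.
compile_with_lr(model, 1e-6)
model.fit(all_frames_ds, validation_data=val_ds, epochs=5)

# Stage 4: unfreeze everything and repeat with successively smaller rates,
# concluding at 1e-7 as described in the text (5e-7 is an assumed midpoint).
for layer in model.layers:
    layer.trainable = True
for lr in (1e-6, 5e-7, 1e-7):
    compile_with_lr(model, lr)
    model.fit(all_frames_ds, validation_data=val_ds, epochs=3)
```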
Identification of the Trabecular Meshwork With Segmentation Models
Segmentation Data Preprocessing
For consistency with previously published model architectures, images and their masks were resized to 512 × 288.10 Because the TM and its annotated line segments are by nature extremely thin and cover very few pixels in a frame, we expanded the labeled area with a “cap” of a fixed pixel width above and below the labeled annotation to improve prediction performance. The cap size was a tunable hyperparameter, selected to be 14 pixels. An example of a labeled image and the corresponding mask with this cap is shown in Figure 1.
Figure 1. Example labeled image of TM and corresponding mask for training. On the left is a single representative frame from a MIGS procedure, where the TM location is labeled with a line segment. On the right, the corresponding mask is shown, with the background labeled +1 and the 14-pixel height cap above and below the original TM line labeled 0.
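As a concrete illustration of this preprocessing step, the sketch below rasterizes a thin TM polyline annotation and expands it vertically by the 14-pixel cap. The point format, array shapes, and use of OpenCV are assumptions for demonstration; only the cap size and the mask convention shown in Figure 1 follow the text.

```python
import numpy as np
import cv2  # OpenCV, used here only to rasterize the annotation polyline

def build_tm_mask(points, height=288, width=512, cap=14):
    """Turn a TM polyline annotation into a banded training mask.

    `points` is a list of (x, y) annotation points (three to eight per frame
    in this study). Returns a mask in which the TM band is 0 and the
    background is 1, matching the convention shown in Figure 1.
    """
    line = np.zeros((height, width), dtype=np.uint8)
    pts = np.array(points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(line, [pts], isClosed=False, color=1, thickness=1)

    # Dilate only vertically: a (2*cap + 1) x 1 kernel grows the one-pixel
    # line into a band extending `cap` pixels above and below it
    # (29 pixels tall in total for cap=14).
    kernel = np.ones((2 * cap + 1, 1), dtype=np.uint8)
    band = cv2.dilate(line, kernel)

    return (1 - band).astype(np.uint8)  # background = 1, TM band = 0
```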
The training corpus was augmented using standard techniques, including zoom (±10%, 5%), shift (23 pixels left or right, and 13 pixels up or down), and horizontal flipping (left-right), resulting in an expanded training dataset of 18,512 images. 
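The exact augmentation implementation is not specified in the text; assuming standard Keras preprocessing was used, it could be expressed roughly as follows, with a shared random seed keeping the image and mask transforms aligned.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rough sketch of the augmentation described above (~10% zoom, ~23 px
# horizontal and ~13 px vertical shifts, horizontal flips). train_images and
# train_masks are assumed to be rank-4 NumPy arrays of frames and masks.
augmenter = ImageDataGenerator(
    zoom_range=0.10,
    width_shift_range=23,   # integer value -> shift measured in pixels
    height_shift_range=13,
    horizontal_flip=True,
)

seed = 42  # identical seed so each image and its mask receive the same transform
image_flow = augmenter.flow(train_images, batch_size=32, seed=seed)
mask_flow = augmenter.flow(train_masks, batch_size=32, seed=seed)
```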
For TM identification from MIGS procedure frames, three modeling approaches were developed: (1) U-Net, (2) Y-Net, and (3) Cascaded Classification and Segmentation U-Net. The models were trained using different dataset configurations. The original non-augmented frames were divided into three sets: a training dataset with 1323 frames (810 TM and 513 no TM), a validation dataset comprising 160 frames (100 TM and 60 no TM), and a test dataset of 160 frames (100 TM and 60 no TM). 
U-Net Segmentation Model
The U-Net is a convolutional neural network designed for image segmentation.11 It consists of a contracting path and an expanding path, following an encoder-decoder architecture, as shown in Figure 2A. Along the decoder path, skip connections are formed as feature maps from the encoder path are concatenated with feature maps at the corresponding layers of the decoder path. Our architecture utilized sigmoid activation functions with a mean squared error loss. The model was trained with the Adam optimizer with a learning rate of 10⁻³, a decay rate of 95, and a total of 40 epochs. 
Figure 2. Model architectures for segmentation of trabecular meshwork. Panel (A) shows the U-Net architecture and panel (B) shows the Y-Net architecture for segmentation of the trabecular meshwork from surgical video frames. Y-Net closely resembles the U-Net in part A, with a notable distinction: the classification branch extends through the bottom of the “U” structure. Panel (C) shows the cascaded classification and segmentation network. In this approach, ResNet50V2 transfer learning is first used to distinguish TM and no TM images; then, segmentation is performed using the U-Net architecture in part A, but exclusively on TM-labeled images.
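For readers less familiar with the architecture, the sketch below outlines a small Keras U-Net of the kind shown in Figure 2A. The filter counts and network depth are illustrative assumptions; only the encoder-decoder structure with skip connections, the sigmoid output, the mean squared error loss, and the Adam optimizer at 10⁻³ follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(288, 512, 3)):
    # Filter counts and depth are illustrative; only the sigmoid output,
    # MSE loss, and Adam(1e-3) are taken from the text.
    inputs = layers.Input(shape=input_shape)

    # Contracting (encoder) path.
    c1 = conv_block(inputs, 16)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 32)
    p2 = layers.MaxPooling2D(2)(c2)
    c3 = conv_block(p2, 64)
    p3 = layers.MaxPooling2D(2)(c3)

    # Bottleneck.
    b = conv_block(p3, 128)

    # Expanding (decoder) path with skip connections to the encoder.
    u3 = layers.UpSampling2D(2)(b)
    c4 = conv_block(layers.Concatenate()([u3, c3]), 64)
    u2 = layers.UpSampling2D(2)(c4)
    c5 = conv_block(layers.Concatenate()([u2, c2]), 32)
    u1 = layers.UpSampling2D(2)(c5)
    c6 = conv_block(layers.Concatenate()([u1, c1]), 16)

    # One-channel sigmoid output: per-pixel probability of the TM band.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c6)

    model = Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
    return model
```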
Joint Classification and Segmentation
We also developed two other approaches which performed both classification (TM presence/absence) and TM segmentation: Y-Net and Cascaded Classification and Segmentation U-Net. These approaches were developed to determine whether they could yield improved overall performance through multi-task learning effects and/or reduce false positive results (segmenting TM where it was not visible) or false negative results. 
Similar to the U-Net architecture, Y-Net consists of an encoder-decoder architecture for segmentation,13 as shown in Figure 2B. However, it is distinguished by a classification branch after the completion of the contracting (encoder) layers, prior to proceeding to the expanding (decoder) layers. The task of the Y-Net's added classification branch was to distinguish between images with the TM present versus no TM present. The model uses two loss functions (both sparse categorical cross-entropy), allowing it to optimize its parameters for both classification and segmentation simultaneously. We used a learning rate of 10⁻³, a decay rate of 95, and a total of 15 epochs. 
The cascaded classification and segmentation approach shown in Figure 2C utilized two separate models, used sequentially during inference, for classifying the presence of TM and for segmenting the TM. To build the classification model, we used transfer learning from ResNet50V2, pretrained on ImageNet.12 We used a learning rate of 10⁻⁴, a decay rate of 90, and trained for 10 epochs. We then trained a U-Net model exclusively with “TM” frames. 
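A minimal sketch of the classification stage of such a cascade is shown below; the pooled dense head, input size, and binary cross-entropy loss are illustrative assumptions, while the ImageNet-pretrained ResNet50V2 backbone and the 10⁻⁴ learning rate follow the text. At inference, only frames this classifier marks as containing TM would be passed to the U-Net trained on “TM” frames.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative sketch of a ResNet50V2-based "TM present vs. absent" classifier.
# Everything beyond the ImageNet-pretrained backbone and the 1e-4 learning
# rate is an assumption for demonstration.
base = tf.keras.applications.ResNet50V2(
    include_top=False, weights="imagenet", input_shape=(288, 512, 3)
)
base.trainable = False  # keep the ImageNet features fixed initially

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # P(TM present)

classifier = Model(base.input, outputs)
classifier.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```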
Evaluation Metrics and Testing
Cataract and MIGS Surgical Step Classification Metrics
To assess the PhacoTrainer Transfer Learning Model, we evaluated prediction accuracy on the test dataset, created receiver operating characteristic and precision-recall curves for multiclass classification and micro-averaged performance, and evaluated area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). 
TM Identification and Segmentation Evaluation
Accuracy of the predicted segmentation for TM was measured by calculating the average pixel error between the predicted and ground truth TM locations. Figure 3A illustrates an example raw model output, which predicts a zone in the frame with a height of 29 pixels, the same as the annotation mask in Figure 1. The linear location of the TM is then identified by traversing the image column by column and locating the pixel at the vertical center of each column's predictions; these pixels collectively form the TM line (Fig. 3B). We then calculated the average pixel distance between the ground truth TM line and the predicted TM line. Intersection over union (IoU) was also used as an additional segmentation performance metric. 
Figure 3. Example model output for trabecular meshwork segmentation. (A) Predicted TM segment in dark blue in the background and the predicted TM line in light yellow in the foreground going through the center of the TM segment. (B) Ground-truth TM line in dark blue in the background, and the predicted TM line in light color in the foreground.
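The column-wise reduction from a predicted band to a TM line, along with the two segmentation metrics, can be sketched as follows; the 0.5 threshold and the handling of columns with no prediction are assumptions, as the text describes only the extraction of the vertical center of each column.

```python
import numpy as np

def predicted_tm_line(prob_map, threshold=0.5):
    """Reduce a per-pixel probability map to a TM line, one y-value per column.

    For each column, take the vertical center of the pixels predicted as TM;
    columns with no prediction return NaN. The 0.5 threshold is an assumption.
    """
    height, width = prob_map.shape
    line = np.full(width, np.nan)
    for x in range(width):
        ys = np.where(prob_map[:, x] >= threshold)[0]
        if ys.size:
            line[x] = ys.mean()  # vertical center of the predicted band
    return line

def average_pixel_error(pred_line, true_line):
    # Mean vertical distance over columns where both lines are defined.
    valid = ~np.isnan(pred_line) & ~np.isnan(true_line)
    return np.abs(pred_line[valid] - true_line[valid]).mean()

def iou(pred_mask, true_mask):
    # Intersection over union of the binary TM-band masks.
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    return np.logical_and(pred, true).sum() / union if union else 1.0
```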
To evaluate models for their ability to classify whether or not TM was present in the video frame, we computed standard performance metrics for each model, including sensitivity (recall), specificity, precision, also known as positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score, which is the harmonic mean of PPV and sensitivity. We used the following standard definitions to calculate these metrics (true positive = TP, false positive = FP, true negative = TN, and false negative = FN):  
\begin{eqnarray*}
Sensitivity\ (Recall) &=& \frac{TP}{TP + FN}\\
Specificity &=& \frac{TN}{TN + FP}\\
PPV\ (Precision) &=& \frac{TP}{TP + FP}\\
NPV &=& \frac{TN}{TN + FN}\\
Accuracy &=& \frac{TP + TN}{TP + TN + FP + FN}\\
F1\ Score &=& 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}
\end{eqnarray*}
For the classification components of the Y-Net and Cascaded models, we evaluated whether the model correctly classified each frame as a “TM” or “no TM” frame. For example, a TP would be a “TM” frame that the model correctly predicted as such. For the U-Net, Y-Net, and Cascaded segmentation models, we considered a “positive” identification of TM when the model predicted any segment for TM, and “negative” otherwise. For example, a TP would be a “TM” frame on which the model predicted a segment as the TM line. Because the Cascaded model performs segmentation only on images predicted to contain TM in the preceding classification step, there were no “no TM” images in the segmentation component of this step, meaning there were no true negative predictions. 
The 95% confidence intervals (CIs) on performance metrics were determined using bootstrapping with 10000 replicates. 
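A bootstrap interval of this kind can be computed along the following lines; the percentile method and the per-frame resampling unit are assumptions, as the text specifies only the number of replicates.

```python
import numpy as np

def bootstrap_ci(per_sample_values, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a per-sample metric.

    `per_sample_values` might be, for example, the per-frame pixel errors or
    the 0/1 correctness of per-frame classifications. The percentile method
    is an assumption; the text states only that 10000 replicates were used.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(per_sample_values)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        resample = rng.choice(values, size=values.size, replace=True)
        stats[i] = resample.mean()
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return values.mean(), (lo, hi)
```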
Prediction and Evaluation of Whole Surgical Video
To further evaluate the performance of the PhacoTrainer Transfer and TM segmentation models, we also conducted an evaluation on a held-out test video containing both cataract surgery and MIGS procedures. In addition to the label for the MIGS section, the 13 specific steps of cataract surgery were also manually annotated. We followed the architecture described in Supplementary Figure S1, applying the PhacoTrainer Transfer model to classify cataract surgical steps or MIGS procedures and the TM segmentation model to draw the predicted TM curve on MIGS frames. 
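The whole-video pipeline can be sketched as follows. The one frame-per-second sampling and the chaining of step classification with TM segmentation follow the descriptions above; the OpenCV decoding, the preprocessing helpers, and the MIGS class index are illustrative assumptions (predicted_tm_line refers to the earlier evaluation sketch).

```python
import cv2
import numpy as np

# Illustrative whole-video pipeline: classify each sampled frame, then run TM
# segmentation on frames classified as MIGS. The model objects and the
# preprocessing helpers (which are assumed to resize/crop a frame and add a
# batch dimension) are hypothetical; only the overall flow follows the text.
MIGS_CLASS = 14  # hypothetical index of the MIGS class among the 15 outputs

def analyze_video(path, step_model, tm_model):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    results = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % int(round(fps)) == 0:  # sample ~1 frame per second
            step_probs = step_model.predict(preprocess_for_step(frame))[0]
            step = int(np.argmax(step_probs))
            tm_line = None
            if step == MIGS_CLASS:
                prob_map = tm_model.predict(preprocess_for_tm(frame))[0, ..., 0]
                tm_line = predicted_tm_line(prob_map)  # from the earlier sketch
            results.append((frame_idx / fps, step, tm_line))
        frame_idx += 1
    cap.release()
    return results
```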
Results
Performance of PhacoTrainer Transfer Learning Model
A transfer learning and fine-tuning approach was used to update the PhacoTrainer model, previously trained to identify standard cataract surgical steps, to additionally identify when MIGS was occurring in the surgical video, using the original cataract surgical videos and 25 additional MIGS procedure videos for training. The fine-tuned classification model achieved an accuracy of 0.874 (95% CI = 0.866–0.882) for identifying video frames with MIGS procedures on a test set while retaining a micro-averaged accuracy of 0.790 (95% CI = 0.789–0.791) for the 14 standard cataract surgical steps. Receiver operating characteristic curves for identification of each step of surgery, including MIGS procedures and cataract surgical steps, are shown in Figure 4A. The overall micro-averaged AUROC was 0.98. 
Figure 4. Receiver operating characteristic and precision recall curves for classification of cataract and MIGS surgical video frames. (A) Receiver operating characteristic (ROC) curves for recognizing cataract surgical steps and MIGS procedures using the PhacoTrainer Transfer Learning Model. (B) Precision-recall (PR) curves for recognizing cataract surgical steps and MIGS using the PhacoTrainer Transfer Learning Model.
The confusion matrix for the PhacoTrainer Transfer Learning Model is provided in Supplementary Figure S2. For most steps, including MIGS procedures, the most common misclassification was “no label”: frames that were labeled as having MIGS procedures were predicted as “no label,” or vice versa. For the most part, these misclassifications occurred during the periods of transition between cataract surgery and MIGS procedure steps. 
Performance of Models for Identification and Segmentation of Trabecular Meshwork
We developed three models to identify the precise location of the TM in videos of MIGS: U-Net, Y-Net, and Cascaded Classification and Segmentation U-Net. The models were trained and tested on still frames extracted from MIGS procedure videos. The U-Net model performs only segmentation (identification of TM location), whereas the Y-Net and Cascaded Classification and Segmentation models perform both classification (indicating presence vs absence of TM) and segmentation of TM. These latter two combined classification and segmentation approaches were pursued to investigate whether they resulted in better overall segmentation performance and to ensure that segmentation of TM occurred only when the TM was visible and not otherwise. 
Trabecular Meshwork Segmentation Performance
All models were evaluated on the test set for segmentation performance using mean IoU (Table 1) and the average pixel error between the predicted TM location and the true TM location. The U-Net achieved the highest IoU score (0.9988) and the lowest average pixel error of 1.47 (95% CI = 1.33–1.60), which on a 512 × 288 frame corresponds to approximately 0.5% of the frame height. The Cascaded model had the next best performance, with an average pixel error of 2.91 (95% CI = 1.61–4.21), followed by the Y-Net model at 5.55 (95% CI = 2.26–8.84). The U-Net approach resulted in only 2 false negative (FN) predictions, in which the model failed to segment a TM that was present, on a test set of 160 frames, compared with 11 FN for Y-Net and 3 FN for Cascade. The U-Net model had no false positive (FP) predictions, whereas Y-Net had 16 FP predictions and Cascade had 2 FP predictions. 
Table 1. Mean Intersection-Over-Union Scores for Segmentation Models
Classification of Trabecular Meshwork Presence Versus Absence
The Y-Net and Cascaded Classification and Segmentation models were also evaluated for classification performance (presence versus absence of TM); the U-Net model was designed exclusively for segmentation and does not have a classification branch. The Y-Net achieved a classification accuracy of 0.83 (95% CI = 0.78–0.89) with 9 FN predictions and 15 FP predictions over a test set of 160 frames, whereas the Cascade model achieved a classification accuracy of 0.97 (95% CI = 0.94–1.00) with 3 FN predictions and 9 FP predictions. The complete set of classification performance metrics for all models is shown in Table 2.
Table 2. Classification Performance Metrics for the U-Net, Y-Net, and Cascade Models
Example Classification and Segmentation on Whole MIGS Videos
The MIGS procedure classification and U-Net TM segmentation models were evaluated on a full-length held-out test video which included both cataract surgery and MIGS procedures, as shown in Supplementary Video S3. Figure 5 depicts the ground truth timeline of surgical steps compared to the predicted timeline, including cataract surgical steps and MIGS procedures. The overall accuracy for classification of surgical steps was 83%. Figure 5 also shows the timeline of when the model performs segmentation of TM, which illustrates when during the surgery the TM is visible. Notably, the surgical frames from the transition period between cataract extraction and MIGS procedures are especially challenging for the model to classify. 
Figure 5. Video timeline showing ground truth and prediction comparison for all labels. The figure shows the timeline of surgical steps for a combined cataract and MIGS procedure. The ground truth indicates the timeline as determined by human review of the video, whereas the predictions show the surgical steps as predicted by the deep learning classification model. The timeline also illustrates when the trabecular meshwork (TM) is drawn by the segmentation model.
Discussion
The results of our study provide valuable insights into the performance and potential of deep learning models in the context of cataract surgery and MIGS procedures. Using a transfer learning approach, we reached an 87% accuracy in identifying MIGS procedures in surgical video frames. In addition, we developed a deep learning model capable of precisely localizing the TM within a surgical frame with an average pixel error of 1.47. Our models not only can enhance training for MIGS procedures, improving surgical education, but also lay groundwork for further advances in deep learning within the field of ophthalmology. 
We were able to successfully leverage transfer learning and fine-tuning to expand the capabilities of a model previously trained to recognize 14 steps of standard cataract surgery, in order to also identify when MIGS procedures were occurring. Our transfer learning model achieved an 87% accuracy in classifying video frames containing MIGS surgery and an AUROC of 0.99. Additionally, we maintained an accuracy of 79% for identifying the original cataract surgical steps, which is higher than the 76% accuracy achieved in our previous PhacoTrainer paper using the same model. Whereas the original PhacoTrainer model was trained on hundreds of cataract surgical videos, updating this model to further identify MIGS procedures used only 25 additional videos for training. By leveraging the knowledge acquired during the pretraining phase on standard cataract surgical videos, our model exhibited remarkable predictive performance in identifying MIGS procedures, a related domain. This transfer of knowledge not only enhanced accuracy but also significantly mitigated the need for extensive additional data collection and annotation, a time-consuming and costly endeavor. This approach could potentially be used in the future to further enhance the classification model to identify additional related ocular surgical steps, such as adjunctive corneal or refractive procedures and others, while minimizing the need for additional data collection and annotation. 
Another significant achievement was the development of a segmentation model that was able to identify the location of the TM with exceptionally high accuracy, outperforming previous attempts at this task.10 Our best-performing model, a U-Net, predicted the location of the TM with an average pixel error of 1.47 on the 512 × 288 frame size, compared to previous efforts that achieved an average pixel error of 2.30.10 One challenge of segmenting a small linear structure, such as the TM, is that the background pixels far outnumber the pixels representing the TM. We thus used a strategy of vertically expanding the ground truth TM line to enhance the representation of TM-positive pixels, which significantly improved the model’s accuracy and precision in TM localization. Although there may be other approaches, such as pixel weighting and using the average distance from the TM as a loss function, these approaches may have difficulty overcoming the severe imbalance in the pixel representation of the structures. In addition, we were initially concerned that use of a segmentation model without classification of whether the MIGS procedure was being performed or not could result in false positive results, where the model could predict the location of the TM in frames where TM was not visible. This motivated several other modeling approaches combining classification (identification of TM presence) with segmentation. Surprisingly, we found that despite the absence of a preliminary classification model to distinguish between TM and no TM images, the U-Net was able to accurately identify and draw the TM line on all frames without any false positive results. Combining the additional task of learning to classify whether TM was present or not in the frame did not appear to improve segmentation performance. 
Our work has several potential implications for improving clinical care and surgical education. Artificial intelligence (AI) models which identify from surgical videos whether the MIGS procedure was performed could assist ophthalmologists, especially trainees who must maintain surgical logs, to automatically capture statistics about MIGS training, such as number of surgeries and length of time spent performing MIGS procedures in each surgery. Trainees could easily identify past MIGS procedures for further review and learning. Detailed segmentation of TM structures could eventually form the basis of computer-assisted surgical systems, which could, for example, alert the surgeon in real time when an optimal view of the TM is achieved and assist in targeting devices and interventions to the correct anatomic location, improving surgical outcomes. These models would also be a necessary component of more advanced robotic surgery systems in the future, which would need to be able to recognize the detailed structures of the angle in order to perform the precise intraocular manipulations required in MIGS procedures. 
We acknowledge that this study has several limitations. Transfer learning to expand the original PhacoTrainer model to recognize MIGS procedures was performed using relatively few videos depicting MIGS procedures. Although this could be considered a strength, in that we demonstrated that very few videos of a new surgical step are required to expand the capabilities of a previously trained model, it could also be considered a weakness, as there were limited numbers of surgical video examples of each individual type of MIGS surgery. However, the gonioscopy lens and angle of the microscope are shared across many angle-based MIGS procedures, so it is likely that other similar MIGS procedures would also be recognizable. Future work could focus on expanding the diversity of the types of MIGS procedures represented, and also extend the model with multiclass capabilities to recognize which type of MIGS procedure is being performed. In addition, it is possible that the entire diversity of TM pigmentation levels was not captured in the videos used for training, and that, like humans, the model may have more difficulty identifying the TM in cases of very light pigmentation. Furthermore, these models were designed to process static frames, thereby omitting the potential insights offered by object motion within the videos. Finally, although one of the goals of this project was to utilize transfer learning from a previously trained model, we acknowledge the limitations posed by the VGG16 architecture, including its age and extensive parameter count. Future investigations will focus on exploring contemporary architectures, such as EfficientNet or Xception, which may improve the performance of our system and optimize the use of computational resources. This endeavor would necessitate retraining the model comprehensively on an expanded dataset that includes both conventional cataract surgeries and novel MIGS procedures. 
In conclusion, accurate identification of iridocorneal structures with intraoperative gonioscopy is a challenging yet essential skill for accomplishing safe and effective angle-based MIGS procedures. The appearance of these structures can vary significantly among individuals, which contributes to the difficulty of proper identification. As angle-based MIGS targeting the trabecular meshwork becomes increasingly prevalent, it is crucial to provide ophthalmic surgeons with tools that can augment their surgical training and accuracy. We were able to leverage transfer learning to develop an AI model that accurately identifies the portion of a surgical video during which a MIGS procedure is being performed. Additionally, our segmentation model can identify the precise location of the TM with high accuracy, improving on previously published work. Overall, we hope that our models can improve surgical training for MIGS procedures. 
Acknowledgments
Supported by National Eye Institute K23EY03263501 (SYW); Career Development Award from Research to Prevent Blindness (SYW); unrestricted departmental grant from Research to Prevent Blindness (SYW and MY); departmental grant National Eye Institute P30-EY026877 (SYW and MY). 
Disclosure: S. Kara, None; M. Yang, None; H.-H. Yeh, None; S. Sen, None; H.H. Hwang, None; S.Y. Wang, None 
References
1. Gurnani B, Tripathy K. Minimally invasive glaucoma surgery. In: StatPearls. Tampa, FL: StatPearls Publishing; 2023.
2. Rathi S, Andrews CA, Greenfield DS, Stein JD. Trends in glaucoma surgeries performed by glaucoma subspecialists versus nonspecialists on Medicare beneficiaries from 2008-2016. Ophthalmology. 2021;128(1):30–38.
3. Ma AK, Lee JH, Warren JL, Teng CC. Glaucomap – distribution of glaucoma surgical procedures in the United States. Clin Ophthalmol. 2020;14:2551–2560.
4. Tejwani S, Murthy SI, Gadudadri CS, Thomas R, Nirmalan P. Impact of a month-long training program on the clinical skills of ophthalmology residents and practitioners. Indian J Ophthalmol. 2010;58(4):340–343.
5. Kaplowitz K, Loewen NA. Minimally invasive glaucoma surgery: trabeculectomy ab interno. In: Samples JR, Ahmed IIK, eds. Surgical Innovations in Glaucoma. New York, NY: Springer; 2014:175–188.
6. Yim CK, Teng CC, Warren JL, Tsai JC, Chadha N. Microinvasive glaucoma surgical training in United States ophthalmology residency programs. Clin Ophthalmol. 2020;14:1785–1789.
7. Yeh HH, Jain AM, Fox O, Wang SY. PhacoTrainer: a multicenter study of deep learning for activity recognition in cataract surgical videos. Transl Vis Sci Technol. 2021;10(13):23.
8. Yeh HH, Jain AM, Fox O, Sebov K, Wang SY. PhacoTrainer: deep learning for cataract surgical videos to track surgical tools. Transl Vis Sci Technol. 2023;12(3):23.
9. Labelbox. Online; 2023. Available at: https://labelbox.com.
10. Lin KY, Urban G, Yang MC, et al. Accurate identification of the trabecular meshwork under gonioscopic view in real time using deep learning. Ophthalmol Glaucoma. 2022;5(4):402–412.
11. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. arXiv:1505.04597 [cs]. Published online May 18, 2015.
12. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009:248–255.
13. Mehta S, Mercan E, Bartlett J, Weaver D, Elmore JG, Shapiro L. Y-Net: joint segmentation and classification for diagnosis of breast biopsy images. arXiv:1806.01313 [cs]. Published online June 4, 2018.