Open Access
Artificial Intelligence  |   May 2025
PhacoTrainer: Automatic Artificial Intelligence-Generated Performance Ratings for Cataract Surgery
Author Affiliations & Notes
  • Hsu-Hang Yeh
    Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
  • Simmi Sen
    Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA
  • Jonathan C. Chou
    Department of Ophthalmology, Kaiser Permanente, San Francisco, CA, USA
  • Karen L. Christopher
    Department of Ophthalmology, University of Colorado School of Medicine, Aurora, CO, USA
  • Sophia Y. Wang
    Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA
  • Correspondence: Sophia Wang, Department of Ophthalmology, Byers Eye Institute, Stanford University, 2370 Watson Court, Palo Alto, CA 94303, USA. e-mail: [email protected] 
Translational Vision Science & Technology May 2025, Vol.14, 2. doi:https://doi.org/10.1167/tvst.14.5.2
Abstract

Purpose: To investigate whether cataract surgical skill metrics automatically generated by artificial intelligence (AI) models can differentiate between trainee and faculty surgeons, and to examine the correlation between AI metrics and expert-rated skill.

Methods: Routine cataract surgical videos from residents (N = 28) and attendings (N = 29) were collected. Three video-level metrics were generated by deep learning models: phacoemulsification probe decentration, eye decentration, and zoom level change. Three types of instrument- and landmark-specific metrics were generated for the limbus, pupil, and various surgical instruments: total path length, maximum velocity, and area. Expert human judges assessed the surgical videos using the Objective Structured Assessment of Cataract Surgical Skill (OSACSS). Differences in AI metrics and human-rated scores between attending surgeons and trainees were assessed using t-tests, and the correlations between them were examined with Pearson correlation coefficients.

Results: Attending videos had significantly lower total path lengths, maximum velocities, and area metrics for the phacoemulsification probe. Attending surgeons also demonstrated better phacoemulsification probe centration and eye centration. Most AI metrics correlated negatively with OSACSS scores, including phacoemulsification decentration (r = −0.369) and eye decentration (r = −0.394). OSACSS subitems related to eye centration and individual steps of surgery also exhibited significant negative correlations with the corresponding AI metrics (r ranging from −0.77 to −0.49).

Conclusions: Automatically generated AI metrics can differentiate between attending and trainee surgeries and correlate with human expert evaluations of surgical performance.

Translational Relevance: AI-generated metrics that correlate with surgeon skill may be useful for improving cataract surgical education.

Introduction
Cataracts are the leading cause of blindness worldwide.1 Cataract surgery is one of the most commonly performed procedures in the developed world.2 Surgical feedback during ophthalmology residency is critical for learning cataract surgery and improving surgical technique.3 Surgical feedback is typically delivered while operating with an attending preceptor, although previous work has suggested that trainees desire more frequent and structured feedback on their surgical procedures.3 Structured rating scales for cataract surgical skill have been developed, including the Objective Structured Assessment of Cataract Surgical Skill (OSACSS).4 The OSACSS consists of 20 components that rate global and phacoemulsification task-specific elements. Although the OSACSS is a structured evaluation method shown to be effective for differentiating between surgeons of varying experience,4 it is difficult for expert raters to provide OSACSS scores for a large collection of surgical video recordings because doing so is time consuming.
Previous studies have demonstrated that automatic surgical performance ratings calculated using motion tracking algorithms correlate with OSACSS ratings and can distinguish between novice and expert surgeons.5–7 However, previous efforts have not accounted for variation across different surgical phases or tools, providing only an overall rating for the entire procedure and limiting the granularity of the feedback. Our previous work developed novel deep learning approaches for recognizing the steps of cataract surgery in each frame of video and identifying important items such as surgical instruments and anatomical landmarks of the eye.8 In particular, our previously developed segmentation model can detect the overall location and tooltip location of surgical instruments, including needles/cannulas, blades, forceps, and the phacoemulsification probe. It also detects the location of the pupil and limbus and calculates the pupil center with high accuracy, achieving a prediction error of less than six pixels on average. The ability to automatically detect these important landmarks quickly and accurately across entire cataract surgical videos makes it possible to calculate tool-specific artificial intelligence (AI)-derived performance metrics, such as the total path length and area covered by individual tools, the velocity of tools, and the decentration of the video, among others.
The purpose of this study is to investigate whether AI-derived cataract surgery performance metrics can distinguish between trainee and attending cataract surgical videos, and to evaluate the correlation between these metrics and surgical performance as formally rated using the OSACSS. Successful AI-derived performance metrics could ultimately enable quick, comprehensive surgical feedback and assist the surgical training of ophthalmology residents.
Methods
Data Source
Routinely recorded cataract surgical videos were collected from residents at seven institutions and from attending surgeons at one institution. Resident surgical videos were from the same dataset as in our previous study,8 and the attending videos were newly collected for this study. Incomplete videos that did not record the entire surgery, surgeries with intraoperative complications, and cataract surgeries combined with other types of procedures were excluded. Attending surgeons were all board-certified ophthalmologists with advanced fellowship training in an anterior segment subspecialty, and resident videos were contributed from different stages of training, although because of the anonymous nature of video collection, details of individual videos are unavailable. Videos were downsampled to 456 × 256 pixels and 30 frames per second as required for input into our previously developed deep learning models, which were based on YOLACT9 with a DarkNet10 backbone. The model was pretrained on the Cataract Dataset for Image Segmentation11 and fine-tuned on more than 3000 cataract surgical images.8
Expert Ratings
Three independent board-certified ophthalmologist reviewers manually graded all surgical videos using the OSACSS rubric,4 which includes 20 items rating individual surgical steps and overall performance. Each OSACSS item is a Likert-scale rating from 1 to 5, with 5 representing the best performance. The order of the videos was randomly shuffled before presentation to the raters, and the raters were masked to whether the videos came from attendings or trainees to avoid rating bias. All raters were experienced attending preceptors for trainee surgeries at their institutions, with post-residency fellowship training focused on anterior segment surgery and total personal surgical volumes in the thousands.
Artificial Intelligence Automated Performance Metrics
Each cataract surgical video was input into our previously developed and validated AI model and landmark identification algorithm, which identified the positions of the instrument tips and the pupil center in each frame from the segmentation masks generated by the deep learning segmentation model.8 The tools included were: blade, forceps, needle or cannula, phacoemulsification probe, second instrument, irrigation/aspiration handpiece, lens injector, and Weck-Cel sponge (a tool for checking wound leakage with a highly absorbent cellulose tip and a plastic handle). To reduce false-negative predictions, a landmark prediction that was greater than 50 pixels away from the prediction in the prior frame, or a null prediction, was replaced with the average location of up to the 15 most recent successful predictions. When null predictions continued for more than 15 frames, they were regarded as true negatives and no cached values were used thereafter. After obtaining the positions of the tools and pupil in each frame, the following six metrics were calculated (illustrated in Fig. 1): (1) total path length (pixels): the cumulative distance that the tool tip moved during the entire surgery; (2) maximum velocity (pixels/frame): the maximal distance that the tool tip moved in one frame; (3) area covered (%): the percentage of the screen traversed by the tool tip at some point during the surgery; (4) phacoemulsification probe decentration (pixels): the average distance from the phacoemulsification tip to the center of the pupil; (5) eye decentration (pixels): the average distance from the pupil center to the screen center; and (6) zoom level change (pixels): the standard deviation of the limbus diameter, measuring the variation in zoom level. Because the pixel distance between two objects with the same actual distance can vary with zoom level, we normalized all metrics except area covered by the average limbus diameter of the entire video, which was chosen as a proxy for the average zoom level.
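The prediction-smoothing rule and the six metrics described above can be sketched from per-frame landmark positions. The following is a minimal illustration of this logic, not the authors' actual pipeline; the function names, the per-pixel coverage discretization, and the default frame size are assumptions.

```python
import numpy as np

def smooth_positions(raw, jump_thresh=50, cache_len=15):
    """Replace null or implausible per-frame predictions with the average of
    up to the `cache_len` most recent accepted positions; after more than
    `cache_len` consecutive nulls, treat the landmark as truly absent."""
    cache, null_run, out = [], 0, []
    for p in raw:
        if p is None:
            null_run += 1
            out.append(np.mean(cache[-cache_len:], axis=0).tolist()
                       if cache and null_run <= cache_len else None)
            if null_run > cache_len:
                cache.clear()  # true negative: stop using cached values
            continue
        null_run = 0
        if cache and np.hypot(*np.subtract(p, cache[-1])) > jump_thresh:
            # implausible jump: substitute the recent average
            out.append(np.mean(cache[-cache_len:], axis=0).tolist())
        else:
            cache.append(list(p))
            out.append(list(p))
    return out

def video_metrics(tip, pupil, limbus_diam, frame_hw=(256, 456)):
    """Six per-video metrics; all but area covered are normalized by the
    mean limbus diameter, a proxy for the average zoom level."""
    tip, pupil = np.asarray(tip, float), np.asarray(pupil, float)
    steps = np.linalg.norm(np.diff(tip, axis=0), axis=1)  # per-frame movement
    scale = float(np.mean(limbus_diam))
    h, w = frame_hw
    visited = {(int(x), int(y)) for x, y in tip}  # crude pixel coverage
    return {
        "total_path_length": steps.sum() / scale,
        "max_velocity": steps.max() / scale,
        "area_covered_pct": 100.0 * len(visited) / (h * w),
        "phaco_decentration": np.linalg.norm(tip - pupil, axis=1).mean() / scale,
        "eye_decentration": np.linalg.norm(pupil - [w / 2, h / 2], axis=1).mean() / scale,
        "zoom_level_change": float(np.std(limbus_diam)) / scale,
    }
```

In this sketch the decentration metrics are averages over frames, so they are largely insensitive to surgery duration, whereas total path length accumulates with time, mirroring the distinction drawn later in the Discussion.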
Because we expanded our previously validated dataset to include new videos of attending surgeons, we performed additional validation on these new videos to confirm the accuracy of the AI-generated metrics. Ten one-minute video clips were randomly selected from the attending videos. A total of 16,353 individual video frames from the selected clips were labeled, and the metrics calculated from the true positions were compared to the metrics generated by our pipeline. Pearson correlation coefficients were calculated between the predicted and true metrics of the different tools and the pupil center. Our analysis demonstrated strong correlations for the area, total path length, and maximum velocity metrics (Pearson correlation coefficients of 0.988, 0.957, and 0.769, respectively) (Supplementary Fig. S1).
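This validation step amounts to a per-clip correlation between metrics computed from hand-labeled positions and from model predictions. The snippet below illustrates the comparison with hypothetical, made-up values; only the use of scipy.stats.pearsonr reflects the analysis described.

```python
from scipy.stats import pearsonr

# Hypothetical per-clip metric values (e.g., total path length) computed from
# hand-labeled "true" positions versus the model-predicted positions.
true_vals = [412.0, 380.5, 455.2, 298.7, 501.1, 350.0, 420.3, 390.8, 310.2, 480.6]
pred_vals = [405.3, 372.1, 460.0, 305.2, 495.8, 342.7, 431.0, 398.4, 300.9, 470.1]

r, p = pearsonr(true_vals, pred_vals)
print(f"r = {r:.3f}, p = {p:.3g}")  # a high r indicates the pipeline tracks ground truth
```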
Figure 1.
 
Illustration of automatic metric calculation from landmark positions and the shape of the limbus. Landmark positions (e.g., centers of limbus/pupil, tips of surgical instruments) were used to calculate (a) total path length (sum of the movement of the landmark), (b) max velocity (the largest movement change per frame), (c) covered area (proportion of screen that has been traversed by the landmark), (d) phacoemulsification probe decentration (distance between phacoemulsification tip and pupil center), (e) eye decentration (distance between pupil center and screen center), and (f) zoom level change (standard deviation of the diameter of the limbus).
Statistical Analyses
Score differences between attending and trainee surgeons in both AI metrics and human OSACSS ratings were evaluated using independent t-tests conducted with the Python library SciPy version 1.10.1. Statistical significance was defined as a P value less than 0.05, and for each type of metric the Bonferroni correction was applied to account for multiple comparisons. Pearson correlation coefficients between OSACSS scores and AI metrics were computed in two ways: (1) individual AI metrics were correlated with average overall OSACSS scores across all raters, and (2) individual AI metrics were correlated with specific OSACSS item scores measuring the same skill area. For the latter comparison, we analyzed six pairs of AI metrics and OSACSS items that measure approximately the same concept/skill: (1) the AI eye decentration metric with Eye Positioned Centrally Within Microscope View; (2) the area covered for the phacoemulsification probe tip with Phacoemulsification Probe and Second Instrument: Effective Use and Stability Within Eye; (3) the AI area metric for the blade with Incision and Paracentesis: Formation and Technique; (4) the AI total path length metric for the second instrument with Nucleus: Cracking or Chopping with Safe Phacoemulsification of Segments; (5) the AI total path length metric for the phacoemulsification probe with Nucleus: Cracking or Chopping with Safe Phacoemulsification of Segments; and (6) the AI area metric for the irrigation/aspiration probe with Irrigation and Aspiration Technique With Adequate Removal of Cortex. These pairings were made based on domain knowledge, because these OSACSS items cover the primary steps of surgery and were thought likely to correlate with the AI metrics for the corresponding surgical instruments.
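As a concrete sketch of the testing procedure, the following applies an independent t-test with a Bonferroni-adjusted threshold. The group means, standard deviations, and number of comparisons are simulated illustrative values, not data from the study.

```python
import numpy as np
from scipy import stats

def compare_groups(attending, trainee, n_comparisons):
    """Independent two-sample t-test with a Bonferroni-adjusted
    significance threshold of 0.05 / n_comparisons."""
    t, p = stats.ttest_ind(attending, trainee)
    return t, p, bool(p < 0.05 / n_comparisons)

rng = np.random.default_rng(0)
attending = rng.normal(100, 10, size=29)  # simulated metric, attending videos
trainee = rng.normal(130, 10, size=28)    # simulated metric, trainee videos
t, p, significant = compare_groups(attending, trainee, n_comparisons=8)
print(f"t = {t:.2f}, p = {p:.2e}, significant after correction: {significant}")
```

Here a lower attending mean (as observed for path-length metrics) yields a negative t statistic that survives the corrected threshold.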
Results
In total, 57 routine cataract surgical videos were included, among which 28 were from seven residents and 29 were from four attending surgeons. The median length of trainee videos was 25 minutes 31.5 seconds, and the median length of attending videos was 20 minutes 39.5 seconds. 
AI Metrics Differentiate Between Attending and Trainee Surgeons
Table 1 shows the differences in AI metrics between attending surgeons and trainees. The pupil and phacoemulsification probe had significantly lower total path lengths, maximum velocities, and areas covered in attending surgeons' videos compared to trainee surgeons' videos. Attending surgeons used the second instrument with smaller maximum velocities and used the irrigation/aspiration handpiece with both smaller maximum velocities and smaller areas covered. No significant differences in metrics were noted for the blade, forceps, needle or cannula, lens injector, or Weck-Cel sponge between attendings and trainees. Attending surgeons also demonstrated less phacoemulsification probe decentration and eye decentration. No significant difference in zoom level change was noted between attending and resident videos.
Table 1.
 
Average Difference Between Attending Surgeons and Trainees for Each AI Metric
OSACSS Ratings Differentiate Between Attending and Trainee Surgeons
The average OSACSS rating of attending videos was 91.3, whereas the average for resident videos was 87.6, a significant difference (P = 0.009). Attending surgeons scored significantly higher than residents in eight OSACSS score categories. The items showing the largest differences were Eye Positioned Centrally Within Microscope View, Overall Speed and Fluidity of Procedure, and Capsulorrhexis: Commencement of Flap, with attendings scoring on average 0.61 (P < 0.001), 0.56 (P = 0.004), and 0.43 (P = 0.020) more points, respectively. Residents scored higher in Capsulorrhexis: Formation and Circular Completion (0.07, P = 0.667) and Capsule: Protection of Anterior and Posterior Capsules (0.02, P = 0.804), but neither difference reached statistical significance. Supplementary Table S1 shows the score differences between attendings and trainees for each OSACSS item.
Correlation Between AI Metrics and Human Ratings
Next, we examined how expert-rated overall OSACSS scores correlated with AI-generated surgical skill metrics. Table 2 summarizes the correlations between different AI metrics and average overall OSACSS scores across all raters. The total path length, maximum velocity, and covered area metrics for the pupil, second instrument, and irrigation/aspiration probe all showed significant negative correlations with overall OSACSS scores. Negative correlations were also observed in the total path length and area covered metrics for the limbus, blade, forceps, needle/cannula, phacoemulsification probe, and Weck-Cel sponge. The correlations of different tool metrics with overall OSACSS scores are shown in scatterplots in Supplementary Figures S2 through S4. Phacoemulsification probe decentration and eye decentration were negatively correlated with overall OSACSS scores, but zoom level change was not.
Table 2.
 
Pearson Correlation Coefficient Between AI Metrics and Overall OSACSS Score
Figure 2 shows the correlations between six pairs of specific AI metrics and their corresponding individual OSACSS ratings. All demonstrated significant negative correlations, as expected. The eye decentration metric had a −0.65 (P < 0.001) correlation with the OSACSS item: Eye Positioned Centrally Within Microscope View. The area metric for the phacoemulsification tip had a −0.64 (P < 0.001) correlation with the OSACSS item: Phacoemulsification Probe and Second Instrument: Effective Use and Stability Within Eye. The blade area metric had a −0.64 (P < 0.001) correlation with the OSACSS item: Incision and Paracentesis: Formation and Technique. The second instrument total path length metric had a −0.74 (P < 0.001) correlation with the OSACSS item: Nucleus: Cracking or Chopping with Safe Phacoemulsification of Segments. The phacoemulsification probe total path length metric had a −0.77 (P < 0.001) correlation with the OSACSS item: Nucleus: Cracking or Chopping with Safe Phacoemulsification of Segments. The irrigation/aspiration probe area metric had a −0.49 (P < 0.001) correlation with the OSACSS item: Irrigation and Aspiration Technique With Adequate Removal of Cortex.
Figure 2.
 
Scatterplots of AI metrics and the corresponding OSACSS items. (a) AI-derived eye decentration metric plotted against OSACSS item: Eye Positioned Centrally Within Microscope View, showing negative correlation (r = −0.65). (b) AI-derived covered area metric for the phacoemulsification probe plotted against OSACSS item: Phacoemulsification Probe and Second Instrument: Effective Use and Stability Within Eye, showing negative correlation (r = −0.64). (c) AI-derived covered area metric for the blade plotted against OSACSS item: Incision and Paracentesis: Formation and Technique. (d) AI-derived total path length metric for the second instrument plotted against OSACSS item: Nucleus: Cracking or Chopping with Safe Phacoemulsification of Segments. (e) AI-derived total path length metric for the phacoemulsification probe plotted against OSACSS item: Nucleus: Cracking or Chopping with Safe Phacoemulsification of Segments. (f) AI-derived covered area metric for the irrigation/aspiration probe plotted against OSACSS item: Irrigation and Aspiration Technique With Adequate Removal of Cortex.
Discussion
In this study, we used AI and computer vision to generate a variety of metrics based on detecting the locations of surgical instruments and anatomical landmarks of the eye during cataract surgery. We demonstrated the efficacy of these AI-generated metrics for characterizing cataract surgical performance. We showed that many AI metrics differed significantly between attending and trainee surgeons and that these metrics correlated with human expert evaluation of surgical performance. AI-generated surgical skill metrics can be produced for surgical videos in a fast and scalable way, enabling surgical trainees to receive an additional modality of useful feedback. In addition, the numerical values of the metrics can be logged and reviewed later to track improvement in different facets of surgical skill.
We observed significant differences in the total path lengths of several tools and landmarks of the eye, with attending surgeons having shorter path lengths than their trainee counterparts. Similar findings have been shown in surgical simulators,12 in wet labs,13 and in video analysis of cataract surgery.6,7 This indicates that attending surgeons complete the surgical steps with more efficient movements. In contrast to previous studies that used a single summary path length for the entire video, our methods generate tool-specific path lengths that enable more granular evaluation of tool usage patterns. This allows users to isolate the specific portions of the surgery that exhibit worse performance, enabling more targeted and efficient practice. Among the tools used in cataract surgery, we found that the total path length of the phacoemulsification probe best differentiated between experienced and trainee surgeons. These findings are consistent with a single-resident study of the learning curve in cataract surgery, in which the main incision and phacoemulsification were the phases that demonstrated the greatest improvement through practice over time.14
The correlations we found between AI metrics and both overall and individual OSACSS scores emphasize how well computer-generated metrics align with human evaluation. Similar correlations between total path length and OSACSS scores were reported using PhacoTrack motion capture software.5 Metrics related to the phacoemulsification probe, the most essential instrument for cataract surgery, showed the second greatest correlation with overall OSACSS scores among all instruments. This motivates using AI metrics for the phacoemulsification probe as a supplementary tool for evaluating surgical proficiency. In addition to the total path length, the phacoemulsification decentration metric, which indicates how far the probe moved with respect to the pupil center, revealed that attending surgeons kept the tip of the phacoemulsification probe relatively closer to the pupil center. This might indicate that attending surgeons completed phacoemulsification more efficiently and in a generally safer, well-centered position. Beyond correlations with overall OSACSS scores, certain tool-specific metrics also showed high correlations with individual OSACSS subitems measuring performance on the surgical steps that rely on those tools: for example, blade metrics with wound creation; second instrument and phacoemulsification probe metrics with phacoemulsification performance; and irrigation and aspiration probe metrics with irrigation and aspiration performance. These correlations further support the potential of AI metrics to give an indication of cataract surgical performance, both overall and in a step-specific manner.
Our study also showed that attending surgeons have better control of the microscope view. Attending surgeons scored higher than residents on the OSACSS item Eye Positioned Centrally Within Microscope View. In addition, eye decentration metrics correlated negatively with overall OSACSS performance, and decentration of the microscope differed significantly between attending and trainee surgeons, consistent with the previous study by Din et al.5 This indicates that eye decentration might have more validity than zoom level change as a complementary tool for surgical evaluation, as the latter might also be affected by other factors such as distortion or rotation of the eyes. The AI-generated eye decentration metric successfully captures the level of decentration, demonstrating a negative correlation with the OSACSS item for microscope decentration (shown in Fig. 2). An additional advantage of the eye decentration metric is that it is relatively unaffected by the total time of surgery, because the metric represents an average value across the surgery, whereas metrics such as total instrument path length generally increase with the duration of surgery.
In the future, the granular tool- and landmark-specific metrics provided by this pipeline could serve as part of a feedback system for cataract surgical trainees. Based on this feedback, trainees could focus on improving the use of specific tools whose metrics were unfavorable and could then track the improvement of these metrics over time as they perform more surgeries. If enough such AI metrics were collected from a wide variety of trainees, trainees could compare their own performance to that of others at their level of training to further identify specific areas for improvement. Future studies should assess how providing these detailed AI metrics to trainees affects their surgical learning trajectories and patient outcomes, potentially in a randomized clinical trial.
There are several limitations of this study. Attending videos were collected from a single center, which may limit the generalizability of the results. In addition, the model does not necessarily provide evaluation metrics for every individual step in cataract surgery, especially because some tools appear in several different steps and thus their motion metrics are aggregated across the entire video. For example, capsulorrhexis performance cannot currently be isolated because the segmentation model does not distinguish between very similar-appearing tools such as the cystotome and the hydrodissection cannula. Future studies could combine our approach with models that recognize the current surgical phase to address this limitation. Similarly, many surgical steps utilize multiple instruments (e.g., phacoemulsification probe and second instrument); because surgical skill is related to the motion of multiple instruments, developing metrics that account for the interaction between instruments could potentially enhance the correlation with surgical performance. Finally, the current model was not validated in cataract surgeries with complications such as iris prolapse or anterior vitrectomy, as these complications are relatively rare. A larger corpus of videos encompassing different situations may be helpful before deployment of the model in the clinical setting.
To conclude, this study demonstrated that automatically generated AI metrics can capture meaningful information about cataract surgical performance, distinguishing between trainee and attending surgeries and correlating with formal human expert evaluations of surgical skill. A system that automatically generates surgical feedback metrics from cataract surgical video could benefit the ophthalmology community as a complementary tool for surgical education.
Acknowledgments
Supported by the National Eye Institute K23EY03263501 (SYW); a Career Development Award from Research to Prevent Blindness (SYW); an unrestricted departmental grant from Research to Prevent Blindness (SYW); a departmental grant from the National Eye Institute P30-EY026877 (SYW); and a Stanford Teaching and Mentoring Academy Innovation Grant (SYW).
Disclosure: H.-H. Yeh, None; S. Sen, None; J.C. Chou, None; K.L. Christopher, None; S.Y. Wang, None 
References
Hashemi H, Pakzad R, Yekta A, et al. Global and regional prevalence of age-related cataract: a comprehensive systematic review and meta-analysis. Eye (Lond). 2020; 34: 1357–1370. [CrossRef] [PubMed]
Allen D, Vasavada A. Cataract and surgery for cataract. BMJ. 2006; 333(7559): 128–132. [CrossRef] [PubMed]
Saedon H. An analysis of ophthalmology trainees' perceptions of feedback for cataract surgery training. Clin Ophthalmol. 2014; 8: 43–47. [PubMed]
Saleh GM, Gauba V, Mitra A, et al. Objective structured assessment of cataract surgical skill. Arch Ophthalmol. 2007; 125: 363–366. [CrossRef] [PubMed]
Din N, Smith P, Emeriewen K, et al. Man versus machine: software training for surgeons-an objective evaluation of human and computer-based training tools for cataract surgical performance. J Ophthalmol. 2016; 2016: 3548039. [CrossRef] [PubMed]
Smith P, Tang L, Balntas V, et al. “PhacoTracking”: an evolving paradigm in ophthalmic surgical training. JAMA Ophthalmol. 2013; 131: 659–661. [CrossRef] [PubMed]
Balal S, Smith P, Bader T, et al. Computer analysis of individual cataract surgery segments in the operating room. Eye. 2019; 33: 313–319. [CrossRef] [PubMed]
Yeh HH, Jain AM, Fox O, et al. PhacoTrainer: deep learning for cataract surgical videos to track surgical tools. Transl Vis Sci Technol. 2023; 12(3): 23. [CrossRef] [PubMed]
Bolya D, Zhou C, Xiao F, Lee YJ. YOLACT: real-time instance segmentation. Proc IEEE/CVF Int Conf Comput Vis. 2019: 9157–9166.
Redmon J. Darknet: open source neural networks in C. Available at https://pjreddie.com/darknet/. Accessed April 24, 2025.
Grammatikopoulou M, Flouty E, Kadkhodamohammadi A, et al. CaDIS: cataract dataset for surgical RGB-image segmentation. Med Image Anal. 2021; 71: 102053.
Thomsen AS, Smith P, Subhi Y, et al. High correlation between performance on a virtual-reality simulator and real-life cataract surgery. Acta Ophthalmol. 2017; 95: 307–311. [CrossRef] [PubMed]
Saleh GM, Gauba V, Sim D, et al. Motion analysis as a tool for the evaluation of oculoplastic surgical skill: evaluation of oculoplastic surgical skill. Arch Ophthalmol. 2008; 126: 213–216. [CrossRef] [PubMed]
Balas M, Kwok JM, Miguel A, et al. The cataract surgery learning curve: quantitatively tracking a single resident's operative actions throughout their training. Am J Ophthalmol. 2023; 249: 82–89. [CrossRef] [PubMed]