Cataract surgical video was input into a previously developed and validated AI model and landmark identification algorithm, which identified the positions of the instrument tips and the pupil center in each frame from the segmentation masks generated by the deep learning segmentation model.8
The tools included were: blade, forceps, needle or cannula, phacoemulsification probe, second instrument, irrigation/aspiration handpiece, lens injector, and Weck-Cel sponge (a tool with a highly absorbent cellulose tip and a plastic handle, used for checking wound leakage). To reduce false-negative predictions, a landmark prediction that was more than 50 pixels away from the prediction in the prior frame, or a null prediction, was replaced with the average location of up to the 15 most recent successful predictions. When null predictions persisted for more than 15 consecutive frames, they were regarded as true negatives and no cached values were used thereafter.
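For illustration, this caching rule might be sketched as follows, assuming per-frame predictions arrive as (x, y) pixel tuples or None. The function and constant names are ours, and because the text does not specify whether the 50-pixel comparison uses the raw or the smoothed prior prediction, this sketch compares against the last accepted position:

```python
import math
from collections import deque

MAX_JUMP_PX = 50     # max allowed distance to the prior accepted prediction
CACHE_SIZE = 15      # average over at most this many recent predictions
MAX_NULL_RUN = 15    # consecutive null frames before a true negative

def smooth_track(raw_preds):
    """raw_preds: iterable of per-frame (x, y) tuples or None.
    Returns a list of smoothed positions (None = true negative)."""
    cache = deque(maxlen=CACHE_SIZE)  # recent successful predictions
    null_run = 0
    smoothed = []
    for pred in raw_preds:
        jumped = (pred is not None and cache
                  and math.dist(pred, cache[-1]) > MAX_JUMP_PX)
        if pred is not None and not jumped:
            cache.append(pred)            # accept the raw prediction
            null_run = 0
            smoothed.append(pred)
            continue
        if pred is None:
            null_run += 1
        if null_run > MAX_NULL_RUN:
            cache.clear()                 # sustained nulls: stop using the cache
            smoothed.append(None)         # regard as a true negative
        elif cache:
            # replace the outlier/null with the mean of the cached positions
            mx = sum(x for x, _ in cache) / len(cache)
            my = sum(y for _, y in cache) / len(cache)
            smoothed.append((mx, my))
        else:
            smoothed.append(None)         # no history to fall back on yet
    return smoothed
```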
After obtaining the positions of the tools and the pupil in each frame, the following six metrics were calculated (illustrated in Fig. 1): (1) total path length (pixels): the cumulative distance the tool tip moved during the entire surgery; (2) maximum velocity (pixels/frame): the maximal distance the tool tip moved in one frame; (3) area covered (%): the percentage of the screen passed over by the tool tip at some point during the surgery; (4) phacoemulsification probe decentration (pixels): the average distance from the phacoemulsification tip to the pupil center; (5) eye decentration (pixels): the average distance from the pupil center to the screen center; (6) zoom level change (pixels): the standard deviation of the limbus diameter, measuring the variation in zoom level. Because the pixel distance between two objects with the same actual distance can vary with zoom level, we normalized all metrics except area covered by the average limbus diameter of the entire video, which was chosen as a proxy for the average zoom level.
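These metrics reduce to simple array operations on the per-frame tracks. A minimal NumPy sketch follows, assuming (N, 2) arrays of tip and pupil-center positions and an (N,) array of limbus diameters; the grid-based approximation of area covered and all names (video_metrics, tip, pupil, limbus_diam) are our assumptions, not the published implementation:

```python
import numpy as np

def video_metrics(tip, pupil, limbus_diam, frame_hw, grid=64):
    """Six per-video metrics from per-frame pixel positions (a sketch).

    tip, pupil : (N, 2) arrays of per-frame (x, y) positions
    limbus_diam: (N,) array of per-frame limbus diameters in pixels
    frame_hw   : (height, width) of the video frame
    """
    h, w = frame_hw
    steps = np.linalg.norm(np.diff(tip, axis=0), axis=1)
    total_path = steps.sum()       # (1) cumulative tip movement
    max_velocity = steps.max()     # (2) largest single-frame movement
    # (3) area covered: share of coarse screen-grid cells visited by the tip
    cells = {(int(x * grid / w), int(y * grid / h)) for x, y in tip}
    area_covered = 100.0 * len(cells) / grid**2
    # (4) probe decentration: mean tip-to-pupil-center distance
    decentration = np.linalg.norm(tip - pupil, axis=1).mean()
    # (5) eye decentration: mean pupil-center-to-screen-center distance
    eye_dec = np.linalg.norm(pupil - np.array([w / 2, h / 2]), axis=1).mean()
    # (6) zoom level change: variability of the limbus diameter
    zoom_change = limbus_diam.std()
    # normalize all pixel metrics except area covered by the mean limbus
    # diameter, the proxy for the video's average zoom level
    scale = limbus_diam.mean()
    return {
        "total_path": total_path / scale,
        "max_velocity": max_velocity / scale,
        "area_covered": area_covered,
        "decentration": decentration / scale,
        "eye_decentration": eye_dec / scale,
        "zoom_change": zoom_change / scale,
    }
```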
Because we have expanded our previously validated dataset to include new videos of attending surgeons, we performed additional validation on these new videos to confirm the accuracy of the AI-generated metrics. Ten one-minute video clips were randomly selected from the attending videos. A total of 16,353 individual video frames from the selected clips were labeled, and the metrics calculated from the true positions were compared with the metrics generated by our pipeline. Pearson correlation coefficients were calculated between the predicted and true metrics for the different tools and the pupil center. Our analysis demonstrated strong correlations for the area covered, total path length, and maximum velocity metrics (Pearson correlation coefficients 0.988, 0.957, and 0.769, respectively) (Supplementary Fig. S1).
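As an illustration, this per-clip comparison could be expressed with scipy.stats.pearsonr, where predicted and manual are hypothetical dictionaries mapping each metric name to its list of per-clip values (pipeline-derived and label-derived, respectively):

```python
from scipy.stats import pearsonr

# predicted[m] and manual[m]: one value per one-minute validation clip
for m in ("area_covered", "total_path", "max_velocity"):
    r, p = pearsonr(predicted[m], manual[m])
    print(f"{m}: r = {r:.3f} (p = {p:.3g})")
```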