Although the ROP.AI algorithm was trained on normal retina and plus disease fundal images only, its performance in detecting both pre-plus and plus disease was evaluated in an expanded external test set of 57 normal, 26 pre-plus, and 33 plus disease images. At the default operating point threshold of 0.50, the algorithm achieved a sensitivity of 81.4% and a specificity of 80.7%. Overall accuracy, positive predictive value, and negative predictive value were 81.0%, 81.4%, and 80.7%, respectively.
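As a consistency check on the figures above, the following minimal sketch recomputes the five metrics from a 2×2 confusion matrix. The class sizes come from the text; the true-positive and true-negative counts (48 and 46) are not reported in the source and are an assumption, inferred as the only integer counts that reproduce the stated sensitivity and specificity for 59 diseased and 57 normal images. The function name `binary_metrics` is illustrative.

```python
# Class sizes reported for the expanded external test set.
n_normal, n_pre_plus, n_plus = 57, 26, 33
n_diseased = n_pre_plus + n_plus  # pre-plus and plus both count as positive

# ASSUMPTION: these counts are inferred from the reported percentages,
# not stated in the source.
tp, tn = 48, 46
fn = n_diseased - tp  # diseased images scored below the threshold
fp = n_normal - tn    # normal images scored at or above the threshold


def binary_metrics(tp, fp, tn, fn):
    """Return standard binary classification metrics as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


metrics = binary_metrics(tp, fp, tn, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, the false-positive and false-negative counts are equal (11 each), which would explain why the positive predictive value matches the sensitivity and the negative predictive value matches the specificity.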
The average outputs produced by the algorithm for normal, pre-plus, and plus disease images were 0.23, 0.65, and 0.93, respectively. The distribution of these probability outputs is illustrated in the violin plot in Figure 4.
The violin plot shows the distribution of probability outputs produced by the algorithm for normal, pre-plus, and plus disease fundal images. The 25th, 50th, and 75th percentile outputs for normal, pre-plus, and plus disease fundal images are 0.002, 0.088, and 0.317; 0.387, 0.760, and 0.879; and 0.963, 1.00, and 1.00, respectively. The operating point optimized for high sensitivity is shown with the horizontal line at 0.38.
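To illustrate how the two operating points relate to the reported quartiles, the sketch below compares each quartile value with the 0.50 and 0.38 thresholds. It uses only the summary statistics given in the caption, not the per-image outputs, so it indicates roughly which quarters of each distribution would be flagged rather than exact counts. The rule that an output at or above the threshold is flagged as disease is an assumption, and the names `quartiles`, `thresholds`, and `flagged_quartiles` are illustrative.

```python
# 25th, 50th, and 75th percentile outputs reported in the Figure 4 caption.
quartiles = {
    "normal": (0.002, 0.088, 0.317),
    "pre-plus": (0.387, 0.760, 0.879),
    "plus": (0.963, 1.00, 1.00),
}

# Operating points mentioned in the text.
thresholds = {"default": 0.50, "high_sensitivity": 0.38}


def flagged_quartiles(values, threshold):
    """Return, per quartile value, whether it would be flagged as disease.

    ASSUMPTION: an output at or above the threshold counts as positive.
    """
    return [value >= threshold for value in values]


table = {
    (image_class, name): flagged_quartiles(values, threshold)
    for image_class, values in quartiles.items()
    for name, threshold in thresholds.items()
}

for (image_class, name), flags in table.items():
    print(f"{image_class:9s} {name:17s} {flags}")
```

Only the pre-plus row changes between the two thresholds: its 25th percentile (0.387) lies below 0.50 but above 0.38, so lowering the operating point would flag roughly the lowest-scoring quarter of pre-plus images that the default threshold mostly misses, while the 75th percentile for normal images (0.317) stays below both thresholds.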