The prediction performance of intraocular lens (IOL) formulas for cataract patients is usually evaluated with the following metrics, as recommended in multiple publications:1–3 the mean prediction error (ME), the mean absolute error (MAE), the median absolute error (MedAE), and the standard deviation (SD) of the prediction error (PE). These are standard evaluation metrics commonly used for regression problems in which the target value is a scalar. The MAE summarizes the average distance between the prediction and the true value. The MedAE evaluates the median deviation and is less sensitive to outliers and extreme values. The SD measures the extent of scattering of the PE. Aside from these standard metrics, ophthalmologists also calculate the percentage of PEs within a certain range (e.g., ±0.25 D, ±0.5 D) and the performance in different axial length (AL) groups (short, medium, and long). The former is a convenient way of investigating the distribution of PEs; the latter helps determine whether a formula performs consistently across myopic, hyperopic, and regular eyes.

Recently, Hoffer and Savini2 introduced a new evaluation metric, the IOL Formula Performance Index, which combines four metrics into one: (1) the SD, (2) the MedAE, (3) the AL bias, and (4) the percentage of eyes with a PE within ±0.5 D. Holladay et al.4 reviewed IOL calculation evaluation metrics and recommended the SD as the single best measurement, because it allows the use of heteroscedastic statistical methods and predicts the percentage of cases within a given interval, the MAE, and the MedAE. However, this conclusion was drawn from the results of 11 optics-based IOL formulas (Barrett, Olsen, Haigis, Haigis WK, Holladay 1, Holladay 1 WK, Holladay 2, SRK/T, SRK/T WK, Hoffer Q, and Hoffer Q WK), which have been validated extensively with real-world datasets. For machine learning (ML)-based formulas, the algorithm is often a black box whose exact behavior is not known a priori. When evaluating or developing novel ML-based IOL formulas, it is important that the evaluation metric be appropriately selected and robust enough that the trained model generalizes to unseen data. In addition, there is evidence from the study of Gatinel et al.5 that adjusting the lens constant to cancel the systematic bias (i.e., to zero the ME) is likely to change the SD in an unpredictable way.
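As a minimal sketch of how these standard metrics are computed, the following function (the name, the dictionary layout, and the sign convention PE = predicted minus achieved refraction are illustrative assumptions, not taken from any cited formula study; some authors define the PE with the opposite sign) evaluates ME, MAE, MedAE, SD, and the percentage of eyes within ±0.25 D and ±0.5 D:

```python
import numpy as np

def evaluate_iol_formula(predicted, achieved):
    """Compute standard IOL prediction-error metrics.

    predicted, achieved: postoperative refractions in diopters (D).
    Sign convention (an assumption here): PE = predicted - achieved.
    """
    pe = np.asarray(predicted, dtype=float) - np.asarray(achieved, dtype=float)
    abs_pe = np.abs(pe)
    return {
        "ME": pe.mean(),                 # mean prediction error (systematic bias)
        "MAE": abs_pe.mean(),            # mean absolute error
        "MedAE": np.median(abs_pe),      # median absolute error (outlier-robust)
        "SD": pe.std(ddof=1),            # sample SD of the PE (scatter)
        "pct_within_0.25D": 100 * np.mean(abs_pe <= 0.25),
        "pct_within_0.50D": 100 * np.mean(abs_pe <= 0.50),
    }
```

Note that `ddof=1` gives the sample SD; the AL-group analysis described above amounts to calling the same function on the short-, medium-, and long-eye subsets separately.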