To assess the accuracy of the trained CNN models in segmenting retinal layers, the area segmentations obtained with the models were compared against those of human graders. For each of the four area classes (ILM-dINL, dINL-EZ, EZ-pRPE, and pRPE-BM), the human graders' segmentation was used as a mask applied to the model classification. The number of pixels labeled as the target class by the model within the mask was divided by the total number of pixels in the mask to obtain the accuracy for that class. This analysis was carried out for the central 6 mm as well as for the full scan width, and the percent accuracy results are listed in
Table 2.
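The masked per-class accuracy described above can be sketched as follows; the function and array names are illustrative, not from the original analysis pipeline, and the label arrays are assumed to hold one integer class per pixel.

```python
import numpy as np

def per_class_accuracy(model_labels, grader_labels, target_class):
    """Pixel accuracy for one area class, using the human grader's
    segmentation as the mask (hypothetical helper, a minimal sketch).

    model_labels, grader_labels: integer class maps of equal shape.
    Returns the fraction of grader-labeled pixels of target_class
    that the model also labeled as target_class.
    """
    mask = grader_labels == target_class           # grader-defined region
    agreeing = np.sum(model_labels[mask] == target_class)
    return agreeing / mask.sum()                   # pixels agreeing / mask size
```

For example, if the grader labels a 2-pixel region as a given class and the model agrees on one of those pixels, the accuracy for that class is 0.5.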
When compared with human graders, the mean ± SD accuracies for identifying pixels of ILM-dINL (inner retina), dINL-EZ, EZ-pRPE (OS), and pRPE-BM (RPE thickness) within the central 6 mm of the B-scans were 96.0% ± 4.0%, 93.5% ± 5.4%, 85.9% ± 13.6%, and 87.7% ± 4.5%, respectively, for U-Net; 94.8% ± 7.1%, 93.3% ± 6.8%, 88.5% ± 9.0%, and 86.2% ± 6.4%, respectively, for the SW model; and 97.0% ± 1.1%, 94.1% ± 5.3%, 87.0% ± 10.3%, and 87.9% ± 4.5%, respectively, for the hybrid model. The average accuracy of U-Net was comparable to that of the SW model (90.8% ± 4.8% vs. 90.7% ± 4.0%). The average accuracy of the hybrid model was 91.5% ± 4.8%, an improvement of 0.7% over U-Net alone; a paired t-test comparing the two sets of accuracies, conducted using Statistica (StatSoft, Inc., Tulsa, OK, USA), indicated that this improvement was significantly different from zero (P < 0.039, t = 3.525). The difference in average accuracy between U-Net and the SW model was not significant. Accuracy decreased slightly (by about 1% on average) when the segmentation was extended to the full B-scan width (P < 0.036, t > 3.660); the mean accuracies for the full B-scan width (9 mm) were 90.0% ± 4.6%, 89.0% ± 4.2%, and 90.6% ± 4.8% for U-Net, the SW model, and the hybrid model, respectively.