OD and OC segmentation are fundamental for fundus analysis, especially for calculating the CDR when discriminating glaucoma from nonglaucoma subjects. Developing an automated system for this task is crucial for two reasons. First, as briefly mentioned, manual labeling of fundus photographs is highly time-consuming, with the average ophthalmologist requiring 40 seconds to annotate a single photograph. Because our algorithm could reduce this time to 2 seconds, it would be highly beneficial for accelerating processing and analyzing large-scale datasets. Second, manual annotations are highly subjective. In fact, the segmentations carried out by the ophthalmologists were easily affected by both fundus resolution and image quality. The inter-agreement among the ophthalmologists, measured by the OD and OC dice scores on the test set, is shown in
Figure 2. As can be seen, the OD segmentation results varied only slightly, with inter-agreement scores ranging from 0.89 to 0.95. The OC segmentation task, however, showed larger variability, with inter-agreement scores ranging from 0.47 to 0.85. The OD boundary was clear and definite enough to be determined in the fundus photographs, which produced high inter-agreement scores among the ophthalmologists, as shown in
Figure 2A. In contrast to the OD, the OC boundary was more difficult to identify because it was influenced by many factors, such as tilted discs, illumination, and low contrast. These factors may lead to clinical uncertainty among ophthalmologists and to variable OC segmentations. Moreover, OC segmentation by an ophthalmologist was a highly subjective task, related to individual bias and clinical experience. This also led to a low inter-agreement score (see
Figure 2B). By contrast, with the trained model and its parameters frozen, the automated algorithm provided a consistent result for the same photograph. Moreover, due to limited GPU memory and parameter size constraints, input fundus photographs had to be down-sampled for training, thus removing the requirement for high-resolution photographs. Another observation is that the algorithm performed better on glaucoma cases (OD dice of 0.941, cup dice of 0.864, and CDR MAE of 0.065) than on nonglaucoma cases (OD dice of 0.937, cup dice of 0.794, and CDR MAE of 0.079). One reason is that advanced glaucoma cases with severe cupping usually present clearer interfaces between the OD and OC.
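
For readers who want to reproduce the metrics discussed here, the following is a minimal sketch of how a dice score, a CDR, and a CDR MAE could be computed from binary segmentation masks. It is an illustration under stated assumptions and not necessarily the evaluation code used in this study: the function names are hypothetical, the masks are assumed to be 2D arrays in which nonzero pixels belong to the structure, and the CDR is assumed to be the vertical cup-to-disc ratio (ratio of vertical extents), which the text above does not specify.

```python
import numpy as np


def dice_score(mask_a, mask_b):
    """Dice overlap between two binary masks: 2|A and B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def vertical_extent(mask):
    """Number of rows spanned by the mask, from first to last nonzero row."""
    rows = np.flatnonzero(np.asarray(mask, dtype=bool).any(axis=1))
    return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)


def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio (assumed CDR definition for this sketch)."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty; CDR is undefined")
    return vertical_extent(cup_mask) / disc_height


def cdr_mae(predicted_cdrs, reference_cdrs):
    """Mean absolute error between predicted and reference CDR values."""
    pred = np.asarray(predicted_cdrs, dtype=float)
    ref = np.asarray(reference_cdrs, dtype=float)
    return float(np.mean(np.abs(pred - ref)))
```

The same `dice_score` function could be applied to a pair of annotations from two ophthalmologists to obtain a pairwise inter-agreement score, or to a predicted mask and a reference mask to obtain the algorithm's OD or OC dice.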