In this study, we proposed an end-to-end CNN architecture for automated DR classification using OCTA. As expected, the end-to-end CNN classifier outperformed the machine learning classifiers, which used 2298 local features extracted from each OCTA image to classify the images into six groups according to DR stage. Radiomics is a systematic approach for studying latent information in medical imaging to improve accuracy. Among radiomics tools, PyRadiomics is the most widely reported in the literature; it contains thousands of handcrafted formulas designed to extract distribution and texture information from medical images.41 Although these feature-based methods achieved lower classification performance than the CNN method, they can identify which features most influence the classification through random forest importance ranking or L1 regularization. By contrast, because the CNN method operates end-to-end without explicitly extracting features, it is difficult to determine which features drive its decisions. From a feature extraction perspective, however, the parameters of a CNN are updated during backpropagation, allowing it to learn a far larger set of features associated with the target outcome. Because OCTA contains abundant unlabeled information, a fully automated CNN algorithm can process heterogeneous images quickly for accurate and objective DR classification, potentially alleviating the need for resource-intensive manual analysis and thus directing high-risk patients toward further treatment.
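The feature-importance approach described above can be sketched as follows. This is a minimal illustration on synthetic data, not our actual pipeline: the feature matrix stands in for the 2298 handcrafted features per image, and the binary label is a stand-in for DR severity.

```python
# Sketch: ranking handcrafted radiomics-style features by importance with a
# random forest, as described in the text. All data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_images, n_features = 300, 20          # stand-in for 2298 real features
X = rng.normal(size=(n_images, n_features))
# Pretend feature 0 (e.g., a vessel-density statistic) drives the label.
y = (X[:, 0] + 0.1 * rng.normal(size=n_images) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]  # most important first
print("most informative feature index:", ranking[0])     # expect 0 here
```

An L1-penalized linear model (e.g., logistic regression with an L1 penalty) yields a comparable ranking by driving the coefficients of uninformative features to zero.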
The activation map also allowed us to identify the areas the network relied on for decision-making. By visualizing the CAM, we may identify informative image patterns or features useful for DR staging. However, the interpretation of these results warrants additional scrutiny: recent studies emphasized that many popular saliency maps used to interpret CNNs trained on medical imaging did not meet several key criteria for utility and robustness, highlighting the need for additional validation before clinical application.45–47 As an alternative, a computer-aided diagnosis system that exploits the complementary information from CNN-based and feature-based methods should be developed. Qualitative analysis of the latest techniques for obtaining better activation maps will also be required.45
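The core CAM computation is compact: the map is a weighted sum of the last convolutional layer's feature maps, with the fully connected weights of the predicted class as the weighting. The shapes and values below are synthetic stand-ins, not outputs of our trained network.

```python
# Minimal CAM sketch: weight the final conv feature maps by the predicted
# class's fully connected weights, then rectify and normalize the result.
import numpy as np

rng = np.random.default_rng(1)
feature_maps = rng.random((512, 7, 7))   # (channels, H, W) from the last conv layer
class_weights = rng.random(512)          # FC weights for the predicted DR stage

cam = np.tensordot(class_weights, feature_maps, axes=1)   # -> (7, 7) map
cam = np.maximum(cam, 0.0)                                # keep positive evidence
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
# `cam` is then upsampled to the input resolution and overlaid on the OCTA image.
```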
When comparing the performance of the CNN algorithm according to input image size, OCTA images covering the larger 6 × 6 mm² scanned area yielded higher performance than images covering the smaller scanned area. These results strongly support previous suggestions that wider fields of view may be more desirable for early detection and monitoring of disease progression.48–50 Meanwhile, the ML classifier performed better with 3 × 3 mm² OCTA images, the opposite of the CNN result. We suspect this discrepancy arises from the multistep process of extracting handcrafted features. Motion artifacts and distorted weak-signal regions are observed more frequently in widefield OCTA, particularly in the periphery. Moreover, the large SCP vessels observed in 6 × 6 mm² OCTA images, which are not seen in 3 × 3 mm² images, may substantially contribute to the assessment of DR owing to their large diameters. Although decreased capillary perfusion and an increased capillary dropout area have been reported to be associated with worsening DR severity, retinal arteriolar and venular calibers are also known to increase with DR progression.51–54 Because machine learning classifiers use quantitative parameters of OCTA images to classify DR severity, the scan size can affect the results, particularly during the segmentation or feature extraction stage.
Interestingly, we also observed that the CNN algorithm for DR classification performed poorly when using the DCP layer compared with other OCTA layers. Because the pathology in DR is hypothesized to preferentially involve the more vulnerable DCP, this result may appear contrary to common knowledge.55 There are several potential explanations. Because the CNN algorithm is trained and tested against grading based on FA images, which visualize only the superficial retinal vessels, it is perhaps not surprising that the CNN performs better on SCP images than on DCP images.56 In addition, images of the DCP layer may have been affected by projection artifacts, in which shadows from superficial blood flow are projected onto deeper layers and erroneously perceived as flow. Because the deeper layers are more susceptible to projection artifacts and signal attenuation, this may explain the greater variation in the interpretation of OCTA images of the DCP.57 Consistent with our results, several previous studies have also suggested that the SCP retains greater diagnostic value even after DCP image quality is improved by removing decorrelation tail projection artifacts.58,59
Since CNN methods came into wide use for image classification, several methods for the automated classification of DR severity have been proposed.10–21,60 Most are based on fundus photographs. Ghosh et al.19 proposed a CNN-based method to classify fundus photographs into five classes (no DR, mild NPDR, moderate NPDR, severe NPDR, and PDR) and achieved an overall accuracy of 85%. Owing to data set sizes that are restricted compared with the extremely large fundus photography data sets used in previous fundus photography–based networks, far fewer studies have applied CNN algorithms to OCT and OCTA. However, OCT and OCTA have advantages over fundus photography in that they provide more instructive information on the structure and vasculature of the retina. Zang et al.11 applied deep learning to automated DR classification based on OCT and OCTA data and achieved an overall accuracy of 71% for classification into four classes (no DR, mild and moderate NPDR, severe NPDR, and PDR), slightly lower than fundus photography–based DR classifications. The authors attributed this to the relatively small data sets (approximately 1/100 the size of those in previous studies using fundus photography) and to the use of an algorithm trained against grading based on fundus photography, a considerably different modality from OCT/OCTA. Although multiple studies have examined various artificial intelligence–based approaches to DR classification, we are unaware of any algorithm trained against grading based on UWF FA. In previous studies, the DR grading system was based on fundus photograph examination, which is prone to overlooking subtle fundus details and thus to examiner error. In addition, alterations of the microcirculation in the peripheral retina are not observed on fundus examination. A recent study revealed that 17% of retinal neovascularization lies anterior to the border of the seven conventional standard fields, suggesting that UWF FA allows more appropriate staging of DR.24
Although we reported comparable performance in this study, a notable limitation is that the number of patients is still relatively small. However, it is comparable to that of other studies employing OCTA,10,11,20,21,40 considering that this technology is still not ubiquitous in ophthalmology practices. We used training and testing OCTA data from only a single center, without generalizability testing on external data sets; further studies conducting robust prospective external validation are required. It is also necessary to compare DR classification performance between ResNet 101 and other CNN architectures (e.g., DenseNet, EfficientNet, or Inception v3). Nevertheless, this study supports an important first step toward end-to-end deep learning models for DR classification using OCTA images. A strength of this study is that the ground truth for DR stage classification is based on UWF FA. Although OCTA has several clinical advantages over FA, its role in the clinical decision-making process remains limited. Using the CNN algorithm, we can classify DR severity in an automated fashion while taking advantage of both UWF FA and OCTA.
In this study, we introduced a fully automated deep CNN method for DR classification using OCTA images. Although OCTA is rapidly being adopted as a new modality in clinical routine, the interpretation of OCTA data remains limited. If the proposed automated DR classification framework using OCTA can provide a level of diagnostic value similar to that of other modalities, the number of procedures an individual requires for an accurate diagnosis would be reduced, ultimately lowering both the clinical burden and health care costs. Such a system could substantially reduce the rate of vision loss attributable to DR, improve clinical management, and create a novel diagnostic workflow for disease detection and referral. For proper clinical application of our method, further testing and optimization that incorporate clinical data, such as genetic factors, hemoglobin A1c, and duration of diabetes, may be required to ensure a minimal false-negative rate. Combining data from various imaging modalities, such as fundus photography or FA, could reinforce performance and thereby further improve accuracy. Future work should extend the algorithm to a larger number of participants, including images with macular edema, artifacts, or low quality, to make it more generalizable in practice.