Given these limitations in the interpretation of OCT, deep learning models may provide alternative ways to quantify structural damage without relying on predefined features derived from automated segmentation software. As noted before, deep learning algorithms can learn features from data automatically, provided enough data are available. These models can therefore make use of raw SDOCT images without requiring the input of predefined features. Along those lines, Mariottoni et al.65 recently demonstrated that a segmentation-free deep learning algorithm could be trained to predict RNFL thickness from a raw OCT B-scan. The segmentation-free predictions were highly correlated with conventional RNFL thickness (r = 0.983, P < 0.001), with a mean absolute error of approximately 2 µm in good-quality images. Most importantly, in images where conventional segmentation failed, the deep learning model still extracted reliable RNFL thickness information.

In a more general approach, Thompson et al.25 showed that a deep learning algorithm could be trained on raw SDOCT B-scans to directly discriminate glaucomatous from healthy eyes. The proposed algorithm achieved better diagnostic performance than the conventional RNFL thickness parameters from the instrument's printout, with an area under the ROC curve of 0.96 versus 0.87 for the global peripapillary RNFL thickness (P < 0.001). Another study, by Maetschke et al.,66 similarly developed a deep learning algorithm that could distinguish between glaucomatous and healthy eyes using raw, unsegmented OCT volumes of the optic nerve head. This algorithm also outperformed conventional SDOCT parameters, with an area under the ROC curve of 0.94 versus 0.89 for a logistic regression model combining SDOCT parameters. As illustrated in Figure 4C, the class activation maps (heatmaps) appeared to highlight regions of the OCT volume that are clinically well established as important for glaucoma diagnosis, particularly the neuroretinal rim, optic disc cupping, and the lamina cribrosa and its surrounding area.

Heatmaps can help us better understand a CNN by highlighting the pixels most relevant to its predictions, and the highlighted regions can then be subjected to more detailed analysis. It should be noted, however, that class activation maps usually lack the resolution to precisely pinpoint small areas that were relevant to the classification. This imprecision arises from the way deep learning models with convolutional layers are built: successive layers progressively down-sample the input, and the maps are computed from these coarse final layers. In addition, the usefulness of a heatmap depends largely on the model used and on the amount and quality of the available training data. As can be seen in Figure 4C, the heatmaps highlight very broad areas, which sometimes seem to include even the vitreous as relevant to the discrimination of glaucoma from normal. Although the deep learning algorithm may indeed be capturing information that is not yet apparent to human eyes, the resolution limitations of these heatmaps must be kept in mind.
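The resolution limitation described above can be made concrete with a small sketch. In the classic formulation of a class activation map, the map is a weighted sum of the final convolutional feature maps, which are far smaller than the input image, and must then be upsampled for overlay on the scan. The following minimal NumPy illustration uses arbitrary, assumed dimensions (a 224 × 224 input, 512 feature maps of 7 × 7, random weights); it is not drawn from any of the cited models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions for illustration: a 224x224 input B-scan whose
# final convolutional layer yields 512 feature maps of only 7x7.
input_size, fmap_size, n_channels = 224, 7, 512

feature_maps = rng.random((n_channels, fmap_size, fmap_size))
class_weights = rng.random(n_channels)  # hypothetical weights for one class

# Class activation map: channel-weighted sum of the final feature maps.
cam = np.tensordot(class_weights, feature_maps, axes=1)  # shape (7, 7)

# Nearest-neighbor upsampling back to input resolution for overlay.
scale = input_size // fmap_size  # each CAM cell spans a 32x32 input block
heatmap = np.kron(cam, np.ones((scale, scale)))  # shape (224, 224)

# No structure smaller than ~32 pixels can be localized: within each
# 32x32 block the upsampled heatmap is constant.
print(cam.shape, heatmap.shape, scale)
```

Because every value in the 7 × 7 map is smeared over a 32 × 32 block of the input, small anatomical structures cannot be resolved, which is precisely why heatmaps over OCT volumes appear as broad, diffuse regions.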