The AGD algorithm. We first divided the weight parameters θ of the CNN into three subsets, θ1, θ2 and θ3, which denote, respectively, the weights of the convolutional blocks and global average pooling layer that process the OCT inputs (i.e. Conv_blocks_1 and Avg_pool_1 in Fig. 2), the convolutional blocks and global average pooling layer for the CFP modality (i.e. Conv_blocks_2 and Avg_pool_2 in Fig. 2), and the final fully connected layer (i.e. FC_3 in Fig. 2). The following briefly illustrates how the AGD algorithm works during training of the CNN. In each training iteration, (I) we first updated θ1 by minimizing the binary cross-entropy loss (BCEL) between the CNN predictions for the input (Ok, Ck) samples containing interpretable OCT images and their associated labels (i.e. in this step, the uninterpretable images were excluded from the training loss); (II) θ2 was then updated analogously, by minimizing the BCEL between the CNN predictions for the input (Ok, Ck) samples containing interpretable CFP images and their associated labels; and (III) finally, θ3 was updated to minimize the BCEL between the CNN predictions for all input (Ok, Ck) samples (i.e. with both interpretable and uninterpretable OCT/CFP images) and the associated labels. After steps I and II, the convolutional filters processing the OCT and CFP modalities (i.e. θ1 and θ2) were trained to extract features that best differentiate RPN and RPP samples when the inputs are interpretable. Conversely, if one or both modalities of an input were uninterpretable, the features extracted by the corresponding convolutional filters were considered uninformative, as those samples were excluded from the training of θ1 and θ2. In step III, the weights of the fully connected layer θ3 were optimized to recognize whether the features output by θ1 and θ2 imply RPN, RPP, or uninformative content, and to infer the correct prediction when the features from the OCT and CFP modalities carry inconsistent information (e.g. one implies RPN while the other implies RPP, or one is uninformative). As a result, the CNN trained with the AGD algorithm implicitly handles uninterpretable images in the dual inputs (Ok, Ck) without classifying them as a third class besides RPN and RPP. A mathematical illustration of the AGD algorithm is provided in Appendix C (the Python code implementing the algorithm is available at https://github.com/gaoqitong/Alternate-Gradient-Descent-For-Uninterpretable-Images).
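The three-step alternating update above can be sketched with a toy model in which each branch is a single linear map standing in for the convolutional blocks. All names here (theta1, theta2, theta3, the interpretability masks m_oct/m_cfp, the synthetic data and learning rate) are illustrative assumptions for this sketch, not the paper's released implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: O = OCT inputs, C = CFP inputs, y = RPN(0)/RPP(1) labels.
n, d_o, d_c = 64, 8, 8
O = rng.normal(size=(n, d_o))
C = rng.normal(size=(n, d_c))
y = (O[:, 0] + C[:, 0] > 0).astype(float)   # a learnable toy labeling rule
m_oct = rng.random(n) < 0.8                 # True where the OCT image is interpretable
m_cfp = rng.random(n) < 0.8                 # True where the CFP image is interpretable

theta1 = rng.normal(size=d_o) * 0.1         # OCT-branch weights
theta2 = rng.normal(size=d_c) * 0.1         # CFP-branch weights
theta3 = rng.normal(size=2) * 0.1           # fusion (fully connected) weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(O, C):
    a1, a2 = O @ theta1, C @ theta2         # per-branch scalar features
    return sigmoid(theta3[0] * a1 + theta3[1] * a2), a1, a2

def bce(p, y):
    eps = 1e-9
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

loss_init = bce(forward(O, C)[0], y)
lr = 0.1
for _ in range(200):
    # (I) update theta1 using only samples whose OCT image is interpretable
    p, a1, a2 = forward(O, C)
    g = (p - y)[m_oct]                      # dBCEL/dz on the selected subset
    theta1 -= lr * (g[:, None] * theta3[0] * O[m_oct]).mean(axis=0)

    # (II) update theta2 using only samples whose CFP image is interpretable
    p, a1, a2 = forward(O, C)
    g = (p - y)[m_cfp]
    theta2 -= lr * (g[:, None] * theta3[1] * C[m_cfp]).mean(axis=0)

    # (III) update theta3 on all samples, interpretable or not
    p, a1, a2 = forward(O, C)
    g = p - y
    theta3 -= lr * np.array([(g * a1).mean(), (g * a2).mean()])

loss_final = bce(forward(O, C)[0], y)
```

Steps I and II mask the loss so that uninterpretable images never shape the branch filters, while step III lets the fusion weights see every sample, mirroring how θ3 learns to arbitrate between informative and uninformative branch features.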