To fully utilize the information in the B-scan and en face images, we proposed a deep learning multiview fusion network (MVFN), which is composed of three neural subnetworks and a decision fusion module, as shown in Figure 1. The three subnetworks process images from the fast-axis B-scan, slow-axis B-scan, and en face directions, respectively. Each of the two B-scan subnetworks processes images one at a time and outputs, for each image, a 1 × 8 probability vector over the eight classes: normal, DR, AMD, CSC, MH, RVO, EM, and RS. Each B-scan subnetwork processes all 448 B-scan images of its axis within a 3D scan, producing a decision vector of size 448 × 8. The en face subnetwork takes the superficial, deep, and avascular slab images as a three-channel input and outputs a 1 × 8 probability vector. Thus, for each volume, the three subnetworks yield one 1 × 8 en face decision vector and two 448 × 8 B-scan decision vectors. The three subnetworks can be based on any backbone commonly used in computer vision, such as ResNet, ConvNeXt, or Swin Transformer. For demonstration purposes, we employed ResNet-50 as the backbone for all three subnetworks to serve as a baseline.
The decision fusion module consists of several steps. For the two B-scan decision vectors (448 × 8), we first smoothed the probabilities by applying a mean filter with a kernel size of 3 along the row (448) direction. The rationale is that a given type of lesion tends to appear in consecutive slices, so the filtering helps to reduce interslice variance. We then performed pooling: for each of the seven disease classes (DR, AMD, MH, CSC, RS, EM, and RVO), we extracted the five largest probabilities among the 448 positions, and for the normal class, we extracted the five smallest probabilities, forming a new B-scan decision vector of size 5 × 8 (7 disease classes + 1 normal class). We concatenated the two new decision vectors (fast-axis and slow-axis) with the en face decision vector (1 × 8) to obtain a (5 × 2 + 1) × 8 = 11 × 8 vector, which was input into a random forest model to obtain the final output vector of size 1 × 8, corresponding to the probabilities of the eight classes. Note that other types of machine learning models can be used in place of the random forest.
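The smoothing, pooling, and concatenation steps can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this work. The function names are hypothetical, and three details that the text does not specify are assumptions: the mean filter truncates its window at the first and last slices, the normal class occupies column 0 (following the order in which the classes are listed), and the pooled values are kept in sorted order. The random forest itself is omitted, since the way the 11 × 8 features are arranged for it is not described here.

```python
from typing import List

Matrix = List[List[float]]  # rows = slices (or pooled ranks), columns = classes

NORMAL_INDEX = 0  # assumption: "normal" is the first of the eight classes
TOP_K = 5         # number of pooled probabilities per class (from the text)


def mean_filter_rows(probs: Matrix, kernel: int = 3) -> Matrix:
    """Smooth each class column along the slice axis with a mean filter.

    Assumption: at the two ends the window is truncated to the slices that
    exist, because the boundary handling is not specified in the text.
    """
    half = kernel // 2
    n_rows, n_cols = len(probs), len(probs[0])
    smoothed = []
    for i in range(n_rows):
        lo, hi = max(0, i - half), min(n_rows, i + half + 1)
        window = probs[lo:hi]
        smoothed.append(
            [sum(row[c] for row in window) / len(window) for c in range(n_cols)]
        )
    return smoothed


def pool_extremes(probs: Matrix, k: int = TOP_K,
                  normal_index: int = NORMAL_INDEX) -> Matrix:
    """Keep the k largest values per disease class and the k smallest for normal."""
    n_cols = len(probs[0])
    columns = []
    for c in range(n_cols):
        values = [row[c] for row in probs]
        # Disease classes: descending order, so the first k are the largest.
        # Normal class: ascending order, so the first k are the smallest.
        ordered = sorted(values, reverse=(c != normal_index))
        columns.append(ordered[:k])
    # Transpose back to k rows by n_cols columns.
    return [[columns[c][r] for c in range(n_cols)] for r in range(k)]


def build_fusion_features(fast: Matrix, slow: Matrix,
                          en_face: List[float]) -> Matrix:
    """Stack pooled fast-axis, pooled slow-axis, and en face rows: (5*2+1) x 8."""
    pooled_fast = pool_extremes(mean_filter_rows(fast))
    pooled_slow = pool_extremes(mean_filter_rows(slow))
    return pooled_fast + pooled_slow + [list(en_face)]
```

Calling `build_fusion_features` with two 448 × 8 lists of per-slice probabilities and one 8-element en face vector returns 11 rows of 8 values each, which would then serve as the input features for the random forest or another classifier.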