In this project, we use the largest publicly available OCT dataset, which is composed of four categories [9]. The dataset contains three groups of retinal pathologies, namely choroidal neovascularization (CNV), diabetic macular edema (DME), and drusen, and one group of healthy samples labeled as normal. Sample images of all four classes and their representative characteristics are shown in Figure 1.
Precise data splitting is important because each patient is represented by multiple scans in the dataset, and OCT scans that belong to the same patient are very similar and often nearly identical. Placing one patient's scans in different partitions (training, validation, and test) can therefore leak information between partitions, bias the model, and produce misleading performance results. The dataset is highly imbalanced: it includes 108,312 training images (37,206 CNV, 11,349 DME, 8,617 DRUSEN, and 51,140 NORMAL) from 4,686 patients and 1,000 testing images (250 per category) from 633 patients. In this study, the dataset is analyzed further, and scans of patients who appear in both the training and test partitions are removed from the training samples. This reduces the training data from 108,312 to 104,649 images, with 1,000 set aside for the validation partition. The aim is to keep the testing data exactly as provided by the dataset authors while ensuring that each patient's scans appear in only one partition.
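The patient-level filtering described above can be sketched as follows. This is a minimal illustration and not the exact procedure used in this study: it assumes that image files are named `<CLASS>-<PATIENT_ID>-<SCAN_INDEX>.<ext>` (for example `CNV-1016042-1.jpeg`), and the helper names `patient_id`, `drop_overlapping_patients`, and `shared_patients` are hypothetical.

```python
from pathlib import Path


def patient_id(path):
    """Return the patient ID encoded in a file name.

    Assumption: files are named <CLASS>-<PATIENT_ID>-<SCAN_INDEX>.<ext>.
    """
    return Path(path).stem.split("-")[1]


def drop_overlapping_patients(train_files, test_files):
    """Keep only training scans whose patient does not appear in the test set."""
    test_patients = {patient_id(f) for f in test_files}
    return [f for f in train_files if patient_id(f) not in test_patients]


def shared_patients(files_a, files_b):
    """Return the set of patient IDs that occur in both file lists."""
    return {patient_id(f) for f in files_a} & {patient_id(f) for f in files_b}


if __name__ == "__main__":
    # Toy file lists; the paths are illustrative only.
    train = [
        "train/CNV/CNV-100-1.jpeg",
        "train/CNV/CNV-100-2.jpeg",
        "train/DME/DME-200-1.jpeg",
        "train/NORMAL/NORMAL-300-1.jpeg",
    ]
    test = ["test/DME/DME-200-7.jpeg"]

    filtered = drop_overlapping_patients(train, test)
    print(len(train), "->", len(filtered))
    print(shared_patients(filtered, test))
```

The test partition is left untouched, and only training scans are discarded, which mirrors the stated aim of keeping the test data as provided. The same `shared_patients` check could be applied to the training and validation partitions.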
A limited dataset is constructed by undersampling the training categories relative to the minority class, so that each category contains 7,800 samples. This dataset is used to evaluate the performance of the model when less training data is available.
A mini dataset is also created, composed of 1,000 image samples per training category. Its primary purpose is to assess how well the neural networks generalize when trained on a very small number of samples.
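A class-balanced subset of this kind can be drawn by random undersampling, as in the sketch below. The text does not state how the samples were selected, so uniform random selection with a fixed seed is an assumption here, and the function name `undersample` and the `(path, label)` representation are hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, per_class, seed=0):
    """Draw `per_class` items per label from a list of (path, label) pairs.

    Assumption: selection is uniform at random without replacement;
    the fixed seed makes the subset reproducible.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):  # sorted for a deterministic class order
        paths = by_label[label]
        if len(paths) < per_class:
            raise ValueError(
                f"class {label!r} has only {len(paths)} samples, "
                f"fewer than the requested {per_class}"
            )
        subset.extend((p, label) for p in rng.sample(paths, per_class))
    return subset


if __name__ == "__main__":
    # Toy imbalanced data set; sizes are illustrative, not the real counts.
    sizes = {"CNV": 50, "DME": 20, "DRUSEN": 12, "NORMAL": 70}
    samples = [
        (f"{label}-{i}.jpeg", label)
        for label, n in sizes.items()
        for i in range(n)
    ]
    balanced = undersample(samples, per_class=10)
    print(len(balanced))
```

Under these assumptions, calling `undersample(train_samples, per_class=7800)` would correspond to the limited dataset and `per_class=1000` to the mini dataset, with the test and validation partitions left unchanged.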
Table 1 summarizes the dataset variants. For a fair comparison, the three datasets (original, limited, and mini) share the same testing and validation sets.