A Meta-Learning Approach for Classifying Multimodal Retinal Images of Retinal Vein Occlusion With Limited Data
Author Affiliations & Notes
  • Danba Jiachu
    Kham Eye Centre, Kandze Prefecture People's Hospital, Kangding, China
  • Li Luo
    Joint Shantou International Eye Center of Shantou University and the Chinese University of Hong Kong, Shantou University Medical College, Shantou, Guangdong, China
  • Meng Xie
    Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
  • Xiaoling Xie
    Joint Shantou International Eye Center of Shantou University and the Chinese University of Hong Kong, Shantou University Medical College, Shantou, Guangdong, China
  • Jinming Guo
    Joint Shantou International Eye Center of Shantou University and the Chinese University of Hong Kong, Shantou University Medical College, Shantou, Guangdong, China
  • Hehua Ye
    Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
  • Kebo Cai
    Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
  • Lingling Zhou
    Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
  • Gang Song
    Kham Eye Centre, Kandze Prefecture People's Hospital, Kangding, China
  • Feng Jiang
    Kham Eye Centre, Kandze Prefecture People's Hospital, Kangding, China
  • Danqing Huang
    Discipline Inspection & Supervision Office, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • Mingzhi Zhang
    Joint Shantou International Eye Center of Shantou University and the Chinese University of Hong Kong, Shantou University Medical College, Shantou, Guangdong, China
  • Ce Zheng
    Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
  • Correspondence: Ce Zheng, Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai 200082, China. e-mail: zhengce@xinhuamed.com.cn 
  • Mingzhi Zhang, Joint Shantou International Eye Center of Shantou University and the Chinese University of Hong Kong, Shantou University Medical College, Shantou, Guangdong 515041, China. e-mail: zmz0754@126.com 
  • Danqing Huang, Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai 200082, China. e-mail: hdq820831@163.com 
  • Footnotes
     DJ, LL, and MX contributed equally to this work.
Translational Vision Science & Technology September 2024, Vol. 13, 22. doi: https://doi.org/10.1167/tvst.13.9.22
Abstract

Purpose: To propose and validate a meta-learning approach for detecting retinal vein occlusion (RVO) from multimodal images with only a few samples.

Methods: In this cross-sectional study, we formulate the problem as meta-learning. The meta-training dataset consists of 1254 color fundus (CF) images from 39 different fundus diseases. Two meta-testing datasets include a public domain dataset and an independent dataset from Kandze Prefecture People's Hospital. The proposed meta-learning models comprise two modules: the feature extraction networks and the prototypical networks (PNs). We use two deep learning models (ResNet and the Contrastive Language-Image Pretraining [CLIP] network) for feature extraction. We evaluate the performance of the algorithms using accuracy, area under the receiver operating characteristic curve (AUCROC), F1-score, and recall.

Results: The CLIP-based PNs outperformed the other models across all meta-testing datasets. For the public APTOS dataset, the meta-learning algorithms achieved good results, with an accuracy of 86.06% and an AUCROC of 0.87 with only 16 training images. In the hospital datasets, the meta-learning algorithms showed excellent diagnostic capability for detecting RVO with a very low number of shots (AUCROC above 0.99 for n = 4, 8, and 16). Notably, even though the meta-training dataset does not include fluorescein angiography (FA) images, the meta-learning algorithms also showed excellent diagnostic capability for detecting RVO from images of a different modality (AUCROC above 0.93 for n = 4, 8, and 16).

Conclusions: The proposed meta-learning models excel in detecting RVO, not only on CF images but also on FA images, a different imaging modality.

Translational Relevance: The proposed meta-learning models could be useful in automatically detecting RVO on CF and FA images.

Introduction
Retinal vein occlusion (RVO) is one of the most common retinal vascular diseases.1 It occurs when venous blood flow is partially (branch RVO) or completely (central RVO) blocked, leading to macular edema and vitreous hemorrhage that can significantly reduce central visual acuity.2 RVO is estimated to be one of the most common causes of visual impairment, second only to diabetic retinopathy (DR).3 Recent epidemiological studies reported that the prevalence of RVO ranges from 0.26% to 3.39%,4–6 and that it is associated with older age and other systemic diseases. Our previous study in a Tibetan population also showed that RVO is a common cause of blindness after cataract and age-related macular degeneration (AMD).7 Conventionally, the diagnosis of RVO is based on fundus examination, which can be aided by color fundus (CF) photographs or fluorescein angiography (FA). However, early diagnosis using these multimodality medical images requires the availability of experienced ophthalmologists, and the shortage of experienced ophthalmologists may hinder the early detection of RVO, especially in underdeveloped countries and remote regions.
Given the growing public health concern, there is increasing interest in establishing reliable and cost-effective methods for screening RVO. Recently, artificial intelligence (AI), especially the deep learning (DL) approach, has been reported to attain a high level of accuracy in the automated detection of numerous diseases from clinical images, such as DR,8 AMD,9 and glaucoma.10 Chen et al.11 reported that DL models show good performance in recognizing RVO and identifying lesions such as hemorrhage, cotton-wool spots, and hard exudates. However, several important challenges need to be addressed when using DL for automated detection of RVO. First, traditional supervised DL methods require that the training set and testing set come from the same domain. For example, a DL model trained on fundus images with RVO may fail to generalize when tested on FA data from the same RVO patients because of the different data distribution. Second, state-of-the-art DL models require large, high-quality, and diverse datasets, yet for some rare diseases only a few images are available to train DL systems.
The use of meta-learning holds great promise for addressing the problems of data scarcity and domain generalization.12 Meta-learning enables learning model weights by leveraging prior knowledge from various tasks and can be applied to different task objectives, such as few-shot learning13 or multitask learning. Inspired by this "learning to learn" idea, the purpose of this study is to build a meta-learning framework based on neural networks for detecting RVO from multimodality images (fundus images and FA) with only limited data.
Methods
Datasets
In this study, we formulate the problem as meta-learning,14 where a learner improves the learning algorithm itself by using the experience of multiple learning episodes. The goal of meta-learning is to train a model on a diverse set of tasks,15 such that it can solve new tasks with only a few training samples.13 We trained our meta-learning model using episodic training with N-way-K-shot classification, as suggested by Ravi and Larochelle.16 As shown in Figure 1, meta-learning models are trained with a meta-training dataset and tested with a meta-testing dataset. Within the meta-training dataset, a number of support sets and query sets form an episode. For each meta-learning dataset, N stands for the number of classes and K for the number of samples from each class to train on.
Figure 1. The flowchart of meta-learning model construction. This figure illustrates the standard protocol for constructing meta-learning episodes, which involves selecting N distinct classes and K query points per class to emulate the expected test-time scenario. Following the findings of Snell et al.,26 our experiments indicate that employing a greater number of ways and shots during meta-training than at test time yields superior performance. (A) Example of meta-training episode construction with 39 ways and two shots. Each way represents a unique class, and each shot corresponds to an instance within that class. (B) Evaluation phase using shots of two, four, eight, and 16, randomly selected from the meta-testing dataset. During this phase, the model's generalization ability is assessed across varying numbers of support examples per class.
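To make the episodic protocol concrete, the following minimal sketch (our illustration, not the authors' released code) shows how an N-way-K-shot episode can be sampled from a pool of labeled images; the names `sample_episode`, `n_way`, `k_shot`, and `n_query` are hypothetical:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way, k_shot, n_query):
    """Sample one N-way-K-shot episode from a list of (image, label) pairs."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    # Only classes with enough images for both the support and query sets.
    eligible = [c for c, xs in by_class.items() if len(xs) >= k_shot + n_query]
    classes = random.sample(eligible, n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        xs = random.sample(by_class[c], k_shot + n_query)
        support += [(x, episode_label) for x in xs[:k_shot]]  # K shots per way
        query += [(x, episode_label) for x in xs[k_shot:]]    # held-out queries
    return support, query
```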
We used a public dataset from Kaggle with 1000 fundus images as the meta-training dataset. We described the details of this study and dataset previously.17 Briefly, the raw dataset included 1000 fundus images belonging to 39 classes of different fundus diseases (range, 8–103 images per class). Because some classes in the raw dataset had only a limited sample size, we collected extra images from the same study to ensure at least 24 images per class (aiming for at least 16 shots). Two meta-testing datasets were used to test the meta-learning model in this study. The first was the public domain APTOS dataset, comprising 3662 fundus images with different stages of DR.18 We evaluated the diagnostic performance of the meta-learning model in automated binary classification of DR, involving nonreferable (stages 0 and 1) versus referable (stages 2, 3, and 4) classes. We also retrospectively collected an independent meta-testing dataset from Kham Eye Centre, Kandze Prefecture People's Hospital, between September 2018 and April 2022. This second meta-testing dataset involved multimodality retinal images: fundus images and FA images from eyes with RVO and from non-RVO controls. All subjects underwent a full ophthalmic examination, including best-corrected visual acuity, refraction, slit-lamp examination, intraocular pressure, and fundus examination by a fellowship-trained retinal specialist. The fellow eyes of RVO eyes were considered non-RVO controls if they fulfilled the following criteria: best-corrected visual acuity ≥ 20/40; a presenting intraocular pressure < 21 mm Hg on non-contact tonometry; no previous history of trauma or surgery; and no intraocular tumor or laser therapy. The exclusion criteria were dense cataract and other ocular pathology such as glaucoma, AMD, and DR. In accordance with current RVO guidelines, patients diagnosed with RVO were scheduled for monthly follow-up visits during the first six months after diagnosis at Kham Eye Centre, where retinal imaging was performed two to four times per subject, depending on disease severity.19 This frequent monitoring allowed us to collect a series of longitudinal images that reflect the temporal progression of RVO. For the purposes of testing our meta-learning algorithm, we aimed to incorporate as many images from different follow-up visits as possible. We used a Zeiss VISUCAM (Carl Zeiss Meditec, Jena, Germany) to capture both CF and FA images. All fundus images were of the posterior pole and obtained at 45°. For FA, we selected only late-phase images.
The study protocol was approved by both the ethics committee and the institutional review board (IRB) at Kandze Prefecture People's Hospital (GZZYY-2020-21). The study was conducted in accordance with the tenets of the Declaration of Helsinki, and informed consent was taken from all patients. 
Development of the Meta-Learning Algorithms
The proposed meta-learning models consist of two modules: the feature extraction networks and the prototypical networks (PNs). For feature extraction, we used and tested three neural network architectures: (a) the ResNet20 model, pretrained on the ImageNet21 dataset (with nearly 10 million images); (b) the Contrastive Language-Image Pretraining (CLIP) model based on an architecture equivalent to ResNet-50; and (c) the similar CLIP model based on the Vision Transformer architecture (CLIP-Vit-L14),22 pretrained on LAION-2B23 (with nearly 2 billion image-text pairs). The details of the ResNet and CLIP24 models were described previously. Briefly, ResNet is an enhanced DL algorithm based on convolutional neural networks (CNNs). Deep CNNs often suffer from low diagnostic performance because of the vanishing gradient problem, which hampers the transmission of information from shallow layers to deep layers. ResNet avoids these issues by using residual blocks with skip connections, which resolve the difficulties of training deeper networks and allow for higher accuracy. Unlike CNN-based models, CLIP is a neural network trained on a variety of data pairs consisting of both image and text. It can be instructed in natural language to predict the most relevant text snippet for a given image, without directly optimizing for the task, similar to the capabilities of few-shot learning.25 By learning directly from raw text about images, the CLIP model transfers nontrivially to most tasks and is often competitive with a fully supervised DL model without the need for any dataset-specific training.
In the subsequent phase of our study, we channeled the previously extracted features into PNs, a machine learning technique designed for efficient classification with sparse data. PNs rest on the premise that each class can be encapsulated by a "prototype" within a high-dimensional space, around which similar instances coalesce, akin to identifying a clinical prototype in medical diagnostics. The architecture of PNs hinges on generating a prototypical representation for each class and classifying query points based on the Euclidean distance between each query point and the class prototypes.26 By minimizing the distance between each query instance and the prototype of its class, the model seeks to enhance predictive accuracy. The loss function used by PNs is simple: a softmax is applied over the negative distances to all class prototypes, effectively assigning higher probabilities to closer prototypes. This approach ensures that the network's predictions align closely with the actual class labels. Because of their straightforward yet effective nature, PNs are one of the most popular approaches in the meta-learning literature.
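As a minimal sketch of this prototype-and-distance computation, assuming integer episode labels in [0, n_way) and embeddings already produced by the feature extractor (the function name `prototypical_episode_loss` and the tensor shapes are our assumptions, not the authors' code):

```python
import torch
import torch.nn.functional as F

def prototypical_episode_loss(support_emb, support_lbl, query_emb, query_lbl, n_way):
    """One prototypical-network step on a single episode.

    support_emb: (n_way * k_shot, d) support embeddings
    query_emb:   (n_query, d) query embeddings
    Labels are integer tensors with values in [0, n_way).
    """
    # Each class prototype is the mean embedding of its support shots.
    prototypes = torch.stack(
        [support_emb[support_lbl == c].mean(dim=0) for c in range(n_way)]
    )  # (n_way, d)
    # Squared Euclidean distances; their negatives act as class logits,
    # so the softmax assigns higher probability to closer prototypes.
    sq_dists = torch.cdist(query_emb, prototypes) ** 2  # (n_query, n_way)
    log_probs = F.log_softmax(-sq_dists, dim=1)
    loss = F.nll_loss(log_probs, query_lbl)
    accuracy = (log_probs.argmax(dim=1) == query_lbl).float().mean()
    return loss, accuracy
```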
We implemented the meta-learning networks using the PyTorch framework (version 1.13.1, https://pytorch.org/). The networks took input images downsampled to 224 × 224 pixels, and data augmentation included random shifts, rotations, and jitter. We used 1024-dimensional features extracted with ResNet50 or CLIP-Vit-L14. All of our models were trained via stochastic gradient-based optimization with AdamW,27 using an initial learning rate of 10−4 and a weight decay of 10−5.
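A minimal sketch of this setup follows. Only the 224 × 224 input size, the AdamW optimizer, and the reported learning rate and weight decay come from the text; the package choices (OpenAI's `clip` package, torchvision transforms), the model identifier string, and the augmentation magnitudes are our assumptions:

```python
import torch
import clip  # OpenAI CLIP package (assumed implementation)
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP backbone; the "RN50" variant yields 1024-dimensional image embeddings.
model, _ = clip.load("RN50", device=device)

# Augmentation roughly matching the text: random shifts, rotations, and
# jitter, with inputs downsampled to 224 x 224 (magnitudes are illustrative).
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Optimizer with the hyperparameters reported in the text.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)

def embed(images):
    """Image features from the CLIP image encoder."""
    return model.encode_image(images.to(device))
```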
Construction of Meta-Training and Meta-Testing Episodes
The standard protocol for assembling meta-learning episodes, exemplified by Vinyals et al.28 and Ravi and Larochelle,29 entails selecting N distinct classes along with K query points per class to simulate the anticipated scenario at test time. For example, in anticipation of a five-way, one-shot classification task at test time, training episodes are designed with N set to five classes and K to one query point per class. In line with Snell et al.,26 we observed significant benefits from training with a higher number of ways and shots than those used during testing. Following this guidance, our study adopted a random sampling approach for class subsets within each episode, covering a spectrum from five-way to 39-way classifications, each with two, four, eight, and 16 shots. Our results highlight only the most effective combination of ways and shots. Additionally, we implemented linear probing as our baseline for performance comparison, assessing it against meta-testing outcomes using shots of two, four, eight, and 16 randomly chosen from our test datasets. All meta-training and meta-testing processes were run on two RTX A4000 GPUs (CUDA version 11.6; Nvidia, Santa Clara, CA, USA) with an Intel Core i7-2700K 4.6-GHz processor (Intel, Santa Clara, CA, USA) and 128 GB of RAM.
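A minimal sketch of such a linear-probing baseline on frozen features, assuming scikit-learn's LogisticRegression as the probe (the authors do not specify their implementation, and the function name `linear_probe` is ours):

```python
from sklearn.linear_model import LogisticRegression

def linear_probe(support_feats, support_lbls, query_feats, query_lbls):
    """Fully supervised linear classifier on frozen features.

    With k shots per class, `support_feats` holds only those few labeled
    samples; performance is measured on the held-out query features.
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(support_feats, support_lbls)
    return clf.score(query_feats, query_lbls)  # mean accuracy
```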
Statistical Analysis
We computed performance metrics for our models averaged over 1000 randomly generated episodes from the three meta-testing datasets. The performance and 95% confidence intervals of all algorithms were evaluated using accuracy, the receiver operating characteristic curve, the area under the receiver operating characteristic curve (AUCROC), the precision-recall curve, and the F1-score. All statistical tests were performed using the torcheval package (PyTorch, version 1.13.1). The formulas for calculating accuracy, recall, and the F1-score were defined as
$$\mathrm{Accuracy} = \frac{\mathrm{True\ Positive} + \mathrm{True\ Negative}}{\mathrm{All}} \qquad (1)$$

$$\mathrm{Recall} = \frac{\mathrm{True\ Positive}}{\mathrm{True\ Positive} + \mathrm{False\ Negative}} \qquad (2)$$

$$\mathrm{F1\mbox{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)$$
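For illustration, these metrics can be computed per episode with torcheval's functional interface; a minimal sketch with made-up predictions (not study data), assuming the `binary_*` functions of the torcheval package:

```python
import torch
from torcheval.metrics.functional import (
    binary_accuracy, binary_auroc, binary_f1_score, binary_recall,
)

# Predicted probabilities for the positive (RVO/referable) class and the
# corresponding ground-truth labels for one meta-testing episode.
probs = torch.tensor([0.92, 0.15, 0.78, 0.33, 0.88, 0.05])
labels = torch.tensor([1, 0, 1, 0, 1, 0])

print("Accuracy:", binary_accuracy(probs, labels).item())  # thresholded at 0.5
print("AUCROC:  ", binary_auroc(probs, labels).item())
print("F1-score:", binary_f1_score(probs, labels).item())
print("Recall:  ", binary_recall(probs, labels).item())
```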
Results
We enrolled 153 RVO subjects with 579 CF images and corresponding FA images in the study. We excluded 71 CF (12.3%) and 64 FA (11.1%) images because of inadequate image clarity or inadequate field definition. The final number of subjects was 142 (92.8% retention rate), including 142 eyes with confirmed RVO and 113 non-RVO control eyes. The final CF image count was 508, comprising 269 images from RVO-affected eyes and 239 images from non-RVO control eyes. Similarly, the FA dataset included a total of 515 high-quality images, with 342 images representing RVO cases and 173 images serving as non-RVO controls. For the public APTOS dataset, we included 1487 CF images with referable DR and 2175 CF images with nonreferable DR as an independent meta-testing dataset.
Tables 1 through 4 present a comprehensive analysis of the accuracy, AUCROC, F1-score, and recall metrics for our proposed meta-learning algorithms in the context of retinal disorder diagnosis. These results are derived from the various datasets under meta-testing scenarios involving two, four, eight, and 16 shots, alongside the corresponding baseline measurements for comparison. Our testing with diverse combinations of ways and shots identified an optimal configuration: 39 ways paired with eight shots consistently produced superior performance across our test datasets. Although we aspired to explore episodes featuring 39 ways with 16 shots, this was precluded by hardware constraints. Consequently, our report delineates the outcomes specifically from 39-way classifications with an eight-shot setup. The data indicate a trend of improving performance as the number of meta-testing shots increases from two to 16. Remarkably, our proposed meta-learning algorithms outperformed the baseline across all evaluated meta-testing scenarios. In particular, when comparing transformer-based, vision-language-based, and CNN-based PNs, the CLIP-Vit-L14-based architecture demonstrated superior performance across all three meta-testing datasets.
Table 1. Accuracy of Meta-Learning Algorithms for Retinal Disorders Diagnosis on Different Datasets
Table 2. AUCROC of Meta-Learning Algorithms for Retinal Disorders Diagnosis on Different Datasets
Table 3. F1-Score of Meta-Learning Algorithms for Retinal Disorders Diagnosis on Different Datasets
Table 4. Recall of Meta-Learning Algorithms for Retinal Disorders Diagnosis on Different Datasets
For the public APTOS dataset, the meta-learning algorithms still achieved good results, with an accuracy of 86.06% (85.93% to 86.19%), AUCROC of 0.87 (0.86 to 0.88), F1-score of 0.87 (0.86 to 0.88), and recall of 0.87 (0.86 to 0.88), even when only a small number of training images (16 shots) was available. When meta-testing on the hospital datasets, the meta-learning algorithms showed excellent diagnostic capability for detecting RVO with a very low number of shots (AUCROC of 0.99 [0.99–0.99], 1.00 [0.99–1.00], and 1.00 [1.00–1.00] for n = 4, 8, and 16, respectively). More specifically, even though we did not include FA images in our meta-training dataset, our results suggest that the meta-learning algorithms also have excellent diagnostic capability for detecting RVO from images of a different modality (AUCROC of 0.93 [0.93–0.94], 0.95 [0.94–0.95], and 0.95 [0.95–0.96] for n = 4, 8, and 16, respectively).
We compared the performance of our CLIP-based meta-learning approach to that of a fully supervised linear probe as a baseline. As illustrated in Figure 2, the 16-shot CLIP-based meta-learning consistently outperformed the linear probe baseline across all three datasets. Notably, even with fewer shots (e.g., two or four), our approach demonstrated competitive performance, highlighting its efficiency in leveraging the pretrained knowledge from the CLIP model for few-shot classification tasks. 
Figure 2. Few-shot CLIP-based meta-learning is competitive with a fully supervised linear-probing baseline.
Discussion
Previous studies have shown that DL models can achieve performance comparable or superior to human experts when large amounts of annotated data are available. However, the performance of DL is often compromised by limitations in data. To address this issue, we proposed meta-learning algorithms in the setting of small training datasets (n ≤ 16). Our results suggest that the performance of meta-learning was good to excellent when tested on a publicly available fundus image database and on hospital-based datasets. Moreover, the meta-learning model also showed excellent diagnostic capability when tested on hospital-based images of a different imaging modality. The use of meta-learning holds great promise for reducing the amount of biomedical data needed to train predictive models in the target domain of interest.
Supervised DL models often require large amounts of labeled data to achieve good performance, which is challenging and costly in medical image analysis, and their performance degrades substantially with limited datasets. Gulshan et al.8 reported that smaller training datasets were related to lower performance of DL models and that around 60,000 images were needed to train a DL system to its performance plateau. Some researchers have suggested sharing data from different centers to increase the amount of training data for DL.30 However, increasing the number of data elements does not necessarily enhance the performance of a network, and sharing biomedical image data across different centers or countries may raise privacy and ethical issues.31 The US National Institute of Standards and Technology has defined biomedical image data as a kind of personally identifiable information, which could preclude sharing medical images across centers or countries or require approval from local IRBs. Moreover, accurately grading medical images requires the expert knowledge of clinicians, which can introduce interobserver or intraobserver variability.32
Meta-learning holds great promise for reducing the amount of biomedical data needed to train DL algorithms effectively.33 In real clinical settings, obtaining sufficient data for rare conditions, such as retinoblastoma or familial exudative vitreoretinopathy, can be challenging.34 Similarly, diseases with specific complications, like macular edema with RVO, may have limited available datasets.35 The advent of novel imaging technologies, such as swept-source OCT and ultrawide field FA, often leads to a scarcity of data in the target modality. However, there may be an abundance of source data from different imaging modalities or data generated from older yet related technologies. Meta-learning addresses this challenge by utilizing these extensive source datasets across multiclass or multimodality tasks to pretrain a model that can rapidly adapt to new tasks with minimal target domain data.36 
Some methods based on the meta-learning paradigm have been explored and applied to the classification of genomic data and medical images. Qiu et al.37 showed that meta-learning required one order of magnitude fewer gene expression profiles to train an optimal model predictive of clinical outcome. In ophthalmology, Burlina et al.38 demonstrated that a low-shot DL algorithm could achieve good performance for automated DR diagnostics using small datasets (best accuracy of 76.39% with 5120 training samples). In the current study, meta-learning algorithms achieved better results (best accuracy of 86.06% with 16 training samples). Although direct comparison with prior studies is challenging because of variations in study design and datasets used, the findings presented herein underscore the potential of meta-learning techniques. Our results suggest that these methods can yield comparable diagnostic performance with significantly reduced data requirements, which is particularly beneficial in scenarios where data paucity impedes traditional DL approaches.
This study has several limitations. First, we used only a meta-testing dataset with RVO from a single center. Our previous epidemiological research identified DR, AMD, glaucoma, and other posterior segment diseases as the primary causes of blindness and visual impairment, but we have only disclosed preliminary findings from the public APTOS dataset. The procurement of adequate FA images from truly normal eyes without pathology is challenging; hence, we used the fellow eyes of RVO patients as a pragmatic alternative for the control group. Subsequent studies should evaluate the robustness of meta-learning algorithms across data from multiple centers and a broader spectrum of ocular diseases. Second, past studies using EyePACS (e.g., Gulshan et al.8) reported higher accuracy than our meta-learning algorithms. The difference in performance may be due to differing experimental settings (such as data partitioning) or possible noise in the DR ground truth labels (other studies may have used extra annotations by new clinicians). Third, as mentioned previously, with advances in imaging techniques, several studies have suggested that patients with RVO should be imaged with OCT angiography in addition to FA; however, we collected the data from a remote area where OCT angiography is not available. On the other hand, our meta-learning model can be deployed on laptops as a standalone system for large-scale screening even in remote areas, which is meaningful for real-world adoption. Fourth, the compelling performance of the vision-language model, such as CLIP-ResNet50, underscores its potential for enhancing AI applications in real clinical settings, a possibility that warrants further exploration.39 Further study is needed to investigate whether the performance gain derives from the architectural difference or whether language supervision actually empowered the model.
In conclusion, this research introduces a novel meta-learning approach for classifying multimodal retinal images of RVO with limited data, addressing the significant challenge of data scarcity in medical image analysis. Our findings demonstrate that meta-learning models can achieve excellent diagnostic performance even when trained on minimal datasets. The results highlight the potential of meta-learning to advance AI applications in ophthalmology, particularly in screening and diagnosing diseases with limited data availability, thereby facilitating more accessible and accurate healthcare, especially in resource-constrained settings.
Acknowledgments
Supported by the Scientific Research Fund of Sichuan Science and Technology Department (2020YFS0537), Shantou Medical Science and Technology Planning Project (grant no. 220520096490385, 200630165260721), the National Natural Science Foundation of China (81371010), Hospital Funded Clinical Research, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine (21XJMR02) and Hospital Management Research Program of Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai Jiao Tong University (HDSI-2022-A-001). 
Disclosure: D. Jiachu, None; L. Luo, None; M. Xie, None; X. Xie, None; J. Guo, None; H. Ye, None; K. Cai, None; L. Zhou, None; G. Song, None; F. Jiang, None; D. Huang, None; M. Zhang, None; C. Zheng, None 
References
Rogers S, McIntosh RL, Cheung N, et al. The prevalence of retinal vein occlusion: pooled data from population studies from the United States, Europe, Asia, and Australia. Ophthalmology. 2010; 117(2): 313–319.e1. [CrossRef] [PubMed]
McIntosh RL, Rogers SL, Lim L, et al. Natural history of central retinal vein occlusion: an evidence-based systematic review. Ophthalmology. 2010; 117(6): 1113–1123.e15. [CrossRef] [PubMed]
Song P, Xu Y, Zha M, et al. Global epidemiology of retinal vein occlusion: a systematic review and meta-analysis of prevalence, incidence, and risk factors. J Glob Health. 2019; 9(1): 010427. [CrossRef] [PubMed]
Hayreh SS, Zimmerman B, McCarthy MJ, et al. Systemic diseases associated with various types of retinal vein occlusion. Am J Ophthalmol. 2001; 131: 61–77. [CrossRef] [PubMed]
Lim LL, Cheung N, Wang JJ, et al. Prevalence and risk factors of retinal vein occlusion in an Asian population. Br J Ophthalmol. 2008; 92: 1316–1319. [CrossRef] [PubMed]
Koh V, Cheung CY, Li X, et al. Retinal vein occlusion in a multi-ethnic Asian population: the Singapore Epidemiology of Eye Disease Study. Ophthalmic Epidemiol. 2016; 23: 6–13. [CrossRef] [PubMed]
Jiachu D, Jiang F, Luo L, et al. Blindness and eye disease in a Tibetan region of China: findings from a Rapid Assessment of Avoidable Blindness survey. BMJ Open Ophthalmol. 2018; 3(1): e000209. [CrossRef] [PubMed]
Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016; 316: 2402–2410. [CrossRef] [PubMed]
Burlina PM, Joshi N, Pekala M, et al. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017; 135: 1170–1176. [CrossRef] [PubMed]
Li Z, Guo C, Lin D, et al. Deep learning for automated glaucomatous optic neuropathy detection from ultra-widefield fundus images. Br J Ophthalmol. 2021; 105: 1548–1554. [CrossRef] [PubMed]
Chen Q, Yu WH, Lin S, et al. Artificial intelligence can assist with diagnosing retinal vein occlusion. Int J Ophthalmol. 2021; 14: 1895–1902. [CrossRef] [PubMed]
Gevaert O. Meta-learning reduces the amount of data needed to build AI models in oncology. Br J Cancer. 2021; 125: 309–310. [CrossRef] [PubMed]
Fei-Fei L, Fergus R, Perona P. One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell. 2006; 28: 594–611. [CrossRef] [PubMed]
Hospedales T, Antoniou A, Micaelli P, et al. Meta-learning in neural networks: a survey. IEEE Trans Pattern Anal Mach Intell. 2021; 44: 5149–5169.
Huisman M, van Rijn JN, Plaat A. A survey of deep meta-learning. Artif Intell Rev. 2021; 54: 4483–4541. [CrossRef]
Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: International Conference on Learning Representations. 2016.
Cen LP, Ji J, Lin JW, et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs using deep neural networks. Nat Commun. 2021; 12(1): 4828. [CrossRef] [PubMed]
Oulhadj M, Riffi J, Khodriss C, et al. Diabetic retinopathy prediction based on wavelet decomposition and modified capsule network. J Digit Imaging. 2023; 36: 1739–1751. [CrossRef] [PubMed]
Nicholson L, Talks SJ, Amoaku W, et al. Retinal vein occlusion (RVO) guideline: executive summary. Eye (Lond). 2022; 36: 909–912. [CrossRef] [PubMed]
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770–778.
Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009: 248–255.
Radford A, Kim JW, Hallacy C, et al. Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning. 2021: 8748–8763.
Schuhmann C, Beaumont R, Vencu R, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. Adv Neural Inform Process Syst. 2022; 35: 25278–25294.
Xu BY, Chiang M, Chaudhary S, et al. Deep learning classifiers for automated detection of gonioscopic angle closure based on anterior segment OCT images. Am J Ophthalmol. 2019; 208: 273–280. [CrossRef] [PubMed]
Song H, Dong L, Zhang W, et al. CLIP models are few-shot learners: empirical studies on VQA and visual entailment. arXiv:2203.07190.
Snell J, Swersky K, Zemel RS. Prototypical networks for few-shot learning. Adv Neural Inform Process Syst. 2017;30.
Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv:1711.05101.
Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning. Adv Neural Inform Process Syst. 2016; 29.
Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: International Conference on Learning Representations. 2016.
Ting DSW, Peng L, Varadarajan AV, et al. Deep learning in ophthalmology: the technical and clinical considerations. Prog Retin Eye Res. 2019; 72: 100759. [CrossRef] [PubMed]
Benke KK, Arslan J. Deep learning algorithms and the protection of data privacy. JAMA Ophthalmol. 2020; 138: 1024–1025. [CrossRef] [PubMed]
Ting DSW, Liu Y, Burlina P, et al. AI for medical imaging goes deep. Nat Med. 2018; 24: 539–540. [CrossRef] [PubMed]
Jia J, Feng X, Yu H. Few-shot classification via efficient meta-learning with hybrid optimization. Eng Appl Artif Intell. 2024; 127: 107296.
Rahdar A, Ahmadi MJ, Naseripour M, et al. Semi-supervised segmentation of retinoblastoma tumors in fundus images. Sci Rep. 2023; 13(1): 13010. [CrossRef] [PubMed]
Zheng C, Ye H, Yang J, et al. Development and clinical validation of semi-supervised generative adversarial networks for detection of retinal disorders in optical coherence tomography images using small dataset. Asia Pac J Ophthalmol (Phila). 2022; 11: 219–226. [CrossRef] [PubMed]
Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning. 2017: 1126–1135.
Qiu YL, Zheng H, Devos A, et al. A meta-learning approach for genomic survival analysis. Nat Commun. 2020; 11(1): 6350. [CrossRef] [PubMed]
Burlina P, Paul W, Mathew P, et al. Low-shot deep learning of diabetic retinopathy with potential applications to address artificial intelligence bias in retinal diagnostics and rare ophthalmic diseases. JAMA Ophthalmol. 2020; 138: 1070–1077. [CrossRef] [PubMed]
Wang AY, Kay K, Naselaris T, et al. Better models of human high-level visual cortex emerge from natural language supervision with a large and diverse dataset. Nat Mach Intell. 2023; 5: 1415–1426. [CrossRef]