Open Access
Artificial Intelligence | January 2025
Noninvasive Anemia Detection and Hemoglobin Estimation from Retinal Images Using Deep Learning: A Scalable Solution for Resource-Limited Settings
Author Affiliations & Notes
  • Rehana Khan
    School of Optometry and Vision Science, University of New South Wales, Sydney, Australia
  • Vinod Maseedupally
    School of Optometry and Vision Science, University of New South Wales, Sydney, Australia
  • Kaveri A. Thakoor
    Department of Ophthalmology, Columbia University, New York, NY, USA
  • Rajiv Raman
    School of Optometry and Vision Science, University of New South Wales, Sydney, Australia
    Shri Bhagwan Mahavir Vitreoretinal Services, Sankara Nethralaya, Chennai, India
  • Maitreyee Roy
    School of Optometry and Vision Science, University of New South Wales, Sydney, Australia
  • Correspondence: Maitreyee Roy, School of Optometry and Vision Science/Faculty of Medicine and Health, Room 3032, Level 3, North Wing, Rupert Myers Building, University of New South Wales, Sydney, New South Wales 2052, Australia. e-mail: [email protected] 
Translational Vision Science & Technology January 2025, Vol. 14, 20. doi: https://doi.org/10.1167/tvst.14.1.20
Abstract

Purpose: The purpose of this study was to develop and validate a deep-learning model for noninvasive anemia detection, hemoglobin (Hb) level estimation, and identification of anemia-related retinal features using fundus images.

Methods: The dataset included 2265 participants aged 40 years and above from a population-based study in South India. It comprised ocular and systemic clinical parameters, dilated retinal fundus images, and hematological data such as complete blood counts and hemoglobin (Hb) concentration levels. Eighty percent of the dataset was used for algorithm development and 20% for validation. A deep convolutional neural network, utilizing VGG16, ResNet50, and InceptionV3 architectures, was trained to predict anemia and estimate Hb levels. Sensitivity, specificity, and accuracy were calculated, and receiver operating characteristic (ROC) curves were generated for comparison with clinical anemia data. GradCAM saliency maps highlighted regions linked to anemia, and image processing techniques were used to quantify anemia-related features.

Results: For predicting anemia, the InceptionV3 model demonstrated the best performance, achieving 98% accuracy, 99% sensitivity, 97% specificity, and an area under the curve (AUC) of 0.98 (95% confidence interval [CI] = 0.97–0.99). For estimating Hb levels, the mean absolute error for the InceptionV3 model was 0.58 g/dL (95% CI = 0.57–0.59 g/dL). The model focused on the area around the optic disc and the neighboring retinal vessels, revealing that anemic subjects exhibited significantly increased vessel tortuosity and reduced vessel density (P < 0.001), with variable effects on vessel thickness.

Conclusions: The InceptionV3 model accurately predicted anemia and Hb levels, highlighting the potential of deep learning and vessel analysis for noninvasive anemia detection.

Translational Relevance: The proposed method offers the possibility to quantitatively predict hematological parameters in a noninvasive manner.

Introduction
Anemia, a condition marked by a decrease in the number of red blood cells or a reduction in their oxygen-carrying capacity, is a widespread hematological disorder affecting approximately two billion people globally.1,2 Its etiology is complex and multifaceted, encompassing factors such as malnutrition, chronic diseases, and gastrointestinal bleeding, with iron deficiency being the most prevalent cause. The impact of anemia extends far beyond hematological abnormalities, significantly affecting individuals’ overall health, well-being, and cognitive function. Anemia is prevalent in individuals with diabetes mellitus and serves as an independent risk factor for the development and severity of diabetic retinopathy (DR), complicating the management of microvascular complications by masking hyperglycemia control through falsely low HbA1c levels. In developing countries, the burden of anemia is exacerbated by limited access to advanced diagnostics, making noninvasive, scalable screening solutions essential for early detection and effective management.3,4 
Traditional screening methods for anemia primarily rely on invasive blood tests to assess hemoglobin (Hb) levels, which, despite being the gold standard, present practical challenges, especially in resource-limited settings.5 The logistical difficulties associated with sample collection, analysis, and interpretation of results pose significant barriers to effective screening.6 Alternative noninvasive methods, including subjective evaluations of pallor in the eye's conjunctiva, nail beds, tongue, and palms, as well as deep learning-based techniques using electrocardiograms or smartphone applications to analyze fingernail color, can identify severe anemia but are constrained by significant variability in sensitivity, specificity, and reliability, especially when deployed in diverse populations and uncontrolled environments.7–11 
Given the potential for ocular manifestations to reflect systemic health conditions like anemia, recent research has focused on the use of retinal fundus imaging for noninvasive screening. Retinal changes associated with anemia, such as hemorrhages and venous tortuosity, can provide valuable insights for detection and prediction.12 However, the low prevalence of these retinal changes in patients with anemia limits their sensitivity as standalone diagnostic features. Previous studies, including those by Mitani et al.13 and Tham et al.,14 have explored the use of deep learning models for automated anemia screening based on retinal fundus images. These studies showed promising results, identifying the optic disc region as a key area of interest for anemia prediction, although the specific features to look for were not clearly defined. Zhao et al.15 developed a model using ultra-widefield (UWF) fundus imaging, but its reliance on costly UWF devices limits scalability in resource-limited settings. Moreover, the Zhao study reported proportional bias in hemoglobin predictions, indicating challenges in maintaining accuracy across varying populations. Wei et al.16 proposed a lightweight network using retinal vessel optical coherence tomography (OCT) images. However, the high cost of OCT limits its applicability in resource-constrained settings; ongoing efforts to build portable OCT and UWF devices may help broaden access to these imaging modalities. 
This study aims to develop and validate a deep learning algorithm for anemia detection and Hb estimation using conventional 45-degree retinal fundus images, as well as to identify features associated with anemia. The objective is to establish a reliable, scalable, and noninvasive method for anemia screening and prediction using retinal imaging and artificial intelligence. By utilizing standard retinal imaging, this approach allows seamless integration of anemia screening with DR screening, maximizing the utility of a single examination. This technique aims to offer a comprehensive and sensitive solution suitable for large-scale population screening and can be effectively applied in rural or resource-limited environments. 
Materials and Methods
Study Participants
The dataset comprised 2265 patients with diabetes aged 40 years and older, recruited from a population-based cross-sectional study conducted in South India (SNDREAMS).17 The detailed methodology has been described elsewhere. In summary, written consent was obtained from each participant for sample collection, and they underwent various health measurements and completed questionnaires. Collected data included age, race/ethnicity, sex, current smoking status, and medical and ocular history, as reported via questionnaires to social workers. General physical examinations recorded height, weight, waist and hip circumference, blood pressure, and heart rate. Comprehensive eye examinations included vision assessment, objective and subjective refraction, corneal examination, slit lamp evaluation of the anterior segment, Goldmann applanation tonometry for intraocular pressure measurements, and lens opacity assessments. For posterior eye segment evaluation, dilated 45-degree 4-field retinal fundus images were taken using a VISUCAMlite (Carl Zeiss, Jena, Germany) by 2 experienced technicians. Laboratory analyses included complete blood count and Hb concentration levels, analyzed using the Merck Micro Lab 120 semi-automated analyzer. Hb levels were measured with a colorimetric hemoglobinometer, packed cell volume with the capillary method, and glycosylated Hb fraction using the Bio-Rad DiaSTAT HbA1c Reagent Kit.17 Anemia was defined per World Health Organization guidelines as Hb levels below 12 g/dL for women and 13 g/dL for men.18 Inclusion criteria required blood tests to be conducted within 2 weeks of fundus imaging, with no transfusion therapy or blood donation between imaging and blood measurement. Fundus images that were either blurred or with artifacts due to vitreous hemorrhage or severe cataracts were excluded. For external validation, a total of 255 UWF OPTOS (Daytona; Optos PLC, Dunfermline, United Kingdom) images, along with corresponding clinical and hematological data, were collected from Sankara Nethralaya, India. Additionally, the images were cropped to extract the central area from the original UWF fundus images, generating a field of view similar to that used in our developed model. The study received approval from the Institutional Review Board at Vision Research Foundation, Sankara Nethralaya, Chennai, and the Human Research Ethics Advisory Panel of the University of New South Wales, ensuring adherence to the Declaration of Helsinki principles. 
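As a concrete illustration, the labeling rule above can be written as a short function (a sketch; the function name and sex encoding are ours, not from the study's code):

```python
def who_anemia(hb_g_dl: float, sex: str) -> bool:
    """WHO definition used in this study: Hb < 12 g/dL for women, < 13 g/dL for men."""
    threshold = 12.0 if sex.lower() == "female" else 13.0
    return hb_g_dl < threshold

# Example: an Hb of 12.5 g/dL is labeled anemic for a man but not for a woman.
assert who_anemia(12.5, "male") and not who_anemia(12.5, "female")
```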
Model Development
Fundus images meeting the inclusion and exclusion criteria were utilized, with 80% (3618 images) allocated to the development dataset and the remaining 20% (899 images) reserved exclusively for the validation dataset. To prevent data leakage, the datasets were separated at the patient level, ensuring that no images from the same patient appeared in both the development and validation sets. The images were preprocessed for algorithm training, including resizing to 284 × 284 pixels using bicubic interpolation and applying standard data augmentation (e.g. horizontal and vertical flipping, rotation up to 30 degrees) to enlarge the training set and reduce overfitting. Pixel values were normalized from their original 8-bit range of 0 to 255 to the range 0 to 1 by dividing each value by 255; this ensured consistency across all images and improved the neural network's learning efficiency by stabilizing the training process. The network weights were optimized using a distributed stochastic gradient descent implementation.19 Images where the optic disc could not be detected were excluded from both datasets. 
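A minimal sketch of this preprocessing and augmentation pipeline, assuming a tf.keras workflow (names are illustrative, not the study's code):

```python
import tensorflow as tf

IMG_SIZE = 284  # target resolution stated above

def preprocess(image: tf.Tensor) -> tf.Tensor:
    # Resize with bicubic interpolation, then scale 8-bit intensities to [0, 1].
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE], method="bicubic")
    return tf.cast(image, tf.float32) / 255.0

# On-the-fly augmentation: horizontal/vertical flips and rotation up to 30 degrees.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(30.0 / 360.0),  # factor is a fraction of a full turn
])
```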
The model was designed to perform two tasks: (1) binary classification: predicting the presence or absence of anemia using fundus images and demographic information, and (2) regression: estimating Hb levels from fundus images. 
A deep convolutional neural network utilizing VGG16,20 ResNet50,21 and Inception-v322 architectures was developed and trained in TensorFlow version r2.11.23 The comparative characteristics of these architectures are summarized in Supplementary Table S1. Testing these architectures helps determine which provides the best performance by using their unique strengths in feature extraction and representation. Fundus images were converted into feature maps by each architecture and then transformed into feature vectors through pooling layers. 
For the first task (classification), both gender and age were incorporated as input features into the convolutional neural network (CNN), facilitating the development of a multimodal model. Gender was represented as a 2-class 1-hot encoded vector, whereas age, a continuous variable, was encoded using a 7-bit binary representation, covering the range from 0 to 120 years. These demographic features were concatenated with the feature maps extracted from the fundus images after the convolutional layers, prior to entering the fully connected layers. This integration allowed the model to utilize both image-based features and patient metadata to improve predictive performance. For the second task (regression), the model was designed to predict continuous Hb values, which ranged between 5 g/dL and 20 g/dL. These Hb values served as the target variable, not as an input feature. To optimize the model’s training, the target Hb values were standardized as a preprocessing step. Standardization transforms the target values to have a mean of zero and a standard deviation of one, ensuring that all input features and the target variable are on the same scale, thus improving the model's convergence during training. 
The standardization formula used is:  
\[ X_{\mathrm{standardized}} = \frac{X - \mu}{\sigma} \]
where: X is the original Hb value, μ is the mean of the Hb values in the training set, and σ is the standard deviation of those Hb values. 
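In code, the standardization and its inverse (to map predictions back to g/dL) reduce to a few lines; hb_train and model_output are placeholder names:

```python
import numpy as np

# hb_train: measured Hb values (g/dL) in the training set (placeholder array).
mu, sigma = hb_train.mean(), hb_train.std()   # statistics from the training set only
hb_targets = (hb_train - mu) / sigma          # standardized labels used for training

# At inference, predictions are transformed back to the clinical scale (g/dL):
hb_pred_g_dl = model_output * sigma + mu
```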
Following standardization, the Hb values were used as the output labels during the model training process. The network architecture consisted of fully connected (dense) layers, with 20 nodes in the first hidden layer. The final output layer was designed to predict Hb concentrations within the target range of 5 to 20 g/dL. A linear activation function was applied in the output layer to ensure that the predicted Hb values remained continuous, without any constraints on the range, while maintaining their real-world applicability. 
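For the classification task, the demographic encoding described earlier can be sketched as follows; the bit ordering and the one-hot convention are our assumptions, since the text does not specify them:

```python
import numpy as np

def encode_demographics(age: int, gender: str) -> np.ndarray:
    """2-class one-hot gender plus a 7-bit binary age covering 0 to 120 years."""
    gender_vec = np.array([1.0, 0.0]) if gender == "male" else np.array([0.0, 1.0])
    # 7 bits represent integers 0..127, which spans the stated 0-120 range.
    age_bits = np.array([(age >> i) & 1 for i in range(6, -1, -1)], dtype=np.float64)
    return np.concatenate([gender_vec, age_bits])  # 9-dimensional metadata vector

# This vector is concatenated with the pooled image feature vector
# before the fully connected layers, as described above.
print(encode_demographics(52, "female"))  # [0. 1. 0. 1. 1. 0. 1. 0. 0.]
```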
The model was trained using a distributed stochastic gradient descent optimizer with a batch size of 32 for 100 epochs, an initial learning rate of 0.001 (reduced on plateau by a factor of 0.1), and a weight decay of 0.0001. Data augmentation was performed on the fly during training to mitigate overfitting, and automated early stopping was applied based on validation loss.24 An ensemble of 10 networks was trained on the development set, with the final Hb prediction obtained by averaging predictions across all ensemble networks, considering both eyes for each participant. The architecture of the developed model is depicted in Figure 1. 
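A hedged sketch of this training configuration in tf.keras; model, train_ds, val_ds, images, and ensemble_models are placeholders, and the loss functions, callback patience, and use of optimizer-level weight decay are assumptions:

```python
import numpy as np
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, weight_decay=0.0001),
    loss="mse",  # regression head; a classification head would use binary cross-entropy
)

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", restore_best_weights=True),
]
# The batch size of 32 is set when train_ds is built; augmentation runs on the fly.
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)

# Ensemble of 10 networks: the final Hb estimate averages the predictions of all
# members (and, per participant, of both eyes).
hb_pred = np.mean([m.predict(images) for m in ensemble_models], axis=0)
```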
Figure 1.
 
Network architecture for anemia and Hb concentration prediction. The process starts with the input of a retinal image, followed by preprocessing. Three separate convolutional neural networks (VGG16, ResNet50, and Inception V3) extract important visual features from the image through convolutional and max-pooling layers, generating feature maps (represented as small blue boxes in a 3 × 3 grid). These blue boxes correspond to the output of the convolutional layers, where key patterns from the retinal image are detected and compressed into a more compact representation. These features are then passed through dense layers, which integrate the image data with demographic information, such as age and gender, at specific concatenation points (marked “C”). The network produces two outputs: a continuous hemoglobin value and a binary classification for anemia detection. The color-coded arrows illustrate the data flow: red arrows for main pathways, green arrows for skip connections, and orange arrows for output paths.
Performance Evaluation
To assess the model’s performance in binary classification, the area under the curve (AUC), sensitivity, specificity, and accuracy were calculated to evaluate its ability to distinguish between anemic and non-anemic participants. For continuous Hb level estimation, the mean absolute error (MAE) was calculated along with the 95% limits of agreement between clinically measured and predicted Hb levels. Bland-Altman plots were generated to visualize the bias and correlation between the clinically measured and predicted Hb levels. 
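These metrics can be computed as below (a sketch using scikit-learn; y_true, y_score, hb_true, and hb_pred are placeholder arrays):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, mean_absolute_error, roc_auc_score

# Binary classification: sensitivity, specificity, accuracy, and AUC.
tn, fp, fn, tp = confusion_matrix(y_true, y_score >= 0.5).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
auc = roc_auc_score(y_true, y_score)

# Regression: MAE and the 95% limits of agreement for the Bland-Altman plot.
mae = mean_absolute_error(hb_true, hb_pred)
diff = hb_pred - hb_true
limits = (diff.mean() - 1.96 * diff.std(), diff.mean() + 1.96 * diff.std())
```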
Model Explanation
To emphasize the regions in fundus images that were most influential in predicting anemia or Hb concentration, saliency maps were generated using GradCAM visual explanation tools.25 GradCAM was applied to the final convolutional layer to create heatmaps matching the resolution of the model's output. Areas that have a greater impact on predictions appear redder in these heatmaps. This method highlights the network's contributions to specific regions in the image, with colored heatmaps overlaid on the original images to visually indicate the important areas. 
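A minimal Grad-CAM sketch for a Keras classifier, assuming a single-output anemia head; the layer name and variable names are illustrative:

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name):
    """Grad-CAM heatmap computed from the final convolutional layer."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])  # add a batch dimension
        score = preds[:, 0]                             # anemia output unit (assumed)
    grads = tape.gradient(score, conv_out)              # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))        # global-average-pooled gradients
    cam = tf.einsum("bhwc,bc->bhw", conv_out, weights)  # weighted sum of feature maps
    cam = tf.nn.relu(cam)[0]                            # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalize to [0, 1] for overlay
```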
Identifying the Fundus Features Associated With Anemia
Spatial parameters around the optic disc, including vessel thickness, vessel tortuosity, and vessel density, were calculated from processed fundus images. Two zones, zone A and zone B, were used to compute these parameters. Zone A represents the area between the first (1r) and second (2r) radii from the optic disc center, whereas zone B spans the region between the second (2r) and third (3r) radii. Vessel parameters were calculated using eight major vessels (4 arteries and 4 veins), as shown in Figure 2. The retinal vessel density was quantified as the percentage of retinal area occupied by vessels in the specified zones. Vessel tortuosity was calculated as the ratio of the actual vessel length to the straight-line distance between its end points. Vessel thickness was measured in millimeters (mm) using high-resolution imaging. 
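A sketch of the zone construction and density measurement on a binarized vessel image; center, r (one optic-disc radius), and the binary mask are assumed inputs:

```python
import numpy as np

def zone_mask(shape, center, r, inner, outer):
    """Annulus between inner*r and outer*r from the optic disc center
    (zone A: inner=1, outer=2; zone B: inner=2, outer=3)."""
    yy, xx = np.indices(shape)
    dist = np.hypot(yy - center[0], xx - center[1])
    return (dist >= inner * r) & (dist < outer * r)

def vessel_density(binary_vessels, mask):
    """Fraction of zone pixels occupied by vessels."""
    return binary_vessels[mask].mean()
```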
Figure 2.
 
Illustration of the two zones (zone A and zone B) used for computing vessel parameters. Each zone is defined as a circular area centered around the optic disc. Vessels are classified into arteries, shown in blue, and veins, shown in green.
The original images were first converted to 8-bit grayscale using Fiji (a free software available at https://fiji.sc).26 Contrast-limited adaptive histogram equalization (CLAHE) was applied to improve the visibility of smaller features and enhance contrast in the image.27 In the CLAHE method, the image was divided into small contextual regions (tiles), and histogram equalization was performed locally within each tile. This approach enabled localized contrast enhancement, improving the visibility of fine vessel structures while preserving the overall image integrity. A clip limit of 0.01 was used to control the amplification of the histogram's slope within each tile, preventing over-enhancement and minimizing noise while still improving contrast in areas with low visibility. These parameters were selected to optimize vessel segmentation and enhance the clarity of retinal features, facilitating more accurate analysis of vessel thickness, density, and tortuosity. Vessel binarization was then performed using the “Trainable Weka Segmentation”28 and LoG3D29 plugins, which enabled pixel-level segmentation of the vessels. A representative fundus image was manually annotated to train the classifier, distinguishing vessels from the background. Once the classifier was trained, it was applied to the entire image for segmentation (Fig. 3). To verify the accuracy of the segmentation, the results were compared to a manually annotated ground truth. A set of representative images was manually segmented by a single trained grader to serve as the reference standard. The accuracy of the software’s segmentation was then evaluated by comparing the automated results to this manually annotated ground truth. Accuracy was measured using standard image segmentation metrics, such as the Dice Similarity Coefficient (DSC),30 to assess how closely the automated segmentation aligned with the manually annotated vessels. The DSC provides a quantitative measure of overlap, with values ranging from 0 to 1, where 1 indicates a perfect match between the automated segmentation and the ground truth. This metric accounts for both false positives and false negatives, ensuring a balanced assessment of the software’s performance. By using the DSC, the study ensures a rigorous evaluation of the segmentation algorithm, highlighting its reliability and potential applicability in clinical practice. 
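The Fiji workflow above can be approximated in Python for illustration; scikit-image's equalize_adapthist implements CLAHE, and the Dice coefficient is a two-line computation (array names are ours):

```python
import numpy as np
from skimage import exposure

# gray_image: 2-D grayscale array (placeholder). CLAHE with the clip limit
# stated above; tiling is handled internally by the implementation.
enhanced = exposure.equalize_adapthist(gray_image, clip_limit=0.01)

def dice(auto_seg: np.ndarray, ground_truth: np.ndarray) -> float:
    """Dice similarity coefficient between binary masks (1 = perfect overlap)."""
    intersection = np.logical_and(auto_seg, ground_truth).sum()
    return 2.0 * intersection / (auto_seg.sum() + ground_truth.sum())
```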
Figure 3.
 
Image processing steps for vessel segmentation and analysis. (A) Original red, green, blue (RGB) fundus image. (B) Conversion of the image to 8-bit grayscale. (C) Enhancement of local contrast using contrast-limited adaptive histogram equalization (CLAHE). (D) Vessel binarization using the “Trainable Weka Segmentation” plugin in Fiji, where the software was trained to distinguish vessels from the background by manually drawing representative lines inside (red) and outside (green) the selected vessels. (E) Binary image showing vessels as white pixels on a black background. (F) The binarized vessel image processed with the “Skeletonize” plugin to convert vessels into thin tracks. (G) Identification of vessel branches (green) and branch nodes (orange), with measurements of the actual branch lengths and straight-line distances between the nodes. (H) Calculation of local vessel thickness using the “Geometry to Distance Map” plugin, along with vessel density and tortuosity parameters.
Directional filtering was applied using the MorphoLibJ31 plugin, which reduced the line length of segmented vessels to eight pixels. This step ensured accurate representation of vessel structures. Once validated, the classifier was used across all images for consistent vessel segmentation. 
From the binarized images, vessel density was calculated as the proportion of pixels occupied by vessels relative to the total area, quantified using the “Measure” function in Fiji. The “Skeletonize” plugin was then used to convert binary images into skeletonized images, reducing the vessels to a thin track with a 1-pixel diameter. Using the “Analyse Skeleton”32 plugin, the actual length of each branch and the straight length between branch nodes (connection points) were calculated. Vessel tortuosity was determined as the ratio of the sum of actual branch lengths to the sum of straight lengths between branch nodes. Vessel density was rechecked using the “Vessel analysis”33 plugin in Fiji. The Mexican hat filter34 plugin was used for edge detection and feature enhancement, and the “Geometry to Distance Map” plugin was used to measure local vessel thickness (see Fig. 3). An independent-samples t-test was conducted to compare vessel parameters between the anemic and non-anemic groups. 
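The tortuosity definition reduces to the following computation over skeleton branches; each branch is represented here as a polyline of pixel coordinates (an illustrative structure, not the Fiji plugin's API):

```python
import numpy as np

def branch_lengths(branch_xy: np.ndarray) -> tuple[float, float]:
    """Return (actual arc length, straight-line distance between end nodes)."""
    steps = np.diff(branch_xy, axis=0)
    arc = np.sqrt((steps ** 2).sum(axis=1)).sum()
    chord = np.linalg.norm(branch_xy[-1] - branch_xy[0])
    return arc, chord

def tortuosity(branches: list[np.ndarray]) -> float:
    """Sum of actual branch lengths over sum of straight node-to-node lengths."""
    arcs, chords = zip(*(branch_lengths(b) for b in branches))
    return sum(arcs) / sum(chords)
```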
Statistical Analysis
Statistical analysis was performed using SPSS version 21.0 (IBM Corp., Armonk, NY, USA). A nonparametric Mann-Whitney U test was conducted to compare the median age between the development and validation datasets. A Chi-Square test was used to assess whether there was a significant association between gender and dataset group. Additionally, a post hoc power analysis was performed to confirm that the study had adequate power to detect significant differences and associations. 
Results
A total of 4517 fundus images from 2265 participants were included. In the development set, the median age of participants was 52.13 years (interquartile range = 49.47 to 60.00 years), whereas in the validation set, the median age was 52.47 years (interquartile range = 50.00 to 60.30 years). Statistical analysis showed no significant difference in age between the development and validation sets (P = 0.35). The female group had lower Hb concentrations than the male group in both datasets. Overall, anemia was present in 34.50% of the participants in the development set and 33% in the validation set. A higher prevalence of anemia was observed in female patients, with 23.12% in the development set and 20.3% in the validation set, compared with male patients, with 11.37% in the development set and 12.6% in the validation set (Table 1). The Chi-Square test indicated no significant difference in the prevalence of anemia between the two groups (χ² = 0.0226, P = 0.42), suggesting that anemia rates were similar across the development and validation datasets. The post hoc power analysis showed 96% power, indicating that the study was sufficiently powered to detect significant effects. 
Table 1.
 
Basic Characteristics of the Development Datasets and the Validation Dataset
For deep learning-based classification of anemia, each architecture was trained to perform binary classification tasks to differentiate between the presence and absence of anemia. The InceptionV3 model achieved 98% accuracy, 99% sensitivity, and 97% specificity. Both the VGG16 and ResNet50 models showed 97% accuracy, 99% sensitivity, and 95% specificity for anemia prediction (Table 2). Figure 4 illustrates the receiver operating characteristic (ROC) curves for anemia prediction using the InceptionV3, ResNet50, and VGG16 architectures. The model trained on fundus images achieved an AUC of 0.98 (95% CI = 0.97–0.99) for InceptionV3, 0.97 (95% CI = 0.96–0.99) for ResNet50, and 0.96 (95% CI = 0.95–0.99) for VGG16. 
Table 2.
 
Performance Metrics for Anemia Prediction
Figure 4.
 
ROC curves for anemia prediction.
Figures 5A, 5B, and 5C display scatter diagrams, whereas Figures 5D, 5E, and 5F show Bland–Altman plots for the predicted versus measured Hb concentrations. The linear fit slopes for Hb estimation were –0.20 (95% CI = –0.19 to –0.22) for the InceptionV3 architecture, –0.24 (95% CI = –0.23 to –0.25) for the ResNet50 architecture, and –0.26 (95% CI = –0.24 to –0.27) for the VGG16 architecture. The negative slopes observed in the Bland-Altman plots indicate a proportional bias in the predictions of all models, where Hb concentrations are overestimated at lower levels and underestimated at higher levels. However, this bias does not suggest a decline in model accuracy with increasing Hb concentrations, as errors are present across both the lower and upper ranges of Hb values. The relatively low correlation coefficients further emphasize the weak relationship between predicted and actual Hb concentrations, indicating that the models’ predictive accuracy is limited, particularly at the extremes of the Hb distribution. The MAE for the Hb estimation task was 0.58 g/dL (95% CI = 0.57–0.59 g/dL) for the InceptionV3 model, 0.60 g/dL (95% CI = 0.59–0.62 g/dL) for the ResNet50 model, and 0.62 g/dL (95% CI = 0.60–0.64 g/dL) for the VGG16 model. Furthermore, the degree of bias varied among the different architectures, with the InceptionV3 model showing the least amount of bias, whereas the VGG16 model exhibited a steeper slope, suggesting a greater tendency to underestimate Hb concentrations at higher levels. 

For external validation, the model achieved an accuracy of 0.83 ± 0.022, sensitivity of 0.84 ± 0.018, specificity of 0.83 ± 0.012, and precision of 0.85 ± 0.018 for anemia prediction. Additionally, when the widefield images were cropped to a central 45-degree region, the model demonstrated an accuracy of 0.86 ± 0.020, sensitivity of 0.85 ± 0.002, specificity of 0.84 ± 0.011, and precision of 0.85 ± 0.010 for anemia prediction. For hemoglobin estimation, the InceptionV3 model achieved an MAE of 0.63 g/dL (95% CI = 0.61–0.69 g/dL) for widefield images, and an MAE of 0.60 g/dL (95% CI = 0.58–0.64 g/dL) for cropped images. 
Figure 5.
 
Model performance for estimating Hb concentration. Panels (A, B, C) show scatter diagrams where each circle represents the predicted versus the measured Hb value, with the black dashed line indicating the ideal model. Panels (D, E, F) present Bland–Altman plots illustrating the difference between predicted and measured Hb values against the measured values. In these plots, each dot represents the difference between predicted and measured values. The black line indicates the mean difference, the black dashed lines represent the 95% limits of agreement, and the red line shows the line of fit.
Figure 6 illustrates the GradCAM results comparing anemic (A) and non-anemic (B) images. The saliency map shows that the model predominantly attends to the region surrounding the optic disc when making predictions related to anemia. Red and yellow areas indicate regions with the greatest impact on the model's predictions, with red areas reflecting the strongest influence, whereas blue regions contribute minimally. To assess whether the blood vessels around the optic disc influence anemia prediction, a separate quantification of vessel characteristics was conducted. Because GradCAM does not provide direct information on vessel thickness or tortuosity, these features were evaluated independently through image processing techniques. 
Figure 6.
 
GradCAM visualization comparing anemic (A) and non-anemic images (B).
The quantification of vessel parameters surrounding the optic disc indicates that anemic subjects have significantly higher vessel tortuosity and reduced vessel density compared to non-anemic subjects in certain retinal zones. Specifically, anemic subjects exhibited significantly higher vessel tortuosity compared to non-anemic subjects in both zone A (1.191 ± 0.011 vs. 1.182 ± 0.018, P < 0.0001) and zone B (1.184 ± 0.012 vs. 1.176 ± 0.011, P < 0.0001). Vessel density was not significantly different between anemic and non-anemic subjects in zone A (0.520 ± 0.038 vs. 0.523 ± 0.042, P = 0.4618) but was significantly lower in anemic subjects in zone B (0.485 ± 0.010 vs. 0.491 ± 0.012, P < 0.0001). Vessel thickness was significantly greater in non-anemic subjects in zone A (0.473 ± 0.015 mm vs. 0.465 ± 0.011 mm, P < 0.0001), whereas no significant difference in vessel thickness was observed in zone B (0.473 ± 0.010 mm vs. 0.475 ± 0.013 mm, P = 0.0990). These results suggest that anemia is associated with increased vessel tortuosity and reduced vessel density in certain retinal zones, with mixed effects on vessel thickness (Table 3). 
Table 3.
 
Quantification of Vessel Parameters for Anemic and Non-Anemic Subjects
Discussion
This study developed a deep-learning model for predicting anemia and estimating Hb concentrations from retinal fundus images, leveraging three artificial intelligence (AI) architectures, with InceptionV3 demonstrating the highest performance metrics, including an AUC of 0.98 (95% CI = 0.97–0.99). The InceptionV3 model also achieved an MAE of 0.58 g/dL (95% CI = 0.57–0.59 g/dL) for Hb estimation, with a Bland-Altman plot showing strong agreement between predicted and measured Hb concentrations. The small bias and narrow limits of agreement suggest that the model performs well overall, although the observed trend at higher Hb concentrations indicates a need for further calibration to ensure accuracy across all ranges. 
Studies by Mitani et al.13 and Zhao et al.15 used deep learning techniques, with Zhao et al.15 utilizing an architecture similar to the InceptionV3 model (InceptionResNetV2), achieving AUCs of 0.90 and 0.93, respectively, for anemia detection in a general population. The model presented in this study achieved an AUC of 0.98; however, variations in performance may be influenced by factors such as dataset size, image quality, the inclusion of a diabetic cohort, and methodologies. Furthermore, without directly testing the architectures used in these prior studies, it is difficult to attribute performance differences solely to the choice of model architecture. 
For the external validation, although there was a slight reduction in performance, it is important to note that UWF images were used, rather than the conventional camera images used in the original model. To better match the field of view of the developed model, the central 45-degree region of the UWF images was cropped, leading to an improvement in performance. However, the sample size for this external dataset is relatively small, which may limit the generalizability and robustness of the results. 
In terms of saliency mapping, our results show that the model consistently focuses on spatial features around the optic disc, similar to the findings of Mitani et al.13 and Wei et al.16 Although the saliency maps indicate the regions of the image most influential to the model's decision making, they do not directly identify the specific biological features being utilized. The optic disc and its surrounding region include two primary anatomic components: the vascular and neuronal structures.35 Because hemoglobin is a hematological parameter, changes are expected to predominantly affect the vascular component rather than the neuronal one. To explore this further, vessel characteristics, such as tortuosity and density, were quantified, revealing notable differences in subjects with anemia, including increased vessel tortuosity in both zone A and zone B. These findings align with the biological understanding that iron deficiency anemia can impact retinal microvasculature, leading to changes such as increased vessel curvature and microvascular occlusions.36 
The analysis of vessel parameters also revealed a lower vessel density in subjects with anemia in zone B, which aligns with findings from other studies suggesting that capillary loss or reduced choroidal blood flow can occur in anemia.37 The choroid plays a vital role in supplying nutrients to the retina, and its compromised blood flow may contribute to diminished retinal health in anemic individuals.38 Although changes in vessel tortuosity and density were observed, differences in vessel thickness were minimal and may not be easily detectable using conventional ophthalmoscopy or fundus photography. The integration of AI-assisted technology in clinical diagnostic devices could significantly enhance the precision and accuracy of these retinal assessments, making it easier to identify subtle variations in vessel characteristics. 
The strengths of our study include the high performance of the deep-learning model and its potential to enhance anemia screening, particularly in settings with limited medical resources. This model could provide a valuable noninvasive screening tool for large populations and rural areas, minimizing the need for invasive blood tests. Furthermore, identifying retinal features associated with anemia can assist clinicians in detecting the condition during routine eye examinations. 
However, the study has limitations. The relatively small dataset, despite a power calculation indicating sufficient sample size, limits the generalizability of the findings. The lack of detailed analysis of other blood parameters limits the scope of the model, suggesting a need for future work to develop comprehensive classification models for anemia. The use of a traditional fundus camera, while cost-effective, limits the ability to capture peripheral retinal features associated with anemia. Although external validation was conducted using UWF images, a larger sample size in future studies is needed to provide more comprehensive insights. However, the high cost of UWF cameras presents a challenge, particularly in resource-limited settings. Another limitation is that the effect of including demographic variables, such as age and gender, on the model's predictive performance was not directly assessed. Although these factors were incorporated into the model, a comparison between models with and without these variables was not performed. Additionally, the model’s performance across different ethnic groups was not evaluated, which could impact its generalizability. Future research should include diverse ethnic groups to ensure the model’s applicability and effectiveness across various populations. 
Conclusions
Anemia is a global public health challenge, affecting over two billion individuals, with a particularly high burden in low-resource settings. Our AI-based model, which predicts anemia and estimates Hb levels from retinal fundus images, offers a noninvasive, cost-effective alternative that can be seamlessly integrated with DR screening programs. Because DR screening already involves fundus photography in many primary healthcare settings, incorporating anemia detection into these programs would optimize existing resources, enabling simultaneous screening for two major health conditions. This dual-purpose approach enhances healthcare delivery by improving access to anemia screening without requiring additional infrastructure, thereby promoting preventive care in resource-limited environments. Such integration not only minimizes costs but also expands the reach of essential diagnostic services to underserved populations, fostering better health outcomes through early intervention. 
Acknowledgments
Disclosure: R. Khan, None; V. Maseedupally, None; K.A. Thakoor, None; R. Raman, None; M. Roy, None 
References
McLean E, Cogswell M, Egli I, Wojdyla D, de Benoist B. Worldwide prevalence of anaemia, WHO Vitamin and Mineral Nutrition Information System, 1993–2005. Public Health Nutr. 2009; 12: 444–454.
Stevens GA, Paciorek CJ, Flores-Urrutia MC, et al. National, regional, and global estimates of anaemia by severity in women and children for 2000–19: a pooled analysis of population-representative data. Lancet Glob Health. 2022; 10(5): e627–e639. [CrossRef] [PubMed]
Newhall DA, Oliver R, Lugthart S. Anaemia: a disease or symptom. Neth J Med. 2020; 78(3): 104–110. [PubMed]
Bentley ME, Griffiths PL. The burden of anemia among women in India. Eur J Clin Nutr. 2003; 57(1): 52–60. [CrossRef] [PubMed]
An R, Huang Y, Man Y, et al. Emerging point-of-care technologies for anemia detection. Lab Chip. 2021; 21(10): 1843–1865. [CrossRef] [PubMed]
Delaforce A, Duff J, Munday J, Hardy J. Preoperative anemia and iron deficiency screening, evaluation, and management: barrier identification and implementation strategy mapping. J Multidiscip Healthc. 2020; 13: 1759–1770. [CrossRef] [PubMed]
Dimauro G, Caivano D, Girardi F. A new method and a non-invasive device to estimate anemia based on digital images of the conjunctiva. IEEE Access. 2018; 6: 46968–46975. [CrossRef]
Mannino RG, Myers DR, Tyburski EA, et al. Smartphone app for non-invasive detection of anemia using only patient-sourced photos. Nat Commun. 2018; 9(1): 4924. [CrossRef] [PubMed]
Dimauro G, De Ruvo S, Di Terlizzi F, et al. Estimate of anemia with new non-invasive systems—a moment of reflection. Electronics. 2020; 9(5): 780. [CrossRef]
Mahmud S, Donmez TB, Mansour M, Kutlu M, Freeman C. Anemia detection through non-invasive analysis of lip mucosa images. Front Big Data. 2023; 6: 1241899. [CrossRef] [PubMed]
Kwon JM, Cho Y, Jeon KH, et al. A deep learning algorithm to detect anaemia with ECGs: a retrospective, multicentre study. Lancet Digit Health. 2020; 2(7): e358–e367. [CrossRef] [PubMed]
Aisen ML, Bacon BR, Goodman AM, Chester EM. Retinal abnormalities associated with anemia. Arch Ophthalmol. 1983; 101(7): 1049–1052. [CrossRef] [PubMed]
Mitani A, Huang A, Venugopalan S, et al. Detection of anaemia from retinal fundus images via deep learning. Nat Biomed Eng. 2020; 4(1): 18–27. [CrossRef] [PubMed]
Tham YC, Cheng CY, Wong TY. Detection of anaemia from retinal images. Nat Biomed Eng. 2020; 4(1): 2–3. [CrossRef] [PubMed]
Zhao X, Meng L, Su H, et al. Deep-learning-based hemoglobin concentration prediction and anemia screening using ultra-wide field fundus images. Front Cell Dev Biol. 2022; 10: 888268. [CrossRef] [PubMed]
Wei H, Shen H, Li J, Zhao R, Chen Z. AneNet: a lightweight network for the real-time anemia screening from retinal vessel optical coherence tomography images. Optics Laser Technol. 2021; 136: 106773. [CrossRef]
Agarwal S, Raman R, Paul PG, et al. Sankara Nethralaya—Diabetic retinopathy epidemiology and molecular genetic study (SN—DREAMS 1): Study design and research methodology. Ophthalmic Epidemiol. 2005; 12(2): 143–153. [CrossRef] [PubMed]
Addo OY, Yu EX, Williams AM, et al. Evaluation of hemoglobin cutoff levels to define anemia among healthy individuals. JAMA Netw Open. 2021; 4(8): e2119123. [CrossRef] [PubMed]
Ketkar N. Stochastic gradient descent. In: Deep Learning with Python: A Hands-On Introduction. Berkeley, CA: Apress; 2017: 113–132. Available at: https://www.oreilly.com/library/view/deep-learning-with/9781484227664/A416804_1_En_8_Chapter.html.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014. Available at: https://arxiv.org/abs/1409.1556.
Koonce B. ResNet 50. In: Koonce B, ed. Convolutional Neural Networks with Swift for TensorFlow: Image Recognition and Dataset Categorization. Berkeley, CA: Apress; 2021: 63–72. Available at: https://link.springer.com/book/10.1007/978-1-4842-6168-2.
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818–2826.
Abadi M, Barham P, Chen J, et al. TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 2016: 265–283.
Ying X. An overview of overfitting and its solutions. J Phys Conf Ser. 2019; 1168: 022022. Available at: https://iopscience.iop.org/article/10.1088/1742-6596/1168/2/022022.
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. 2017: 618–626. Available at: https://ieeexplore.ieee.org/document/8237336.
Goldstein JI, Newbury DE, Michael JR, et al. Scanning electron microscopy and X-ray microanalysis. Traverse City, MI: Horizon Books; 2018: 187–193.
Pizer SM, Amburn EP, Austin JD, et al. Adaptive histogram equalization and its variations. Comp Vision, Graphics, Image Proc. 1987; 39(3): 355–368. [CrossRef]
Schindelin J, Arganda-Carreras I, Frise E, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012; 9(7): 676–682. [CrossRef] [PubMed]
Sage D, Neumann FR, Hediger F, Gasser SM, Unser M. Automatic tracking of individual fluorescence particles: application to the study of chromosome dynamics. IEEE Trans Image Proc. 2005; 14(9): 1372–1383. [CrossRef]
Dice LR. Measures of the amount of ecologic association between species. Ecology. 1945; 26(3): 297–302. [CrossRef]
Legland D, Arganda-Carreras I, Andrey P. MorphoLibJ: integrated library and plugins for mathematical morphology with ImageJ. Bioinformatics. 2016; 32(22): 3532–3534. [CrossRef] [PubMed]
Arganda-Carreras I, Fernández-González R, Muñoz-Barrutia A, Ortiz-De-Solorzano C. 3D reconstruction of histological sections: application to mammary gland tissue. Microsc Res Tech. 2010; 73(11): 1019–1029. [CrossRef] [PubMed]
Elfarnawany MH. Signal Processing Methods for Quantitative Power Doppler Microvascular Angiography [dissertation]. London, Ontario, Canada: The University of Western Ontario; 2015.
Vijayashree R, Rao K. A semi-automated morphometric assessment of nuclei in pap smears using ImageJ. J Evolut Med Dental Sci. 2015; 4(53): 63–70.
Yu DY, Yu PK, Balaratnasingam C, et al. Microscopic structure of the retina and vasculature in the human eye. Microsc Sci Technol Applications Educ. 2010: 867–875.
Türkyilmaz K, Öner V, Özkasap S, Şekeryapan B, Dereci S, Durmuş M. Peripapillary retinal nerve fiber layer thickness in children with iron deficiency anemia. Eur J Ophthalmol. 2013; 23(2): 217–222. [CrossRef] [PubMed]
Korkmaz MF, Can ME, Kazancı EG. Effects of iron deficiency anemia on peripapillary and macular vessel density determined using optical coherence tomography angiography on children. Graefes Arch Clin Exp Ophthalmol. 2020; 258: 2059–2068. [CrossRef] [PubMed]
Shao Z, Dorfman AL, Seshadri S, et al. Choroidal involution is a key component of oxygen-induced retinopathy. Invest Ophthalmol Vis Sci. 2011; 52(9): 6238–6248. [CrossRef] [PubMed]