Open Access
Artificial Intelligence  |   July 2022
Automated Detection of Vascular Leakage in Fluorescein Angiography – A Proof of Concept
Author Affiliations & Notes
  • LeAnne H. Young
    National Eye Institute, Bethesda, MD, USA
    Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA
  • Jongwoo Kim
    National Library of Medicine, Bethesda, MD, USA
  • Mehmet Yakin
    National Eye Institute, Bethesda, MD, USA
  • Henry Lin
    National Eye Institute, Bethesda, MD, USA
  • David T. Dao
    National Eye Institute, Bethesda, MD, USA
  • Shilpa Kodati
    National Eye Institute, Bethesda, MD, USA
  • Sumit Sharma
    Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA
  • Aaron Y. Lee
    University of Washington, Seattle, WA, USA
  • Cecilia S. Lee
    University of Washington, Seattle, WA, USA
  • H. Nida Sen
    National Eye Institute, Bethesda, MD, USA
  • Correspondence: H. Nida Sen, Clinical and Translational Immunology Unit, and Uveitis and Ocular Immunology Service, National Eye Institute, National Institutes of Health, 10 Center Drive, Building 10, Room 10N248, Bethesda, MD 20892, USA. e-mail: senh@nei.nih.gov, nidasen@gmail.com 
Translational Vision Science & Technology July 2022, Vol.11, 19. doi:https://doi.org/10.1167/tvst.11.7.19
Abstract

Purpose: The purpose of this paper was to develop a deep learning algorithm to detect retinal vascular leakage (leakage) in fluorescein angiography (FA) of patients with uveitis and use the trained algorithm to determine clinically notable leakage changes.

Methods: An algorithm was trained and tested to detect leakage on a set of 200 FA images (61 patients) and evaluated on a separate 50-image test set (21 patients). The ground truth was leakage segmentation by two clinicians. The Dice Similarity Coefficient (DSC) was used to measure concordance.

Results: During training, the algorithm achieved a best average DSC of 0.572 (95% confidence interval [CI] = 0.548–0.596). The trained algorithm achieved a DSC of 0.563 (95% CI = 0.543–0.582) when tested on an additional set of 50 images. The trained algorithm was then used to detect leakage on pairs of FA images from longitudinal patient visits. In this longitudinal follow-up, a >2.21% change in the visible retinal area covered by leakage (as detected by the algorithm) had a sensitivity and specificity of 90% (area under the curve [AUC] = 0.95) for detecting a clinically notable change compared to the gold standard, an expert clinician's assessment.

Conclusions: This deep learning algorithm showed modest concordance in identifying vascular leakage compared to ground truth but was able to aid in identifying vascular FA leakage changes over time.

Translational Relevance: This is a proof-of-concept study that vascular leakage can be detected in a more standardized way and that tools can be developed to help clinicians more objectively compare vascular leakage between FAs.

Introduction
Uveitis is usually diagnosed clinically; however, the clinical appearance of uveitis does not always match the findings on fluorescein angiography (FA). In certain cases, FA is essential for the diagnosis and management of patients with uveitis because of its ability to display vascular leakage. Some patients may appear grossly quiescent on clinical examination but exhibit angiographic activity that may alter treatment decisions. Although FA is the gold standard for detecting vascular leakage, its interpretation is subject to significant variability between clinicians.1,2
Artificial intelligence (AI) is a powerful tool to find patterns in large datasets, and its use in ophthalmological research has ballooned in recent years. There are AI systems to detect papilledema,3 diabetic retinopathy,4 retinopathy of prematurity,5 and intraretinal fluid in optical coherence tomography (OCT),6 to predict glaucoma progression using Humphrey Visual Field testing,7 and to classify age-related macular degeneration severity in color fundus photographs.8 There are also algorithms developed to interpret FAs in diseases such as diabetic retinopathy,9–11 diabetic macular edema,1 and malarial retinopathy.12 However, no algorithms have specifically been developed to quantify vascular leakage in uveitis, although one system trained on diabetic retinopathy was used to detect vascular leakage on a single patient with retinal vasculitis.13 Segmenting vascular leakage in fluorescein angiograms of patients with uveitis is a difficult computer vision problem to solve, due to the considerable variability in anatomy, associated retinal lesions, vascular leakage patterns, and severity between patients. In addition, unlike color fundus photographs, Humphrey Visual Field testing, and OCT, the time-dependent component of vascular leakage – including the differential circulation of the dye in choroidal and retinal vasculature – poses a unique challenge.
In this paper, we describe a proof-of-concept deep learning algorithm, the first of its kind, trained to segment vascular leakage on the FAs of patients with uveitis. We quantify clinician variability in FA vascular leakage segmentation. Finally, we use this algorithm to aid in detecting a clinically notable change in leakage over time in FA images obtained from longitudinal patient visits.
Methods
Subjects and Ground Truth Image Selection
FA images were obtained from the Uveitis/Intraocular Inflammatory Disease Biobank clinical research protocol and used for algorithm training. Eligible biobank participants have regularly scheduled follow-up visits and undergo clinical phenotyping, extensive multimodal imaging (including wide-field color photographs, fundus autofluorescence imaging, OCT, and FA), full-field electroretinogram, perimetry, and immunophenotyping. Multiple images of both eyes were captured on all modalities at each study visit. The prospective and longitudinal nature of this biobank allows images and data to be captured from patients before, during, and after treatment for uveitis. We used 200 images from the biobank for algorithm training with 5-fold cross validation and an additional test set of 50 images to further evaluate the trained algorithm. 
FA images taken after the 1-minute timepoint, with a clear view to the retina, and without excessive retinal lesions were included. All images were reviewed by two uveitis specialists to confirm they were of sufficient quality for clinician interpretation to allow for vascular leakage detection. Images were all taken on the Optos 200Tx platform from March 2016 to February 2020. 
The study had prior approval from the National Institutes of Health Institutional Review Board, complied with the Health Insurance Portability and Accountability Act of 1996, and followed the tenets of the Declaration of Helsinki (clinicaltrials.gov identifier NCT02656381). 
Ground Truth Image Segmentation
Two teams of two clinician graders (authors D.D., M.Y., H.L., and L.Y.) segmented (annotated) FA images for vascular leakage using Adobe Photoshop version 21 (San Jose, CA). Graders were provided color fundus photographs and several images from earlier and later phases of the FA to aid in accurate leakage segmentation. 
The primary grader on each team performed segmentation by outlining areas of vascular leakage in Adobe Photoshop using the pen tool. The secondary grader reviewed and edited the first grader's work. This approach was taken to compensate for grader fatigue. Difficult images were adjudicated with senior clinicians. Before beginning segmentation, all graders met to discuss the definition of leakage and agreed on a segmentation protocol defined by the senior clinician (author H.N.S.). Graders defined leakage as increased hyperfluorescence above the general choroidal background fluorescence, following the pattern of retinal blood vessels, and increasing in size compared to earlier-frame FA images. 
Graders first defined the boundary of gradable retina (excluding eyelashes, peripheral artifacts, and peripheral scars). Leakage due to choroidal neovascularization and optic nerve leakage were not segmented as leakage. Scars and lesions inside the boundary of gradable retina were not segmented as leakage and marked as such. Finally, anatomic structures in the retina, such as the optic nerve and the macula, were demarcated. In areas of leakage bordering large vessels, efforts were made to exclude the large vessel from the segmentation of vascular leakage. However, in some cases of diffuse leakage where it was difficult to exclude very small vessels, portions of the vessels were included in the segmentation. 
Algorithm Development
We adapted a U-Net14 architecture as the deep learning model for leakage segmentation, as U-Net is a fully convolutional neural network whose architecture generally demonstrates superior performance in the biomedical image segmentation literature. Further details regarding the algorithm architecture are shown in Figure 1.
Figure 1.
 
Modified U-Net architecture for the deep learning algorithm. The algorithm's inputs are 224 × 224 × 3 FA images and the outputs are 224 × 224 × 1 grayscale images. The model has a contracting path and an expanding path. The contracting path (left side of figure) consists of convolutional layers and max pooling layers. The expanding path (right side of figure) consists of upsampling of the feature map and convolutional layers. Two dropout layers (with a 0.5 dropout ratio) are added in the contracting path to train the model more robustly and to resolve overtraining issues. The two dropout layers use only 50% of randomly selected weights during training so that the deep learning algorithm does not depend on specific features but instead depends on all features equally. The final convolutional layer maps each feature vector to the desired classes. The model assigns a class label to each pixel as an output. Conv, convolution; MaxPool, max pooling.
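The dropout behavior described in the caption can be illustrated with a minimal, framework-free sketch. It assumes the common "inverted dropout" formulation (the behavior documented for the Keras `Dropout` layer), in which each input unit is zeroed with probability equal to the rate during training and the surviving units are scaled by 1/(1 − rate). The function name `dropout` and the example values are illustrative and are not taken from the study's code.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout on a flat list of activations (illustrative sketch)."""
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Each unit is kept with probability `keep` and rescaled so that the
    # expected value of the output matches the input.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


features = [1.0] * 8
dropped = dropout(features, rate=0.5, rng=random.Random(0))
unchanged = dropout(features, rate=0.5, training=False)
```

With a rate of 0.5, every surviving activation is doubled, so the layer's expected output is the same during training and inference.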
Image Processing
Images undergo processing before they are used for algorithm training. First, the original FA images are augmented by rotating them 0, 10, 20, −10, and −20 degrees. Each image is also rotated across the X-axis. The 5 rotations multiplied by the 2 X-axis rotations allow the training image set to be increased 10-fold. The clinician segmentations of vascular leakage from Adobe Photoshop were converted into images (“leakage segmentation images”) and also undergo this augmentation. Second, the augmented images are cropped into smaller images of w × w pixels (“windows”) by shifting the window horizontally or vertically. This step further increases the number of input training images to improve segmentation accuracy of the algorithm. Third, the FA images are adjusted using a contrast limited adaptive histogram equalization (CLAHE) operator. As some FA images have a brighter background whereas others have a darker background, the CLAHE operator improves the contrast and edges in the images based on local information. Last, pairs of cropped images (a cropped FA image and its corresponding cropped leakage segmentation image) are resized to a 224 × 224 × 3 pixel size and inputted into the deep learning algorithm for training (Fig. 2). 
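The 10-fold augmentation described above can be sketched as an enumeration of (rotation, flip) pairs. This is a minimal illustration, not the study's code: it assumes that "rotated across the X-axis" means mirroring the image top-to-bottom, and it represents an image as a plain list of pixel rows. The names `augmentation_plan` and `flip_across_x_axis` are hypothetical.

```python
from itertools import product

ROTATION_DEGREES = (0, 10, 20, -10, -20)  # fixed angles listed in the text
FLIP_OPTIONS = (False, True)  # assumption: X-axis "rotation" is a vertical flip


def augmentation_plan():
    """Every (angle, flipped) combination applied to an image and its mask."""
    return list(product(ROTATION_DEGREES, FLIP_OPTIONS))


def flip_across_x_axis(image):
    """Mirror a row-major image top-to-bottom."""
    return image[::-1]


plan = augmentation_plan()
multiplier = len(plan)  # 5 rotations x 2 flip states

# The same transform must be applied to the FA image and to its leakage
# segmentation image so that the pair stays aligned.
fa_image = [[1, 2], [3, 4]]
leak_mask = [[0, 1], [0, 0]]
flipped_pair = (flip_across_x_axis(fa_image), flip_across_x_axis(leak_mask))
```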
Figure 2.
 
Workflow for training and testing the deep learning algorithm. In the training workflow (top of figure), each image undergoes processing (several degrees of rotation, rotation across the X-axis, cropping into windows [window size = w × w pixels and stride = w/2 pixels], and resizing to a 224 × 224 × 3 image, as described in the Methods section) before it becomes a series of input images. The resizing to 224 × 224 × 3 is necessary because the deep learning model architecture allocates a large amount of computer memory to training and testing compared to other machine learning models. The 224 × 224 × 3 size was the largest input size that we could process on our existing hardware. Finally, pairs of processed images (one cropped FA image and its corresponding cropped leakage segmentation image) are used as inputs to train the deep learning algorithm. The testing workflow (bottom of figure) consists of four steps. First, an input FA image is adjusted using the CLAHE operator. Second, the image is cropped into windows of w × w pixels by shifting the window (stride = w/2 pixels, or half the window size) horizontally or vertically. Third, each cropped image is resized to 224 × 224 × 3 pixels and input into the trained deep learning model. The model outputs grayscale images, with white pixels representing vascular leakage detected by the algorithm. Finally, all the outputs are converted back to their original size and combined into a final image. As each pixel in the original image could have multiple outputs due to the overlapping windows, the maximum value of each pixel was chosen to generate the final segmentation output.
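The windowing and merging steps in the caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the resizing to and from 224 × 224 is omitted, images are plain lists of rows, and the handling of a final partial window (shifting it back to touch the image border) is an assumption because the source does not describe it. The names `window_origins` and `merge_max` are hypothetical.

```python
def window_origins(length, w):
    """Start offsets along one axis for windows of size w with stride w // 2."""
    stride = w // 2
    if length <= w:
        return [0]
    starts = list(range(0, length - w + 1, stride))
    if starts[-1] != length - w:
        # Assumption: a last window is shifted back so it ends at the border.
        starts.append(length - w)
    return starts


def merge_max(height, width, window_outputs):
    """Combine overlapping window predictions with a per-pixel maximum.

    window_outputs maps a (top, left) origin to a 2D list of predictions.
    """
    merged = [[0.0] * width for _ in range(height)]
    for (top, left), patch in window_outputs.items():
        for r, row in enumerate(patch):
            for c, value in enumerate(row):
                if value > merged[top + r][left + c]:
                    merged[top + r][left + c] = value
    return merged


# Two 2 x 2 windows that overlap in the middle column of a 2 x 3 image.
outputs = {
    (0, 0): [[0.2, 0.9], [0.0, 0.1]],
    (0, 1): [[0.5, 0.3], [0.4, 0.0]],
}
merged = merge_max(2, 3, outputs)
```

Taking the maximum means a pixel is reported as leakage if any window containing it predicts leakage, which favors sensitivity over specificity at window borders.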
Algorithm Training and Testing
A variety of algorithm parameters such as window size (672 × 672, 1334 × 1334, or 1792 × 1792 pixels), epochs (20, 50, 100, or 200), loss function (binary cross entropy or dice coefficient loss functions), and image enhancement (CLAHE or no enhancement) were tested to determine the best-performing algorithm. Learning curves for the final algorithm are shown in Supplementary Figure S1. We used the aforementioned fixed image augmentation strategies (i.e. choosing the 0/10/20/-10/-20 degree rotation angles instead of randomly generated angles) in order to allow for standardized algorithm-to-algorithm comparisons. 
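The parameter values listed above define a small search space, which can be enumerated as a grid. This sketch only counts and lists the combinations; the source does not state whether every combination was actually trained, and the dictionary keys and string labels used here are illustrative.

```python
from itertools import product

WINDOW_SIZES = (672, 1334, 1792)  # window side length in pixels, as listed
EPOCHS = (20, 50, 100, 200)
LOSSES = ("binary_cross_entropy", "dice")
ENHANCEMENTS = ("clahe", "none")

grid = [
    {"window": w, "epochs": e, "loss": loss, "enhancement": enh}
    for w, e, loss, enh in product(WINDOW_SIZES, EPOCHS, LOSSES, ENHANCEMENTS)
]
n_configurations = len(grid)  # 3 * 4 * 2 * 2
```

Keeping the augmentation fixed across all configurations, as the text describes, means that differences in DSC between grid entries reflect the parameters rather than random augmentation.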
After initial algorithm training/testing on the 200-image set was complete, the final deep learning algorithm was evaluated again on a separate test set of 50 FA images. The algorithm's segmentation outputs were compared to the ground truth clinician segmentation using the dice similarity coefficient (DSC). 
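A 5-fold cross-validation split of the kind used for training can be sketched as below. The source does not describe shuffling or whether images from the same patient were kept in the same fold, so this example simply splits consecutive indices; `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so that sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, validation
        start += size


folds = list(k_fold_indices(200, k=5))
```

Each image is used for validation exactly once, and the reported score is the average over the folds. When several images come from one patient, grouping them into the same fold is a common precaution against optimistic estimates.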
Statistical Analysis
We used the DSC to quantify concordance between algorithm leakage segmentation and the ground truth. The DSC is defined as the size of the intersection of two sets divided by their average size15 and it ranges from 0 to 1, 0 indicating no spatial overlap between 2 sets of segmentation results, and 1 indicating perfect overlap.16 
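Written out, the definition above is DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the sets of pixels marked as leakage by the two sources. A minimal sketch for binary masks follows; the function name and the choice to return 1.0 when both masks are empty are assumptions for illustration.

```python
def dice_similarity(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (lists of rows)."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


clinician = [[0, 1, 1],
             [0, 1, 0]]
algorithm = [[0, 1, 0],
             [0, 1, 1]]
dsc = dice_similarity(clinician, algorithm)  # 2 * 2 / (3 + 3)
```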
Determination of Clinically Notable Change in Vascular Leakage
The trained algorithm was used to help determine clinically notable change in vascular leakage in FAs taken across different visits. We first identified an additional 20 patient eyes. Each eye had FAs taken at two different visits, with a clinical change between visits (such as a treatment intervention or change in disease activity). We chose images from each FA pair that were taken within 45 seconds of each other. For example, if the visit 1 FA image was from the 3-minute timepoint, a visit 2 FA image from 2 minutes 15 seconds to 3 minutes 45 seconds was chosen. A senior uveitis clinician (author H.N.S.) assessed the FA image pairs to determine whether there was a clinically significant change in vascular leakage between the visits.
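The frame-matching rule can be sketched as a small helper. The 45-second tolerance comes from the text; choosing the closest eligible frame when several qualify is an assumption, and `match_frame` is a hypothetical name.

```python
def match_frame(reference_s, candidate_times_s, tolerance_s=45):
    """Pick the visit 2 frame closest in time to the visit 1 frame.

    Times are seconds after dye injection. Returns None when no candidate
    lies within the tolerance.
    """
    eligible = [t for t in candidate_times_s if abs(t - reference_s) <= tolerance_s]
    if not eligible:
        return None
    return min(eligible, key=lambda t: abs(t - reference_s))


# Visit 1 frame at 3 minutes (180 s): eligible visit 2 frames span 135-225 s.
chosen = match_frame(180, [60, 140, 200, 300])
```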
We then used the trained algorithm to segment vascular leakage on each FA image from the two visits. We calculated the percentage of the visible retinal area that was covered by the algorithm's leakage segmentation. Then, we calculated the change in this percentage between visits. A receiver operating characteristic (ROC) curve analysis was performed to determine a cutoff of percent change in algorithm-detected vascular leakage between visits that could differentiate “clinically notable change in vascular leakage” from “no clinically notable change in vascular leakage.” The senior uveitis clinician's determination of yes/no clinically notable change in vascular leakage was used as the gold standard. The sensitivity and specificity with 95% confidence interval (CI) were reported for the percent change in vascular leakage across visits for the value that maximized sensitivity and specificity, and the area under the curve (AUC) was calculated. 
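The analysis described above can be sketched in three steps: compute the percentage of visible retina covered by leakage, take the change between visits, and scan candidate cutoffs. The data below are invented for illustration only. Using the absolute change and selecting the cutoff by the sum of sensitivity and specificity (Youden's J) are assumptions about how "maximized sensitivity and specificity" was operationalized, and the code assumes both classes are present.

```python
def percent_leakage(n_leak_pixels, n_visible_pixels):
    """Percentage of the visible retinal area covered by segmented leakage."""
    return 100.0 * n_leak_pixels / n_visible_pixels


def sensitivity_specificity(changes, notable, cutoff):
    """Call a pair 'notable' when the absolute change exceeds the cutoff."""
    predicted = [abs(c) > cutoff for c in changes]
    tp = sum(p and y for p, y in zip(predicted, notable))
    fn = sum((not p) and y for p, y in zip(predicted, notable))
    tn = sum((not p) and (not y) for p, y in zip(predicted, notable))
    fp = sum(p and (not y) for p, y in zip(predicted, notable))
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(changes, notable):
    """Cutoff with the largest sensitivity + specificity - 1 (Youden's J)."""
    candidates = sorted({abs(c) for c in changes})

    def youden(cutoff):
        sens, spec = sensitivity_specificity(changes, notable, cutoff)
        return sens + spec - 1.0

    return max(candidates, key=youden)


# Change for a single hypothetical eye, in percentage points.
change = percent_leakage(50, 10_000) - percent_leakage(300, 10_000)

# Hypothetical changes for eight eyes and the clinician's yes/no labels.
changes = [0.4, -0.9, 1.5, 2.0, -2.5, 3.1, -4.0, 5.2]
notable = [False, False, False, False, True, True, True, True]
cutoff = best_cutoff(changes, notable)
sens, spec = sensitivity_specificity(changes, notable, cutoff)
```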
Quantification of Grader Variability in FA Leakage Segmentation
Two uveitis fellowship-trained graders both segmented the same set of 40 images and their segmentations were compared to each other using the DSC. 
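A mean DSC with a 95% confidence interval can be computed from per-image scores as sketched below. The source does not state how its confidence intervals were derived, so the normal approximation used here (mean ± 1.96 standard errors) is an assumption, and the scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci95(values):
    """Mean and normal-approximation 95% confidence interval of the mean."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical per-image DSC values between two graders.
scores = [0.4, 0.5, 0.6]
avg, low, high = mean_with_ci95(scores)
```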
Software and Hardware
We implemented the deep learning algorithm using Python with Keras on a Red Hat Enterprise Linux 7 operating system. The hardware configuration used for this study was a single Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80 GHz (4 cores with hyperthreading, 8 logical processors in total) and a GTX 1080 Ti GPU.
Results
Patient and Fluorescein Angiogram Image Characteristics
The 200 images in the training set came from 61 patients, with a median of 2 images used per patient (interquartile range [IQR] = 1–4 images). One hundred forty of 200 (60%) of the images were of right eyes. Images were obtained from March 2016 to December 2019. Most patients had posterior segment uveitis (Table 1, Supplementary Table S1). The FA images were from an average timepoint of 361 seconds (SD = 174 seconds). 
Table 1.
 
Patient Disease Characteristics by Uveitis Anatomic Location
Quantification of Grader Variability in FA Leakage Segmentation
The average DSC between the 2 graders’ segmentations was 0.483 (95% CI = 0.439–0.528). Figure 3 depicts examples of good and poor grader concordance. 
Figure 3.
 
Examples of good concordance (A; DSC 0.642) and poor concordance (B; DSC 0.095) between two clinician graders’ vascular leakage segmentations.
Algorithm Training and Testing Results
We initially achieved a best average DSC of 0.501 between the algorithm segmentation and ground truth when all 200 images were used for algorithm training. After the inter-grader concordance results were available and showed poor concordance, we hypothesized the differences in segmentation style between the two clinician teams confounded the deep learning algorithm. Because one team segmented 112 images whereas the other team segmented 88 images, we retrained and tested the algorithm on the 112 images using the 5-fold cross validation method. This retraining allowed us to achieve a maximum average DSC of 0.572 (95% CI = 0.548–0.596), using a window size of 672 × 672 pixels, the Dice Coefficient loss function, CLAHE image enhancement, and 200 epochs (Supplementary Table S2). Examples of algorithm performance compared to the manual segmentation ground truth are provided in Figure 4. Finally, testing the final trained algorithm on a separate set of 50 FA images resulted in an average DSC of 0.563 (95% CI = 0.543–0.582). 
Figure 4.
 
Examples of good algorithm concordance with the ground truth (A; DSC 0.718) and (B; DSC 0.750). Examples of poor algorithm concordance with the ground truth (C; DSC 0.482) and (D; DSC 0.263).
Algorithm-Assisted Determination of Clinically Notable Change in Vascular Leakage
We used the algorithm to help determine if there was a clinically notable change over time in vascular leakage across pairs of FA images taken from two visits. The algorithm was used to detect vascular leakage on each image of a pair. We found that if there was a change of greater than 2.21% of the visible retinal area covered by the algorithm leakage segmentation (127,520 pixels), we were able to correctly classify image pairs as having “clinically notable change” with 90% sensitivity and specificity, with an AUC of 0.95 (Fig. 5, Fig. 6, Table 2). For reference, the average optic disc area in our dataset was 10,020 pixels.
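The pixel figures above can be put on a common scale with a short calculation. It assumes the 2.21% threshold and the 127,520-pixel count refer to the same visible retinal area, which implies an area of roughly 5.8 million pixels; on that assumption the threshold corresponds to about 12.7 average optic disc areas. The variable names are illustrative.

```python
THRESHOLD_PERCENT = 2.21      # change in visible retinal area, from the text
THRESHOLD_PIXELS = 127_520    # the same threshold expressed in pixels
MEAN_DISC_PIXELS = 10_020     # average optic disc area in the dataset

# Visible retinal area implied by the two threshold figures (assumption:
# both describe the same reference area).
implied_visible_pixels = THRESHOLD_PIXELS / (THRESHOLD_PERCENT / 100.0)

# The threshold expressed in units of the average optic disc area.
threshold_in_disc_areas = THRESHOLD_PIXELS / MEAN_DISC_PIXELS
```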
Figure 5.
 
(A) A patient's FA at the first visit. (B) The same patient's FA at the second visit 2 months later. An expert uveitis specialist identified the patient as having clinically notable improvement in the second FA compared to the first. The algorithm was used to segment vascular leakage on both images, and the percent of the visible retinal area in each image covered by the algorithm's leakage segmentation was calculated. In this example, the second image had a 2.33% decrease in the visible retinal area covered by algorithm segmentation of vascular leakage compared to the first visit image (134,610 fewer pixels of vascular leakage). An ROC curve analysis showed that a change of greater than 2.21% of the visible retinal area covered by the algorithm's vascular leakage segmentation (a change of greater than 127,520 vascular leakage pixels) could differentiate between yes/no clinically notable change in vascular leakage with 90% sensitivity and specificity. For comparison, the average optic disc area in our dataset was 10,020 pixels.
Figure 6.
 
Receiver operating characteristic curve across various thresholds (percent change in the visible retinal area covered by algorithm segmentation of vascular leakage between two visits). The gold standard was a uveitis expert's yes/no assessment of clinically significant change in vascular leakage between visits. The area under the curve (AUC) was 0.95.
Table 2.
 
ROC Table of Percent Change in the Visible Retinal Area in the FA Image Covered With Algorithm Segmentation of Vascular Leakage Differentiating the Presence or Absence of a Clinically Significant Change in Vascular Leakage
Conclusions/Discussion
In this study, we identified variability between clinician segmentations of FA images, developed a preliminary deep learning algorithm that was reasonably able to segment vascular leakage on FA images despite an ambiguous ground truth, and successfully used it to help determine clinically notable changes in vascular leakage. As FA is frequently used to diagnose uveitis, assess a patient's response to treatment, and aid in clinical decision making,17 quantifying vascular leakage on the FAs of patients with uveitis is critical. A tool to standardize the quantification of vascular leakage would thus be very useful in the clinical and research settings.
We initially hypothesized the algorithm could achieve a best average DSC of 0.7 or greater when its performance was compared to the ground truth. However, the best average DSC we were able to achieve was 0.572. In the brain and lung imaging segmentation literature, groups report achieving average DSCs of 0.9 and higher.15,18,19 However, because the calculation for DSC is highly dependent on the area of interest,15,16 it becomes more difficult to achieve higher DSCs on smaller structures such as the retina. As examples, Deeley et al.20 reported average DSCs of 0.4 to 0.5 when segmenting nerves and the optic chiasm on brain magnetic resonance imaging (MRI) scans, whereas Liefers et al.21 reported a mean DSC of 0.6 when developing an algorithm to segment features of age-related macular degeneration on OCT. Additionally, our segmentation was more challenging as the leakage boundary was more ambiguous compared to boundaries for other forms of imaging. 
Another difficulty with using the DSC was noted when we reviewed individual algorithm segmentation results. In several cases, the algorithm segmentations were comparable to the ground truth, yet the DSC was very poor (see Fig. 4D). These images tended to have very small amounts of vascular leakage. Because the DSC is exquisitely dependent on the size of the area of interest, if the algorithm segmented even a few pixels inaccurately, the DSC would decrease dramatically.
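The size dependence can be made concrete with the count-based form of the coefficient, DSC = 2TP / (2TP + FP + FN), where TP, FP, and FN are the numbers of true-positive, false-positive, and false-negative pixels. In the hypothetical example below, the same absolute error of 40 pixels yields very different scores for a small and a large leakage area; the pixel counts are invented for illustration.

```python
def dsc_from_counts(tp, fp, fn):
    """Dice similarity coefficient from pixel-level agreement counts."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # Assumption: no leakage in either segmentation counts as agreement.
        return 1.0
    return 2 * tp / denominator


# Same error (20 extra and 20 missed pixels) on two lesion sizes.
small_lesion = dsc_from_counts(tp=30, fp=20, fn=20)    # 60 / 100
large_lesion = dsc_from_counts(tp=3000, fp=20, fn=20)  # 6000 / 6040
```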
An additional challenge is the time-dependent nature of vascular leakage. Our algorithm was only able to evaluate one image from an FA series, whereas a clinician would compare an image to previous and later frames. Because the algorithm did not have previous images available for comparison, in some cases, the algorithm detected leakage that was not actually present or was due to background choroidal fluorescence. Additionally, as FA flow rates differ, it is nearly impossible to capture exact and repeatable timing for image capturing and consecutive labeling. However, in the clinical setting using current commercially available equipment, it is also impossible to capture these exact, repeatable timings. It should also be noted that whereas the algorithm performance was less than initially expected (DSC of 0.572), the agreement between the algorithm and ground truth was superior to the agreement between two graders (DSC of 0.483). 
Although the algorithm requires further refinement, it already demonstrates superior speed in segmentation compared to manual segmentation. When using 672 × 672 pixel window sizes, the deep learning algorithm took an average of 4.7 seconds to process an FA image. Additionally, as the trained algorithm performed well in classifying clinically notable (or insignificant) change in leakage over time, this could also prove useful in clinical practice or in clinical trials. 
Strengths of this study include the variety of uveitis etiologies (see Supplementary Table S1) and vascular leakage patterns and the prospectively collected uveitis biobank database. Limitations of this study include the relatively small dataset, which we tried to overcome with image augmentation. Unfortunately, obtaining large sample sizes for rare diseases such as uveitis is challenging, and future directions should include multicenter collaborations. Another limitation was variability between graders even in the ground truth. We attempted to overcome this problem by meeting beforehand to discuss a standard definition of vascular leakage. We also used a team-based approach where one clinician double checked the other clinician's segmentation. However, even with these attempts to standardize manual segmentation, one team tended to over-segment vascular leakage whereas the other team tended to under-segment. The discrepancy between teams led to decreased uniformity in the training set and thus, decreased algorithm performance. However, we do note that even between expert human graders, we measured considerable inter-rater variability, greater than that between the algorithm and the ground truth. Others have also observed intra-grader variability in FA1 and more easily interpretable images such as OCT.21 This discordance between clinicians could have serious impacts on patient care, and thus a more objective tool to evaluate FAs has great clinical importance.
Future directions include validation on external datasets as well as testing performance on more challenging data (for example, FAs with extensive lesions). Further work on determining a threshold for algorithm detection of clinically notable change in vascular leakage would be useful – for example, using FA images obtained before and after patients with uveitis receive treatment. There is a great clinical need for a tool to objectively evaluate FAs in uveitis, and this project represents the beginning of using machine learning to detect vascular leakage on FAs of patients with uveitis.
Acknowledgments
Supported by the NIH Intramural Research Program. This research/work was supported in part by the Lister Hill National Center for Biomedical Communications of the National Library of Medicine (NLM), National Institutes of Health. This research was also made possible through the NIH Medical Research Scholars Program, a public-private partnership supported jointly by the NIH and contributions to the Foundation for the NIH from the Doris Duke Charitable Foundation, Genentech, the American Association for Dental Research, the Colgate-Palmolive Company, and other private donors. 
Disclosure: L.H. Young, None; J. Kim, None; M. Yakin, None; H. Lin, None; D.T. Dao, None; S. Kodati, None; S. Sharma, None; A.Y. Lee, None; C.S. Lee, None; H.N. Sen, None 
References
1. Rabbani H, Allingham MJ, Mettu PS, Cousins SW, Farsiu S. Fully Automatic Segmentation of Fluorescein Leakage in Subjects With Diabetic Macular Edema. Invest Ophthalmol Vis Sci. 2015; 56(3): 1482–1492.
2. Kaiser RS, Berger JW, Williams GA, et al. Variability in fluorescein angiography interpretation for photodynamic therapy in age-related macular degeneration. Retina. 2002; 22(6): 683–690.
3. Milea D, Najjar RP, Jiang Z, et al. Artificial Intelligence to Detect Papilledema from Ocular Fundus Photographs. N Engl J Med. 2020; 382(18): 1687–1695.
4. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018; 1(1): 39.
5. Brown JM, Campbell JP, Beers A, et al. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018; 136: 803–810.
6. Lee CS, Tyring AJ, Deruyter NP, Wu Y, Rokem A, Lee AY. Deep-learning based, automated segmentation of macular edema in optical coherence tomography. Biomed Opt Express. 2017; 8(7): 3440.
7. Wen JC, Lee CS, Keane PA, et al. Forecasting future Humphrey Visual Fields using deep learning. Vavvas DG, ed. PLoS One. 2019; 14(4): e0214875.
8. Peng Y, Dharssi S, Chen Q, et al. DeepSeeNet: A Deep Learning Model for Automated Classification of Patient-based Age-related Macular Degeneration Severity from Color Fundus Photographs. Ophthalmology. 2019; 126(4): 565–575.
9. Ehlers JP, Wang K, Vasanji A, Hu M, Srivastava SK. Automated quantitative characterisation of retinal vascular leakage and microaneurysms in ultra-widefield fluorescein angiography. Br J Ophthalmol. 2017; 101(6): 696–699.
10. Zheng Y, Gandhi JS, Stangos AN, Campa C, Broadbent DM, Harding SP. Automated Segmentation of Foveal Avascular Zone in Fundus Fluorescein Angiography. Invest Ophthalmol Vis Sci. 2010; 51(7): 3653.
11. Son G, Kim YJ, Sung YS, Park B, Kim J-G. Analysis of quantitative correlations between microaneurysm, ischaemic index and new vessels in ultrawide-field fluorescein angiography images using automated software. Br J Ophthalmol. 2019; 103(12): 1759–1764.
12. Zhao Y, Wilmarth PA, Cheng C, et al. Proteome-transcriptome analysis and proteome remodeling in mouse lens epithelium and fibers. Exp Eye Res. 2019; 179: 32–46.
13. Venkat AG, Sharma S. Automated measurement of leakage on wide-field angiography in the assessment of retinal vasculitis. J Ophthalmic Inflamm Infect. 2020; 10(1): 4.
14. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9351. London, UK: Springer Verlag; 2015: 234–241.
15. Shattuck DW, Prasad G, Mirza M, Narr KL, Toga AW. Online resource for validation of brain segmentation methods. Neuroimage. 2009; 45(2): 431–439.
16. Zou KH, Warfield SK, Bharatha A, et al. Statistical validation of image segmentation quality based on a spatial overlap index. Acad Radiol. 2004; 11(2): 178–189.
17. Vitale AT, Batra NN. Fluorescein angiography in the diagnosis and management of uveitis. In: Sen HN, Read RW, eds. Multimodal Imaging in Uveitis. New York, NY: Springer International Publishing; 2018: 1–24.
18. Souza JC, Bandeira Diniz JO, Ferreira JL, França da Silva GL, Corrêa Silva A, de Paiva AC. An automatic method for lung segmentation and reconstruction in chest X-ray using deep neural networks. Comput Methods Programs Biomed. 2019; 177: 285–296.
19. Park J, Yun J, Kim N, et al. Fully Automated Lung Lobe Segmentation in Volumetric Chest CT with 3D U-Net: Validation with Intra- and Extra-Datasets. J Digit Imaging. 2020; 33(1): 221–230.
20. Deeley MA, Chen A, Datteri R, et al. Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: a multi-expert study. Phys Med Biol. 2011; 56(14): 4557–4577.
21. Liefers B, Taylor P, Alsaedi A, et al. Quantification of Key Retinal Features in Early and Late Age-Related Macular Degeneration Using Deep Learning. Am J Ophthalmol. 2021; 226: 1–12.
Figure 1.
 
Modified U-net architecture for the deep learning algorithm. The algorithm's inputs are 224 × 224 × 3 FA images and its outputs are 224 × 224 × 1 grayscale images. The model has a contracting path and an expanding path. The contracting path (left side of figure) consists of convolutional layers and max pooling layers. The expanding path (right side of figure) consists of upsampling of the feature map followed by convolutional layers. Two dropout layers (with a 0.5 dropout ratio) are added in the contracting path to train the model more robustly and to reduce overfitting. During training, each dropout layer randomly deactivates 50% of its units, so that the algorithm does not come to depend on a few specific features. The final convolutional layer maps each feature vector to the desired classes, and the model outputs a class label for each pixel. Conv, convolution; MaxPool, max pooling.
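To make the shape bookkeeping in this caption concrete, the following sketch traces feature-map sizes through a U-net-style contracting and expanding path. It is illustrative only: the number of pooling levels (4), the starting filter count (64), and "same"-padded convolutions are assumptions borrowed from the generic U-net design, not details reported for the modified network. Only the 224 × 224 × 3 input and 224 × 224 × 1 output sizes come from the caption.

```python
# Shape trace for a U-net-style network (pure Python, no framework needed).
# ASSUMPTIONS (not from the paper): 4 pooling levels, 64 starting filters,
# and "same"-padded convolutions, so only pooling/upsampling change H and W.

def trace_unet_shapes(height=224, width=224, in_channels=3,
                      levels=4, base_filters=64, out_channels=1):
    """Return a list of (stage name, (H, W, C)) tuples."""
    shapes = [("input", (height, width, in_channels))]
    h, w, c = height, width, base_filters

    # Contracting path: convolutions set the channel count,
    # then 2x2 max pooling halves H and W.
    for level in range(levels):
        shapes.append((f"down{level + 1}_conv", (h, w, c)))
        h, w = h // 2, w // 2
        shapes.append((f"down{level + 1}_pool", (h, w, c)))
        c *= 2

    shapes.append(("bottleneck_conv", (h, w, c)))

    # Expanding path: 2x upsampling doubles H and W,
    # then convolutions halve the channel count.
    for level in range(levels, 0, -1):
        h, w = h * 2, w * 2
        c //= 2
        shapes.append((f"up{level}_conv", (h, w, c)))

    # Final 1x1 convolution maps each pixel's feature vector to the output classes.
    shapes.append(("output", (h, w, out_channels)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_unet_shapes():
        print(f"{name:16s} {shape}")
```

Under these assumptions the spatial size halves at each pooling step and is restored step by step on the expanding path, so the output grid matches the input grid pixel for pixel.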
Figure 2.
 
Workflow for training and testing the deep learning algorithm. In the training workflow (top of figure), images undergo processing (several degrees of rotation, rotation across the X-axis, cropping into windows [window size = w × w pixels and stride = w/2 pixels], and resizing to a 224 × 224 × 3 image, as described in the Methods section) before they become a series of input images. The resizing step is necessary because the deep learning model architecture requires a large amount of computer memory for training and testing compared with other machine learning models; 224 × 224 × 3 was the largest input size that we could process on our existing hardware. Finally, pairs of processed images (one cropped FA image and its corresponding cropped leakage segmentation image) are used as inputs to train the deep learning algorithm. The testing workflow (bottom of figure) consists of four steps. First, an input FA image is adjusted using the contrast-limited adaptive histogram equalization (CLAHE) operator. Second, the image is cropped into windows of w × w pixels by shifting the window horizontally or vertically (stride = w/2 pixels, or half the window size). Third, each cropped image is resized to 224 × 224 × 3 pixels and input into the trained deep learning model. The model outputs grayscale images, with white pixels representing vascular leakage detected by the algorithm. Finally, all the outputs are converted back to their original size and combined into a final image. Because the overlapping windows can produce multiple outputs for a given pixel in the original image, the maximum value at each pixel was chosen to generate the final segmentation output.
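The cropping and recombination steps of the testing workflow can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names are hypothetical, images are plain 2D lists, the CLAHE and resize-to-224 steps are omitted, and clamping a last window to the image edge when the size is not a multiple of the stride is an assumption the caption does not address.

```python
# Sliding-window cropping and max-merging of overlapping outputs (pure Python).
# ASSUMPTIONS (not from the paper): helper names, and clamping a final window
# to the image edge when the image size is not a multiple of the stride.

def window_origins(length, w):
    """Start offsets of windows of size w taken with stride w // 2 along one axis."""
    if w > length:
        raise ValueError("window is larger than the image")
    stride = max(1, w // 2)
    origins = list(range(0, length - w + 1, stride))
    if origins[-1] != length - w:  # make sure the trailing edge is covered
        origins.append(length - w)
    return origins


def crop_windows(image, w):
    """Yield (row, col, patch) for every w x w window of a 2D list-of-lists image."""
    for r in window_origins(len(image), w):
        for c in window_origins(len(image[0]), w):
            yield r, c, [row[c:c + w] for row in image[r:r + w]]


def merge_max(height, width, outputs):
    """Combine (row, col, patch) outputs, keeping the maximum value per pixel."""
    merged = [[0] * width for _ in range(height)]
    for r, c, patch in outputs:
        for i, patch_row in enumerate(patch):
            for j, value in enumerate(patch_row):
                if value > merged[r + i][c + j]:
                    merged[r + i][c + j] = value
    return merged


if __name__ == "__main__":
    # Toy 6 x 6 "image"; passing patches through unchanged stands in for the model.
    image = [[(r * 6 + c) % 7 for c in range(6)] for r in range(6)]
    outputs = list(crop_windows(image, w=4))
    print(len(outputs), "windows")
    print("merged equals input:", merge_max(6, 6, outputs) == image)
```

Taking the per-pixel maximum means a pixel is marked as leakage if any window covering it marks it, which favors sensitivity over specificity at window overlaps.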
Figure 3.
 
Examples of good concordance (A, DSC 0.642) and poor concordance (B, DSC 0.095) between two clinician graders’ vascular leakage segmentations.
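For readers unfamiliar with the Dice similarity coefficient (DSC) reported in these captions, the sketch below computes it for two binary masks using the standard definition, twice the overlap divided by the sum of the two mask sizes. The masks are made-up toy values, and returning 1.0 when both masks are empty is a convention chosen here rather than anything stated in the paper.

```python
# Dice similarity coefficient (DSC) between two binary masks (pure Python).
# ASSUMPTIONS: masks are same-sized 2D lists of 0/1; two empty masks score 1.0.

def dice_coefficient(mask_a, mask_b):
    """DSC = 2 * |A intersect B| / (|A| + |B|)."""
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a:
                size_a += 1
            if b:
                size_b += 1
            if a and b:
                intersection += 1
    if size_a + size_b == 0:
        return 1.0  # convention: two empty segmentations agree perfectly
    return 2.0 * intersection / (size_a + size_b)


if __name__ == "__main__":
    # Toy masks (hypothetical, not study data): 3 pixels each, 2 in common.
    grader_1 = [[1, 1, 0],
                [1, 0, 0],
                [0, 0, 0]]
    grader_2 = [[1, 1, 0],
                [0, 0, 0],
                [0, 0, 1]]
    print(round(dice_coefficient(grader_1, grader_2), 3))
```

A DSC of 1 indicates identical segmentations and 0 indicates no overlap, so the 0.095 in panel B corresponds to two graders marking almost entirely different regions.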
Figure 4.
 
Examples of good algorithm concordance with the ground truth (A, DSC 0.718; B, DSC 0.750) and of poor algorithm concordance with the ground truth (C, DSC 0.482; D, DSC 0.263).
Figure 5.
 
(A) A patient's FA at their first visit. (B) The same patient's FA at their second visit 2 months later. An expert uveitis specialist identified the patient as having clinically notable improvement in the second FA compared with the first. The algorithm was used to segment vascular leakage on both images, and the percentage of the visible retinal area in each image covered by the algorithm's leakage segmentation was calculated. In this example, the second image had a 2.33% decrease in the visible retinal area covered by algorithm segmentation of vascular leakage compared with the first visit image (134,610 fewer pixels of vascular leakage). An ROC curve analysis showed that a change of greater than 2.21% of the visible retinal area covered by the algorithm's vascular leakage segmentation (a change of greater than 127,520 vascular leakage pixels) could differentiate between the presence and absence of clinically notable change in vascular leakage with 90% sensitivity and specificity. For comparison, the average optic disc area in our dataset was 10,020 pixels.
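The between-visit measure described here can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: masks are 2D lists of 0/1, leakage is counted only inside the visible-retina mask, the change is expressed in percentage points of visible retinal area, and the function names and toy masks are hypothetical.

```python
# Percent of visible retinal area covered by leakage, and its change between visits.
# ASSUMPTIONS (not from the paper): 0/1 list-of-lists masks, leakage counted only
# inside the retina mask, change reported in percentage points (second - first).

def percent_leakage(leak_mask, retina_mask):
    """Return 100 * (leakage pixels within the retina) / (visible retina pixels)."""
    leak = retina = 0
    for leak_row, retina_row in zip(leak_mask, retina_mask):
        for is_leak, is_retina in zip(leak_row, retina_row):
            if is_retina:
                retina += 1
                if is_leak:
                    leak += 1
    if retina == 0:
        raise ValueError("retina mask contains no visible pixels")
    return 100.0 * leak / retina


def change_in_leakage(first_visit, second_visit):
    """Each argument is a (leak_mask, retina_mask) pair; negative means improvement."""
    return percent_leakage(*second_visit) - percent_leakage(*first_visit)


if __name__ == "__main__":
    # Toy 2 x 5 masks (hypothetical, not study data).
    retina = [[1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1]]
    leak_visit_1 = [[1, 1, 0, 0, 0],
                    [1, 1, 0, 0, 0]]
    leak_visit_2 = [[1, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0]]
    delta = change_in_leakage((leak_visit_1, retina), (leak_visit_2, retina))
    print(f"change: {delta:+.1f} percentage points")
```

As a rough check on scale, if the threshold of 127,520 pixels is set against the average optic disc area of 10,020 pixels quoted in the caption, it corresponds to roughly 12.7 disc areas.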
Figure 6.
 
Receiver operating characteristic curve of various thresholds (percent change in the visible retinal area covered by algorithm segmentation of vascular leakage between two visits). The gold standard was a uveitis expert's yes/no assessment of clinically significant change in vascular leakage between visits. The area under the curve (AUC) was 0.95.
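The kind of threshold analysis summarized in this caption can be sketched as below. The data are invented toy values used only to exercise the code and do not reproduce the study's results. Using the absolute value of the change as the score and computing the AUC through the rank-based (Mann-Whitney) formulation are assumptions about one reasonable way to set up such an analysis.

```python
# Sensitivity/specificity at a threshold and rank-based AUC (pure Python).
# ASSUMPTIONS (not from the paper): the score is |percent change|, a case is
# called "notable" when the score exceeds the threshold, and all data are toy values.

def sensitivity_specificity(changes, labels, threshold):
    """labels: 1 = expert says clinically notable change, 0 = no notable change."""
    tp = fn = tn = fp = 0
    for change, label in zip(changes, labels):
        predicted = abs(change) > threshold
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(changes, labels):
    """Probability that a random positive case scores above a random negative one."""
    positives = [abs(c) for c, lab in zip(changes, labels) if lab]
    negatives = [abs(c) for c, lab in zip(changes, labels) if not lab]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical percent changes between visits and expert labels.
    changes = [0.3, -0.8, 1.5, -2.0, 2.5, -3.1, 4.0, -5.2]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    sens, spec = sensitivity_specificity(changes, labels, threshold=2.21)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(changes, labels):.4f}")
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 − specificity would trace out the ROC curve itself.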
Table 1.
 
Patient Disease Characteristics by Uveitis Anatomic Location
Table 2.
 
ROC Table of Percent Change in the Visible Retinal Area in the FA Image Covered With Algorithm Segmentation of Vascular Leakage Differentiating the Presence or Absence of a Clinically Significant Change in Vascular Leakage