Beyond Retinal Layers: A Deep Voting Model for Automated Geographic Atrophy Segmentation in SD-OCT Images
Author Affiliations & Notes
  • Zexuan Ji
    School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
  • Qiang Chen
    School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
  • Sijie Niu
    School of Information Science and Engineering, University of Jinan, Jinan, China
  • Theodore Leng
    Byers Eye Institute at Stanford, Stanford University School of Medicine, Palo Alto, CA, USA
  • Daniel L. Rubin
    Department of Radiology, Stanford University, Stanford, CA, USA
    Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA
  • Correspondence: Qiang Chen, Professor, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China. e-mail: [email protected] 
Translational Vision Science & Technology January 2018, Vol.7, 1. doi:https://doi.org/10.1167/tvst.7.1.1
Abstract

Purpose: To automatically and accurately segment geographic atrophy (GA) in spectral-domain optical coherence tomography (SD-OCT) images by constructing a voting system with deep neural networks without the use of retinal layer segmentation.

Methods: An automatic GA segmentation method for SD-OCT images was constructed based on a deep network. The network comprised five layers: one input layer, three hidden layers, and one output layer. During the training phase, labeled A-scans with 1024 features were fed directly into the network as the input layer to obtain deep representations. A soft-max classifier was then trained to determine the label of each individual pixel. Finally, a voting decision strategy was used to refine the segmentation results among 10 trained models.

Results: Two image data sets with GA were used to evaluate the model. For the first dataset, our algorithm obtained a mean overlap ratio (OR) of 86.94% ± 8.75%, an absolute area difference (AAD) of 11.49% ± 11.50%, and a correlation coefficient (CC) of 0.9857; for the second dataset, the mean OR, AAD, and CC of the proposed method were 81.66% ± 10.93%, 8.30% ± 9.09%, and 0.9952, respectively. The proposed algorithm improved segmentation accuracy by more than 5% and 10%, respectively, compared with several state-of-the-art algorithms on the two data sets.

Conclusions: Without retinal layer segmentation, the proposed algorithm could produce higher segmentation accuracy and was more stable when compared with state-of-the-art methods that relied on retinal layer segmentation results. Our model may provide reliable GA segmentations from SD-OCT images and be useful in the clinical diagnosis of advanced nonexudative AMD.

Translational Relevance: Based on deep neural networks, this study presents an accurate GA segmentation method for SD-OCT images that does not use any retinal layer segmentation results, and it may contribute to improved understanding of advanced nonexudative AMD.

Introduction
As a chronic disease, age-related macular degeneration (AMD) is the leading cause of irreversible vision loss among elderly individuals and is generally accompanied by various phenotypic manifestations.1 The advanced stage of nonexudative AMD is characterized by geographic atrophy (GA), which mainly involves atrophy of the retinal pigment epithelium (RPE).2 In the Comparison of AMD Treatments Trials, the development of GA was one of the major causes of sustained visual acuity loss,3 and GA is generally associated with retinal thinning and loss of the RPE and photoreceptors.4 A recent review article notes that a reduction in the worsening of atrophy is an important biomarker for assessing the effectiveness of a given GA treatment.5 Thus, automatic detection and characterization of retinal regions affected by GA is a fundamental step in clinical diagnosis, which could aid ophthalmologists in objectively measuring GA regions and monitoring the evolution of AMD to inform treatment decisions.6,7 GA characterization generally requires accurate segmentation. Manual segmentation is time consuming and subject to interrater variability, and it may not produce reliable results, especially for large data sets. Therefore, automatic, accurate, and reliable segmentation technologies are urgently needed to advance care in AMD. 
To the best of our knowledge, most semiautomated or automated image analysis methods for identifying GA are applied to color fundus photographs, fundus autofluorescence (FAF), or optical coherence tomography (OCT) modalities.8 Semiautomatic and automatic GA segmentation methods applied to these modalities can generally produce useful results and have been found to agree with manually drawn gold standards. 
Color fundus photographs have been widely used for measuring GA lesions, where GA is characterized by a strongly demarcated area.9 However, the performance of most methods depends mainly on the quality of the color fundus images. GA lesions can be easily identified in high-quality color images, while their boundaries may be more difficult to identify in lower quality images. 
As a noninvasive imaging technique for the ocular fundus, FAF can provide two-dimensional (2D) images with high contrast for the identification of GA. Both semiautomated and automated methods have been proposed for the segmentation of GA in FAF images. Panthier et al.10 proposed a semiautomated image processing approach for the identification and quantification of GA on FAF images and implemented it in a commercial package (i.e., the Region Finder software), which has been widely used for the evaluation of GA in clinical settings. Interactive approaches, including level sets,11 watershed,12 and region growing,13 have also been used for GA segmentation in FAF images. Meanwhile, supervised classification methods14 and clustering technologies15 are widely used to automatically segment GA lesions in FAF images. 
Compared with fundus imaging, spectral-domain (SD) OCT imaging can capture the axial differentiation of retinal structures and provide additional characterization of GA.16 Unlike the planar images provided by fundus modalities, SD-OCT generates three-dimensional (3D) cubes composed of a set of 2D images (i.e., B-scans), providing more detailed imaging characteristics of disease phenotypes.17,18 Because GA is generally associated with retinal thinning and loss of the RPE and photoreceptors, earlier works mainly focused on measuring RPE thickness, which could be used as a biomarker of GA lesions.19 However, segmenting GA is not as straightforward as solely detecting the RPE. To directly identify GA lesions by characterizing the RPE, state-of-the-art algorithms principally segment the GA regions based on the projection image generated from the voxels between the RPE and the choroid layers.20–23 Chen et al.20 used geometric active contours to produce a satisfactory performance when compared with manually defined GA regions. A level set approach was developed to segment GA regions in both SD-OCT and FAF images.21 However, the performance of these models was generally dependent on the initialization. To further improve segmentation accuracy and robustness to initialization, Niu et al.22 proposed an automated GA segmentation method for SD-OCT images using a Chan-Vese model via a local similarity factor, and then used this segmentation algorithm to automatically predict the growth of GA.23 However, as mentioned above, GA is generally associated with retinal thinning and loss of the RPE and photoreceptors, and state-of-the-art algorithms mainly segment GA based on the projection image generated from the voxels between the RPE and the choroid layers, implying that these methods rely on the accuracy of retinal layer segmentation. 
Recently, deep learning has gained significant success and achieved outstanding performance in many computer vision applications.24 Much attention has been drawn to the field of computational medical imaging to investigate the potential of deep learning in medical imaging applications,25 including medical image segmentation,26 registration,27 multimodal fusion,28 diagnosis,29 disease detection,30 and so on. In ophthalmology, deep learning has also recently been applied to automated detection of diabetic retinopathy from fundus photos,31 visual field perimetry in glaucoma patients,32 grading of nuclear cataracts,33 segmentation of foveal microvasculature,34 AMD classification,35 and identification of diabetic retinopathy.36 Here, we use deep learning methods to automatically discover the representations and structures within OCT data in order to segment GA. To the best of our knowledge, we are the first to segment GA lesions from OCT images with deep learning. 
A deep voting model is proposed for automated GA segmentation of SD-OCT images, which is capable of achieving high segmentation accuracy without using any retinal layer segmentation results. A deep network containing five layers (one input layer, three hidden layers of sparse autoencoders [SA], and one output layer) is constructed to capture deep representations of the data. During the training phase, randomly selected labeled A-scans with 1024 features are fed directly into the network as the input layer to obtain the deep representations. Then a soft-max classifier is trained to determine the label of each individual pixel. Finally, a voting decision strategy is used to refine the segmentation results among 10 trained models. Without retinal layer segmentation, the proposed algorithm can obtain higher segmentation accuracy and is more stable compared with state-of-the-art methods that rely on retinal layer segmentation results. Our method can provide reliable GA segmentations from SD-OCT images and be useful for evaluating advanced nonexudative AMD. 
Methods
Experimental Data Characteristics
Two different data sets acquired with a Cirrus OCT device (Carl Zeiss Meditec, Inc., Dublin, CA) were used to evaluate the performance of the proposed algorithm; all training and testing cases contained advanced nonexudative AMD with GA. It should be noted that both data sets were described and used in previous work.20,22 The first data set contained 51 longitudinal SD-OCT cube scans from 12 eyes of 8 patients, each with a size of 512 × 128 × 1024 corresponding to a 6 × 6 × 2-mm3 volume in the horizontal, vertical, and axial directions, respectively. Two independent experts manually drew the outlines of GA based on the B-scan images in two repeated separate sessions, and these outlines were used to generate the segmentation ground truths. Figure 1a shows one example study case with the manual segmentations by the two experts at the two sessions and the average ground truth, all outlined on the full projection image. The red and green contours show the manual segmentations by the first expert, and the blue and cyan contours show the manual segmentations by the second expert. The second data set contained 54 SD-OCT cube scans from 54 eyes of 54 patients, each with a size of 200 × 200 × 1024 corresponding to the same volume in the horizontal, vertical, and axial directions, respectively. The manual outlines were drawn based on FAF images and then manually registered to the corresponding locations in the projection images, and these registered outlines were considered the ground truth segmentations. Figure 1b shows the registered ground truth outlined on the full projection image. All data processing and method implementation were carried out with Matlab 2016a software (The MathWorks, Inc., Natick, MA). The research was approved by an institutional human subjects committee and followed the tenets of the Declaration of Helsinki. All federal, state, and local laws were abided by, and this study was conducted with respect to all privacy regulations. 
Figure 1
The example ground truths for the two data sets. (a) One example study case with manual segmentations by two different experts during two different sessions, all outlined on the full projection image. (b) The registered ground truth outlined on the full projection image.
Processing Pipeline
As shown in Figure 2, an automatic GA segmentation method for SD-OCT images based on the deep network is proposed, which is capable of capturing the deep representations of the data while achieving high segmentation accuracy. The structure of the SA deep network was composed of five layers, including one input layer, three hidden/SA layers, and one output layer. During the training phase, the labeled A-scans with 1024 features were directly fed into the network as the input layer to obtain the deep representations. Then a soft-max classifier was trained to determine the label of each individual pixel on the projection image. Finally, a voting decision strategy was used to refine the segmentation results among 10 trained models. 
Figure 2
The pipeline of the proposed automatic GA segmentation method.
Data Preprocessing
As OCT is an interferometric method based on coherent optical beams, one of the fundamental challenges with OCT imaging is the presence of speckle noise in the tomograms.37 To reduce the influence of noise in the OCT images, we used the BM4D software (the Matlab code is available at http://www.cs.tut.fi/∼foi/GCF-BM3D/) for volumetric data denoising,38 which is a leading denoising method for OCT. The 3D, 2D, and 1D visualization results can be found in Figure 2. 
Deep Network Training
For each OCT image, each pixel in the projection image is a \(D\)-dimensional vector \(x \in {{\cal R}^D}\) along the axial A-scan line. The labeled dataset is represented as \(X = \left\{ {\left( {{x_i},{y_i}} \right){\rm{|}}{x_i} \in {{\cal R}^D},{y_i} \in L,i = 1, \ldots ,N} \right\}\), where \(N\) is the number of samples in the dataset (the total number of A-scans in this paper), \({y_i}\) is the class label of the corresponding vector \({x_i}\), and \(L = \left\{ {{l_i}{\rm{|}}i = 1, \ldots ,N,{l_i} = 1, \ldots ,K} \right\}\) is the label set with size \(K\). Generally, for an OCT image, the dimension of each vector \(x\) is \(D = 1024\). 
Our target was to segment GA tissue and non-GA tissue, so the label set size was \(K = 2\). Therefore, the target of training was to learn a mapping function \(f\left( \cdot \right):{{\cal R}^D} \to L\), which maps the input feature vector from the \(D\)-dimensional feature space into the label space. 
An autoencoder is a neural network that attempts to replicate its input at its output. As mentioned above, we stacked three sparse autoencoders39 as the hidden layers to construct our deep model. The training process was based on the optimization of a cost function that measures the error between the input and its reconstruction at the output. An autoencoder is composed of an encoder and a decoder. For the input \(x \in {{\cal R}^D}\) of one autoencoder, the encoder maps the vector \(x\) to another vector \({z^{\left( 1 \right)}} \in {{\cal R}^{{D^{\left( 1 \right)}}}}\) as \({z^{\left( 1 \right)}} = {h^{\left( 1 \right)}}\left( {{w^{\left( 1 \right)}}x + {b^{\left( 1 \right)}}} \right)\), where the superscript (1) indicates the first layer, \({h^{\left( 1 \right)}}:{{\cal R}^{{D^{\left( 1 \right)}}}} \to {{\cal R}^{{D^{\left( 1 \right)}}}}\) is the transfer function for the encoder, \({w^{\left( 1 \right)}} \in {{\cal R}^{{D^{\left( 1 \right)}} \times D}}\) is a weight matrix, and \({b^{\left( 1 \right)}} \in {{\cal R}^{{D^{\left( 1 \right)}}}}\) is a bias vector. The decoder then maps the encoded representation \({z^{\left( 1 \right)}}\) back into an estimate of the original input vector \(x\) as \(\hat{x} = {h^{\left( 2 \right)}}\left( {{w^{\left( 2 \right)}}{z^{\left( 1 \right)}} + {b^{\left( 2 \right)}}} \right)\), where the superscript (2) represents the second layer, \({h^{\left( 2 \right)}}:{{\cal R}^D} \to {{\cal R}^D}\) is the transfer function for the decoder, \({w^{\left( 2 \right)}} \in {{\cal R}^{D \times {D^{\left( 1 \right)}}}}\) is a weight matrix, and \({b^{\left( 2 \right)}} \in {{\cal R}^D}\) is a bias vector. 
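For illustration, the encoder and decoder mappings above can be written as a short NumPy sketch. The logistic sigmoid transfer function, the hidden size \(D^{(1)} = 256\), and the randomly initialized weights are placeholder assumptions, not the trained parameters of the proposed model.

```python
import numpy as np

def sigmoid(a):
    # Logistic transfer function, assumed here for both h^(1) and h^(2).
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W1, b1):
    # z^(1) = h^(1)(w^(1) x + b^(1)): map a D-dimensional A-scan to D^(1) features.
    return sigmoid(W1 @ x + b1)

def decode(z, W2, b2):
    # x_hat = h^(2)(w^(2) z^(1) + b^(2)): reconstruct the input from its code.
    return sigmoid(W2 @ z + b2)

# Toy dimensions: D = 1024 input features (one A-scan), D^(1) = 256 hidden units.
D, D1 = 1024, 256
rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.standard_normal((D1, D)), np.zeros(D1)
W2, b2 = 0.01 * rng.standard_normal((D, D1)), np.zeros(D)

x = rng.random(D)                      # stand-in for a denoised A-scan profile
x_hat = decode(encode(x, W1, b1), W2, b2)
print(x.shape, x_hat.shape)            # both (1024,)
```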
The cost function for training a sparse autoencoder is an adjusted mean squared error function as follows:  
\begin{equation}\tag{1}E = {1 \over N}\mathop \sum \limits_{i = 1}^N \mathop \sum \limits_{k = 1}^K {\left( {{x_{ik}} - {{\hat {x}}_{ik}}} \right)^2} + \lambda \times {{\rm{\Omega }}_{weights}} + \beta \times {{\rm{\Omega }}_{sparsity}}\end{equation}
 
\({{\rm{\Omega }}_{weights}}\) is the \({L_2}\) regularization term with the coefficient \(\lambda \), which can be defined as:  
\begin{equation}\tag{2}{{\rm{\Omega }}_{weights}} = {1 \over 2}\mathop \sum \limits_{m = 1}^2 \mathop \sum \limits_{i = 1}^N \mathop \sum \limits_{k = 1}^K {\left( {w_{ik}^{\left( m \right)}} \right)^2}\end{equation}
 
\({{\rm{\Omega }}_{sparsity}}\) is the sparsity regularization term with the coefficient \(\beta \), which can be defined as:  
\begin{equation}\tag{3}{{\rm{\Omega }}_{sparsity}} = \mathop \sum \limits_{i = 1}^{{D^{\left( 1 \right)}}} KL\left( {\rho \,||\,{{\hat{\rho}}_i}} \right),\quad {\rm{where}}\quad {{\hat{\rho}}_i} = {1 \over N}\mathop \sum \limits_{j = 1}^N h\left( {w_i^{\left( 1 \right)T}{x_j} + b_i^{\left( 1 \right)}} \right)\end{equation}
 
The sparsity regularization term enforces a constraint on the sparsity of the output from the hidden layer and is constructed based on the Kullback-Leibler divergence. 
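A minimal NumPy sketch of the cost in Equations 1 to 3 follows; it is an illustration under stated assumptions, not the authors' implementation. The logistic sigmoid transfer function, the hidden size, and the values of \(\lambda \), \(\beta \), and \(\rho \) are placeholders (the settings actually used are those in Table 1).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kl_divergence(rho, rho_hat, eps=1e-8):
    # KL(rho || rho_hat) between Bernoulli distributions, as in Equation 3.
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

def sparse_autoencoder_cost(X, W1, b1, W2, b2, lam=0.004, beta=4.0, rho=0.1):
    """Adjusted MSE of Equation 1: E = MSE + lam*Omega_weights + beta*Omega_sparsity.

    X has shape (N, D): one denoised A-scan per row. lam, beta, rho are assumed values."""
    N = X.shape[0]
    Z = sigmoid(X @ W1.T + b1)                 # hidden activations, shape (N, D1)
    X_hat = sigmoid(Z @ W2.T + b2)             # reconstructions, shape (N, D)
    mse = np.sum((X - X_hat) ** 2) / N         # reconstruction error term
    omega_weights = 0.5 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))    # Equation 2
    rho_hat = Z.mean(axis=0)                   # average activation of each hidden unit
    omega_sparsity = np.sum(kl_divergence(rho, rho_hat))         # Equation 3
    return mse + lam * omega_weights + beta * omega_sparsity

# Toy check: 32 random "A-scans" with D = 1024 features and D^(1) = 256 hidden units.
rng = np.random.default_rng(0)
D, D1 = 1024, 256
W1, b1 = 0.01 * rng.standard_normal((D1, D)), np.zeros(D1)
W2, b2 = 0.01 * rng.standard_normal((D, D1)), np.zeros(D)
print(sparse_autoencoder_cost(rng.random((32, D)), W1, b1, W2, b2))
```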
For each hidden layer of the stacked autoencoders, the training target is to obtain the optimal parameters \(\left\{ {{W^{\rm{*}}},{b^{\rm{*}}}} \right\}\) by minimizing the cost function defined in Equation 1. The layers of the stacked autoencoders are learned sequentially from top to bottom. Stochastic gradient descent, one of the most popular optimization methods, is used for training the stacked autoencoders; more details can be found in Ref. 40. 
The stacked autoencoders are learned in an unsupervised manner. Finally, after the last autoencoder layer, we stacked a supervised classifier layer, which takes the output of the last autoencoder layer as its input and outputs the classification results. By stacking this supervised layer, the deep network in this paper can be treated as a multilayer perceptron, in which the parameters of the autoencoders are learned in an unsupervised phase and further fine-tuned by backpropagation.41 Table 1 summarizes the parameter settings of the autoencoder structure and autoencoder training for all the experiments in this paper. It should be noted that the coefficients \(\lambda \) and \(\beta \) for the \({L_2}\) regularization term and the sparsity regularization term were manually set based on the experimental results. 
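To make the stacking step concrete, the sketch below assembles three encoder layers and a soft-max layer into a single forward pass that assigns a GA/non-GA probability to one A-scan. The layer sizes and the randomly initialized weights are placeholders standing in for parameters that would be learned by the unsupervised pretraining and supervised fine-tuning described above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def classify_a_scan(x, encoder_params, Ws, bs):
    """Forward pass of the stacked network: three encoder layers, then soft-max.

    encoder_params is a list of (W, b) pairs, one per hidden/SA layer."""
    h = x
    for W, b in encoder_params:
        h = sigmoid(W @ h + b)       # deep representation of the A-scan
    return softmax(Ws @ h + bs)      # class probabilities for {GA, non-GA}

# Placeholder layer sizes (assumed): 1024 -> 256 -> 64 -> 16, with K = 2 classes.
sizes = [1024, 256, 64, 16]
rng = np.random.default_rng(0)
encoder_params = [(0.01 * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
Ws, bs = 0.01 * rng.standard_normal((2, sizes[-1])), np.zeros(2)

p_ga, p_normal = classify_a_scan(rng.random(1024), encoder_params, Ws, bs)
print(f"P(GA) = {p_ga:.3f}, P(non-GA) = {p_normal:.3f}")
```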
Table 1
The Parameter Settings in the Proposed Model
The representations learned by the stacked autoencoders reduce the redundant information in the input data and preserve more useful information for the final classification. From the outputs of each layer of the stacked autoencoders shown in Figure 2, a trend of sparsity can be clearly observed as the data propagate from the top layer to the bottom layer of the network. 
Voting Strategy
As mentioned before, during the training phase, the labeled A-scans with 1024 features were fed directly into the network as the input layer to obtain the deep representations, which meant that the spatial consistency among A-scans was not taken into account. Moreover, owing to the retinal structure and the characteristics of OCT imaging, the corresponding OCT data (3D), B-scan images of the cross sections (2D), and A-scan samples (1D) exhibit considerable structural differences, as shown in Figure 3. Figure 3a shows a full projection image of one study case with GA, where the ground truth is overlaid with a red line. Based on Figure 3a, three B-scan images of the cross sections, highlighted with blue lines, were selected, and the corresponding images are shown in Figure 3b, where the GA lesions are overlaid with blue regions. Then, for each B-scan image, two GA samples and two normal (non-GA) samples were selected and highlighted with red and green lines, respectively. The intensity profiles of the selected samples are shown in Figure 3c. From Figure 3a, we can see that the full projection image contains obvious intensity inhomogeneity. Moreover, the contrast between the GA lesion and the background is very low. Figure 3b shows considerable structural differences among the selected B-scan images. The corresponding intensity profiles of the selected A-scans further demonstrate that the structures of the GA and non-GA samples have high variability, which makes it very difficult for the corresponding deep learning model to capture uniform or general structural information among these samples. Therefore, in our experiments, we found that it was very difficult to obtain an accurate classification result using only one deep network. 
Figure 3
An example showing the structural variability in OCT data. (a) A full projection image of one study case with GA, where the ground truth is overlaid with a red line. (b) Three B-scan images of the cross sections selected from (a), highlighted with blue lines, where the GA lesions are overlaid with blue regions. (c) The intensity profiles of the selected A-scans, where A-scans with GA and normal A-scans (A-scans without GA) are highlighted with red and green lines, respectively.
To address this issue, we trained 10 deep network models and used a voting decision strategy to refine the segmentation results among the 10 trained models. Specifically, we randomly selected 10,000 A-scans with GA as positive samples and 10,000 normal A-scans without GA as negative samples to train each model, with no intersection among the training data used for each model. Then we classified the 3D OCT data with these 10 models and obtained 10 classification results. The final segmentation results were obtained with the voting decision strategy by labeling a pixel as GA when its voting probability was greater than 70%. Finally, a 7 × 7 median filter was applied to the voting results to ensure the smoothness of the final segmentations. 
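The sketch below illustrates this sampling and voting logic under the stated settings (10 disjoint training subsets of 10,000 positive and 10,000 negative A-scans each, a 70% vote threshold, and a 7 × 7 median filter). The functions train_model and predict_en_face in the commented usage are hypothetical stand-ins for the stacked-autoencoder training and per-A-scan classification steps described above.

```python
import numpy as np
from scipy.ndimage import median_filter

def disjoint_subsets(indices, n_models=10, per_model=10_000, seed=0):
    # Shuffle once and slice into non-overlapping chunks, one chunk per model.
    shuffled = np.random.default_rng(seed).permutation(indices)
    return [shuffled[i * per_model:(i + 1) * per_model] for i in range(n_models)]

def vote_segmentation(label_maps, threshold=0.7, filter_size=7):
    """Fuse the binary en-face maps produced by the individual models.

    label_maps: array of shape (n_models, H, W) with 0/1 labels per projection pixel."""
    votes = label_maps.mean(axis=0)                   # fraction of models voting "GA"
    fused = (votes > threshold).astype(np.uint8)      # keep pixels with >70% agreement
    return median_filter(fused, size=filter_size)     # 7 x 7 smoothing of the final map

# Hypothetical usage (train_model / predict_en_face are placeholders):
# pos_sets = disjoint_subsets(ga_a_scan_indices)
# neg_sets = disjoint_subsets(normal_a_scan_indices)
# models = [train_model(pos, neg) for pos, neg in zip(pos_sets, neg_sets)]
# maps = np.stack([predict_en_face(m, test_cube) for m in models])
# segmentation = vote_segmentation(maps)

# Tiny synthetic check: 10 noisy copies of a square "GA" region on a 40 x 40 grid.
rng = np.random.default_rng(1)
truth = np.zeros((40, 40), dtype=np.uint8)
truth[10:30, 12:28] = 1
maps = np.stack([np.clip(truth + (rng.random((40, 40)) < 0.05), 0, 1) for _ in range(10)])
print(vote_segmentation(maps).sum(), truth.sum())
```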
Figure 4 shows the voting decision strategy, where the testing case is the same as that in Figure 3. From this figure, we can observe that each classification result obtained by the 10 individual models contains misclassifications due to imaging defects and the structural variability of the OCT images. The voted classification result demonstrates that the proposed model can produce an accurate segmentation result that is highly consistent with the ground truth. 
Figure 4
The voting decision strategy.
Evaluation Criteria and Comparison Methods
In this paper, we used three criteria to quantitatively evaluate the performance of each method: overlap ratio (OR), absolute area difference (AAD), and correlation coefficient (CC). 
The overlap ratio is defined as the percentage of area in which both segmentation methods agree with respect to the presence of GA over the total area in which at least one of the methods detects GA (Jaccard index):  
\begin{equation}\tag{4}OR\left( {X,Y} \right) = {{Area(X\mathop \cap Y)} \over {Area(X\mathop \cup Y)}}\end{equation}
 
where \(X\) and \(Y\) indicate the regions inside the segmented GA contours produced by two different methods (or graders), respectively. The operators \(\cap\) and \(\cup\) indicate intersection and union, respectively. The mean OR and standard deviation values are computed across scans in the data sets. 
The absolute area difference measures the absolute difference between the GA areas as segmented by two different methods:  
\begin{equation}\tag{5}AAD\left( {X,Y} \right) = \left| {Area\left( X \right) - Area\left( Y \right)} \right|\end{equation}
 
Similar to OR, \(X\) and \(Y\) indicate the regions inside the segmented GA contours produced by two different methods (or graders), respectively. The mean AAD and standard deviation values are computed across scans in the data sets. 
The CC was computed using Pearson's linear correlation between the GA areas obtained from the segmentations of different methods or readers, measuring their linear dependence with each scan treated as an observation. 
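As a worked illustration of the three criteria, the NumPy sketch below computes OR (Equation 4), AAD (Equation 5), and the Pearson CC on synthetic binary masks and areas. Expressing AAD as a percentage of the reference area is an assumption made here for illustration and is not necessarily the exact normalization used in the reported results.

```python
import numpy as np

def overlap_ratio(X, Y):
    # Equation 4: Jaccard index of two binary GA masks.
    X, Y = X.astype(bool), Y.astype(bool)
    return (X & Y).sum() / (X | Y).sum()

def absolute_area_difference(X, Y, as_percent=True):
    # Equation 5: |Area(X) - Area(Y)|, optionally normalized by the reference area.
    diff = abs(int(X.sum()) - int(Y.sum()))
    return 100.0 * diff / int(Y.sum()) if as_percent else diff

def correlation_coefficient(areas_method, areas_reference):
    # Pearson's linear correlation between GA areas, one observation per scan.
    return np.corrcoef(areas_method, areas_reference)[0, 1]

# Synthetic example: a square "ground truth" mask and a slight over-segmentation.
Y = np.zeros((50, 50), dtype=np.uint8); Y[10:40, 10:40] = 1
X = Y.copy(); X[10:40, 38:42] = 1
print(f"OR = {overlap_ratio(X, Y):.3f}, AAD = {absolute_area_difference(X, Y):.2f}%")

# CC across four hypothetical scans (areas in arbitrary units).
areas_m = np.array([120.0, 340.0, 95.0, 410.0])
areas_r = np.array([118.0, 352.0, 90.0, 400.0])
print(f"CC = {correlation_coefficient(areas_m, areas_r):.3f}")
```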
In the comparison experiments, we mainly compared against two related methods: the Chen et al.20 method and the Niu et al.22 method. The Chen et al.20 method is a semisupervised method based on geometric active contours, while the Niu et al.22 method is an unsupervised method based on the Chan-Vese model. It should be noted that both methods rely on retinal layer segmentation results: they first extract the RPE layer, then construct the projection images based on the pixels below the RPE layer, and finally perform their segmentation on the 2D projection images. In contrast, our proposed algorithm directly processes 3D samples without using any retinal layer segmentation results. 
Results
Testing I: Segmentation Results on the Dataset With a Size of 512 × 128 × 1024
In the first experiment, we tested the proposed model on Dataset 1, which contained 51 longitudinal SD-OCT cube scans from 12 eyes of 8 patients with a size of 512 × 128 × 1024. In the training phase, we randomly selected 10,000 A-scans with GA as positive samples and 10,000 normal A-scans without GA as negative samples to train one model, and there was no intersection among the training data used in each model. In the testing phase, we directly fed the testing 3D case into the proposed model to get the final segmentation result. 
In Figure 5, eight example cases selected from eight patients are used to show the performance of the proposed model. In each subfigure, the ground truths and the segmentation results are overlaid on the full projection image, where the red line is the outline of the average ground truth and the blue line is the outline of the segmentation result obtained by the proposed model. From this figure, we found that the full projection images contained obvious intensity inhomogeneity and low contrast between GA lesions and normal regions. Using the deep network and voting strategy, the proposed model produced smooth and accurate segmentation results that are highly consistent with the average ground truths. 
Figure 5
Segmentation results overlaid on full projection images for eight example cases selected from eight patients in Dataset 1, where the average ground truths are overlaid with a red line and the segmentations obtained with the proposed model are overlaid with a blue line.
The quantitative results of the interobserver and intraobserver agreement evaluation for Dataset 1 are summarized in Table 2, where \({A_i}\left( {i = 1,2} \right)\) represents the segmentations of the first grader in the i-th session, and \({B_i}\left( {i = 1,2} \right)\) represents the segmentations of the second grader in the i-th session. Interobserver differences were computed by considering the union of both sessions for each grader: \({A_{1\& 2}}\) and \({B_{1\& 2}}\) represent the first and second grader, respectively. The intraobserver and interobserver comparisons showed very high CC, indicating very high linear correlation between different readers and for the same reader at different sessions. The overlap ratios (all >90%) and the absolute GA area differences (all <5%) indicate very high interobserver and intraobserver agreement, highlighting that the measurement and quantification of GA regions in the generated projection images appear effective and feasible.20,22 
Table 2
Intraobserver and Interobserver CC, AAD and OR Evaluations20,22
We then qualitatively compared the outlines of the segmentations obtained by the proposed model and the two comparison methods on six examples in Figure 6. In each subfigure, the white line shows the average ground truth, and the green, blue, and red lines show the Chen et al.,20 Niu et al.,22 and our segmentations, respectively. For the second and fifth cases, all the comparison methods produced satisfactory results because the structure of GA is obvious and the corresponding contrast is higher. For the first and sixth cases, due to the low contrast, both the Chen et al.20 and Niu et al.22 methods failed to detect parts of the boundaries between the GA lesions and non-GA regions. The Niu et al.22 method misclassified normal regions as GA lesions in the third and sixth cases, while the Chen et al.20 method misclassified GA lesions as normal regions in the fourth case. Moreover, in the fourth case, the upper left corner of the GA lesion was missed by both the Chen et al.20 and Niu et al.22 methods. Comparatively, the proposed model qualitatively outperformed the other two methods even without using any retinal layer segmentation results, and it obtained higher consistency with the average ground truths. 
Figure 6
Comparison of segmentation results overlaid on full projection images for six example cases selected from six patients in Dataset 1, where the average ground truths are overlaid with a white line, and the segmentations obtained with the proposed model, Chen et al.'s20 method, and Niu et al.'s22 method are overlaid with red, green, and blue lines, respectively. In each subfigure, the top image shows the segmentation results overlaid on the full projection image, and the bottom image shows an enlarged view of the rectangular region marked by an orange box.
Figure 7 shows the quantitative comparison between our segmentation results and the manual gold standards (average expert segmentations) for all cases in Dataset 1, where the top panel shows the OR comparison, the middle panel shows the AAD comparison measured by volume, and the bottom panel shows the AAD comparison measured as a percentage. In each panel, the green diamonds, blue squares, and red circles indicate the segmentation accuracy of the Chen et al.,20 Niu et al.,22 and our methods, respectively. From this figure, we can observe that the proposed model produced more accurate segmentation results in most cases. Table 3 summarizes the average quantitative results between the segmentation results and the manual gold standards (individual reader segmentations and the average expert segmentations) on Dataset 1. Overall, our model produced higher segmentation accuracy with respect to the manual gold standard than both the Chen et al.20 and Niu et al.22 methods, presenting higher CC (0.986 vs. 0.970 and 0.979), lower AAD (11.49% vs. 27.17% and 12.95%), and higher OR (86.94% vs. 72.6% and 81.86%). The higher CC indicates that our model produced results more similar to the ground truth, the lower AAD indicates that the areas estimated by the proposed model are closer to the manually delineated areas, and the higher OR indicates that the proposed model obtained results more similar to the manual outlines. Moreover, the proposed model was more robust across all cases in Dataset 1, as indicated by the lower standard deviations. In conclusion, the proposed algorithm showed better segmentation performance than the other two comparison methods on Dataset 1. 
Figure 7
The quantitative comparisons between the segmentations and average expert segmentations on all the cases in Dataset 1.
Table 3
Summary of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards (Individual Reader Segmentations and the Average Expert Segmentations) on Dataset 1
Testing II: Segmentation Results on the Dataset With a Size of 200 × 200 × 1024
In the second experiment, we tested the proposed model on Dataset 2, which contains 54 SD-OCT cube scans from 54 eyes of 54 patients with a size of 200 × 200 × 1024. Similar to the first experiment, in the training phase, we randomly selected 10,000 A-scans with GA as positive samples and 10,000 normal A-scans without GA as negative samples to train one model, and there was no intersection among the training data used in each model. In the testing phase, we directly fed the testing 3D case into the proposed model to obtain the final segmentation results. 
In Figure 8, eight example cases selected from eight patients are used to show the performance of the proposed model. In each subfigure, the average ground truth and the segmentation result are overlaid on the full projection image with red and blue lines, respectively. Similar to the first experiment, the proposed model produced accurate results highly consistent with the average ground truths. 
Figure 8
Segmentation results overlaid on full projection images for eight example cases selected from eight patients in Dataset 2, where the average ground truths are overlaid with a red line and the segmentations obtained with the proposed model are overlaid with a blue line.
We qualitatively compared the segmentations obtained by the proposed model and the two comparison methods on six examples in Figure 9. In each subfigure, the average ground truths are overlaid with white lines, and the segmentations obtained with the Chen et al.,20 Niu et al.,22 and proposed methods are overlaid with green, blue, and red lines, respectively. In the first and fifth cases, all the comparison methods produced satisfactory results due to the higher contrast of the GA lesions. In the second and sixth examples, the Chen et al.20 and Niu et al.22 methods produced grossly misclassified regions. The Niu et al.22 method failed to segment the third case, while the Chen et al.20 method failed to segment the fourth case. Moreover, in the last example, the regions inside the GA lesions were misclassified by both the Chen et al.20 and Niu et al.22 methods. Comparatively, without using any retinal layer segmentation results, our proposed model qualitatively outperformed the other two methods and obtained results more consistent with the average ground truths. 
Figure 9
Comparison of segmentation results overlaid on full projection images for six example cases selected from six patients in Dataset 2, where the average ground truths are overlaid with a white line, and the segmentations obtained with the proposed model, the Chen et al.,20 and the Niu et al.22 methods are overlaid with red, green, and blue lines, respectively. In each subfigure, the top image shows the segmentation results overlaid on the full projection image, and the bottom image shows an enlarged view of the rectangular region marked by an orange box.
Figure 10 shows the quantitative comparison between the segmentation results and the average expert segmentations for all cases in Dataset 2, where the panels from top to bottom show the OR comparison, the AAD comparison measured by volume, and the AAD comparison measured as a percentage. In each panel, the green diamonds, blue squares, and red circles indicate the segmentation accuracy of the Chen et al.,20 Niu et al.,22 and our methods, respectively. Table 4 summarizes the average quantitative results between the segmentation results and the manual gold standards on Dataset 2. Overall, our model produced higher segmentation accuracy with respect to the manual gold standards than both the Chen et al.20 and Niu et al.22 methods, presenting higher CC (0.995 vs. 0.937 and 0.955), lower AAD (8.30% vs. 19.68% and 22.96%), and higher OR (81.66% vs. 65.88% and 70.00%). Moreover, the proposed model was more robust across all cases in Dataset 2, as indicated by the lower standard deviations. In conclusion, the proposed algorithm showed better segmentation performance than the other two comparison methods on Dataset 2. 
Figure 10
The quantitative comparisons between the segmentations and average expert segmentations on all the cases in Dataset 2.
Table 4
Summary of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards on Dataset 2
Testing III: Segmentation Results With Patient-Independent Testing
In Testing I and Testing II, the A-scans used for training came from the same patients that were later tested on, which means that these two experiments were not independent at the patient level. To further verify the performance of the proposed model under patient-independent testing, in this experiment we divided Dataset 1 and Dataset 2, respectively, into two disjoint parts at the patient level. Specifically, for Dataset 1, the 51 cases from eight patients were divided into two parts: the first part contains 25 images from four patients, and the second part contains the other 26 images from the other four patients. For Dataset 2, the 54 eyes from 54 patients were also divided into two disjoint parts, each containing 27 images from 27 patients without any overlap. In the training phase, we randomly selected 10,000 A-scans with GA as positive samples and 10,000 normal A-scans without GA as negative samples from one part to train the models. In the testing phase, we directly fed the testing 3D cases from the other part into the proposed model to obtain the final segmentation results. Therefore, the training and testing sets were totally independent of each other at the patient level. Table 5 summarizes the average quantitative results between the segmentation results obtained with the patient-independent testing procedure and the manual gold standards on both data sets. For the first dataset, our algorithm under the patient-independent testing procedure obtained a mean OR of 83.45% ± 9.56%, an AAD of 14.49% ± 14.30%, and a CC of 0.975. For the second dataset, the mean OR, AAD, and CC of the proposed method under the patient-independent testing procedure were 78.00% ± 12.86%, 11.86% ± 12.09%, and 0.992, respectively. The performance under the patient-independent testing procedure declined by approximately 4% compared with the quantitative results of the proposed model under the patient-dependent procedure (Testing I and Testing II). However, the patient-independent results still outperformed the Chen et al.20 and Niu et al.22 methods. 
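A minimal sketch of such a patient-level split follows, assuming each scan is tagged with a patient identifier; halving the shuffled list of patients guarantees that no patient contributes scans to both parts. The per-patient scan counts in the example are illustrative only.

```python
import numpy as np

def patient_level_split(scan_patient_ids, seed=0):
    """Split scan indices into two disjoint parts so that no patient appears in both."""
    rng = np.random.default_rng(seed)
    patients = rng.permutation(np.unique(scan_patient_ids))
    first_half = set(patients[:len(patients) // 2])
    part1 = [i for i, p in enumerate(scan_patient_ids) if p in first_half]
    part2 = [i for i, p in enumerate(scan_patient_ids) if p not in first_half]
    return part1, part2

# Example mirroring Dataset 1: 51 scans from 8 patients (per-patient counts assumed).
scan_patient_ids = (["P1"] * 7 + ["P2"] * 6 + ["P3"] * 6 + ["P4"] * 6 +
                    ["P5"] * 7 + ["P6"] * 6 + ["P7"] * 7 + ["P8"] * 6)
part1, part2 = patient_level_split(scan_patient_ids)
print(len(part1), len(part2))          # the two parts partition all 51 scans
```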
Table 5
Summary of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards on Two Data Sets for Patient-Independent Testing
Testing IV: Segmentation Results With Cross Testing
In the last experiment, we executed a cross-testing procedure by using the models trained on one dataset to test the cases in the other dataset. Specifically, we tested all the cases in Dataset 1 with the models trained on Dataset 2, and we tested all the cases in Dataset 2 with the models trained on Dataset 1. It should be noted that we did not retrain the models; instead, we directly used the models trained in the first and second experiments. 
Figure 11 shows the cross-testing results compared with the original segmentations of the proposed model and the ground truths. The figures in the top row show the cross segmentations for four cases selected from Dataset 1, while the figures in the bottom row show the cross segmentations for four cases selected from Dataset 2. In each subfigure, the average ground truths, the segmentations obtained with the proposed model, and the cross-segmentation results are overlaid with red, green, and blue lines, respectively. From Figure 11, we can observe that the proposed model can still produce satisfactory results under the cross-testing procedure. However, compared with the original segmentations of the proposed model, the results obtained with cross testing contain misclassifications, especially in the regions near the boundaries. 
Figure 11
Segmentation results overlaid on full projection images for four cases selected from Dataset 1 (top row) and four cases selected from Dataset 2 (bottom row), where the average ground truths, the segmentations obtained with the proposed model, and the cross-segmentation results are overlaid with red, green, and blue lines, respectively.
Table 6 summarizes the average quantitative results between the segmentation results obtained with the cross-testing procedure and the manual gold standards on both data sets. The performance of the cross-testing procedure declined sharply (∼10%) compared with the original results of the proposed model. However, the cross-testing procedure still outperformed the Chen et al.20 method and produced accuracy similar to that of the Niu et al.22 method. 
Table 6
Summary of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards on Two Data Sets for Cross Testing
Discussion
In this paper, based on deep neural networks, we proposed an automatic and accurate GA segmentation method for SD-OCT images that does not use any retinal layer segmentation results. This is the first method that segments GA lesions from OCT images using deep learning technologies. 
As listed in Table 3, our model produced higher segmentation accuracy with respect to the manual gold standards, generated by two different readers at two separate sessions, than both the Chen et al.20 and Niu et al.22 methods. Our model had a higher CC (0.986), lower AAD (11.49%), and higher OR (86.94%). The proposed method also improved segmentation accuracy by more than 5% when compared with the related algorithms on Dataset 1. 
As summarized in Table 4, the proposed model also obtained higher segmentation accuracy when compared with the registered ground truths manually drawn on FAF images (CC: 0.995, AAD: 8.30%, OR: 81.66%), and it improved segmentation accuracy by more than 10% when compared with the related algorithms on Dataset 2. The example segmentations shown in Figures 5 and 8 corroborate the high consistency with the average ground truths, and the comparison results shown in Figures 6, 7, 9, and 10 further demonstrate the superior performance compared with the related methods. 
Comparing the cross-testing results summarized in Table 6 with the results of the proposed model listed in Tables 3 and 4, the segmentation accuracy decreased by approximately 10% on both data sets. The main reasons were: (1) The ground truths were inherently different. As shown in Figure 1, the ground truths of the two data sets were obtained through different procedures: the ground truths were drawn based on the OCT data itself for Dataset 1, whereas the ground truths for Dataset 2 were registered from outlines drawn on FAF images. Therefore, the performance of cross testing was better on Dataset 1 than on Dataset 2. (2) The structure of the data varied. As shown in Figure 3, even within one dataset the intensity profiles of A-scans varied greatly, which made it very difficult for the corresponding deep learning model to capture general structural information or general features among these A-scans. When we executed the cross testing, this difficulty was further magnified, and the performance of the cross-testing procedure declined sharply (∼10%) compared with the original results of the proposed model. Ultimately, though, the cross-testing procedure still outperformed the Chen et al.20 method and produced accuracy similar to that of the Niu et al.22 method. 
Consequently, without retinal layer segmentation, the proposed algorithm was able to obtain higher segmentation accuracy when compared with the state-of-the-art methods relying on retinal layer segmentations. Our method may provide reliable GA segmentations for SD-OCT images and be useful for clinical diagnosis. 
In this paper, a deep voting model is proposed to segment GA in SD-OCT images; its two key components are reflected in its name (i.e., deep and voting). To further test the efficiency of the proposed deep voting model, we implemented two other models. The first was a shallow voting model, in which the voting strategy was applied to a shallow neural network with a single hidden layer; in the shallow voting model, the structure of the SA neural network was composed of three layers, including one input layer, one hidden/SA layer, and one output layer, and a voting decision strategy was again used to refine the segmentation results among 10 trained models. The second model, called the one deep model, was a single deep model trained on 100% of the training set data without using the voting strategy. It should be noted that the structure of this SA deep network is the same as that in the deep voting model. 
We then tested these two models (i.e., the one deep model and the shallow voting model) on Dataset 1 and Dataset 2. For the shallow voting model, we randomly selected 10,000 A-scans with GA as positive samples and 10,000 normal A-scans without GA as negative samples to train each model, with no intersection among the training data used for each model. For the one deep model, we used \(10^5\) A-scans with GA as positive samples and \(10^5\) normal A-scans without GA as negative samples to train the model. The testing phase was the same for all models. The average quantitative results between the segmentation results and the manual gold standards on Dataset 1 are summarized in Table 7, and Table 8 summarizes the corresponding results on Dataset 2. It should be noted that the results of the proposed deep voting model are listed in Tables 3 and 4, respectively. 
Table 7
Summary of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards (Individual Reader Segmentations and the Average Expert Segmentations) on Dataset 1 by Applying the One Deep Model and the Shallow Voting Model
Table 8
Summary of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards on Dataset 2 by Applying the One Deep Model and the Shallow Voting Model
From Tables 7 and 8, we can observe that the shallow voting model outperforms the one deep model, which does not use the voting strategy, indicating that the voting strategy is an efficient way to further improve the performance of the model. Comparing the results obtained with the shallow voting model and the proposed deep voting model, we can see that the representations learned by the stacked autoencoders reduce the redundant information in the input data and preserve more useful information for the final classification. 
In the experiments above, we quantitatively evaluated the proposed model over all OCT cases without considering any patient or eye grouping. To further demonstrate the robustness of the proposed model across different patients and eyes, Table 9 lists the grouped quantitative results based on the patient-dependent procedure (Testing I and Testing II) and the patient-independent procedure (Testing III). For Dataset 1, the 51 SD-OCT cubes from 12 eyes of 8 patients were first grouped by patient into 8 groups (P1–P8) and then grouped by eye into two groups, where the right eye group is oculus dexter (OD) and the left eye group is oculus sinister (OS). For Dataset 2, the 54 SD-OCT cube scans from 54 eyes of 54 patients were grouped by eye into two groups (i.e., the OD group and the OS group). From Table 9, we can observe that the proposed model is robust to both the patient grouping and the eye grouping. 
Table 9
The Quantitative Results of the Proposed Model Based on the Patient Group and Eye Group
Our proposed model moves past the limitations imposed by retinal layer segmentation, making it more practical in real-life applications. Because GA is generally associated with retinal thinning and loss of the RPE and photoreceptors, state-of-the-art algorithms mainly segment the GA regions based on the projection image generated from the voxels between the RPE and the choroid layers, which means that these methods rely on the accuracy of retinal layer segmentation. In contrast, our data samples (A-scans) with 1024 features are fed directly into the network during the training and testing phases, without using any layer segmentation results. 
Our method also does not rely on large data sets. In the training phase for Dataset 1 and Dataset 2, we needed only approximately 5% and 9% of the total data, respectively, to train our model. The proposed algorithm also showed data transfer capability, which was demonstrated in the third experiment. Even though the segmentation accuracy obtained by the cross-testing procedure decreased by approximately 10% on both data sets compared with the original proposed model, the cross-testing procedure still produced satisfactory results.
There are also some limitations of the proposed algorithm, which are summarized as follows:
(1) Because OCT is an interferometric method based on coherent optical beams, one of the fundamental challenges with OCT imaging is the presence of speckle noise in the tomograms. In the proposed model, however, the 1D data samples (A-scans) were directly fed into the network as the input layer, which meant that the spatial consistency among samples was not taken into account. Therefore, the proposed deep voting model was sensitive to noise, and data preprocessing for image denoising was necessary. Our future work will focus on how to take the spatial consistency among samples into account in deep learning models, for example, by using convolutional neural networks or recurrent neural networks. 
(2) The voting strategy used in this paper is heuristic and intuitive, treating the results obtained by the ten models as equally important. In the future, we plan to automatically determine the importance of each model (a simple weighted-voting variant is sketched after this list). 
(3) How deep should the deep network be? The deep network used in this paper is actually not very deep. In our experiments, we tried adding more hidden layers to further improve the performance. Unfortunately, in GA segmentation, we found that the accuracy improvement from more hidden layers was very limited and only increased the training cost. This is mainly due to the large structural variation in OCT data. As shown in Figure 3, the intensity profiles of the selected samples demonstrate that the structures of A-scans with GA and of normal A-scans without GA vary greatly, and it is very difficult for the corresponding deep learning model to capture uniform or general structural information from these A-scans. In the future, we plan to detect the foveal center in the OCT data, which would further reduce the structural variance among different OCT scans and improve the performance of deep neural networks. 
(4) Instead of using currently popular networks, such as AlexNet and GoogLeNet, we used the sparse autoencoder to construct our deep model in order to discover and represent the sparsity of OCT data. From Figure 2, a trend of sparsity can be clearly observed as the data propagate from the top layer to the bottom layer of the network, which further indicates the efficiency of the proposed model. How to segment GA with a pretrained deep network is beyond the scope of this paper and is a subject for future research.
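As one possible instance of the weighted-voting direction mentioned in limitation (2), the sketch below weights each model's vote by its held-out validation accuracy instead of treating all ten models as equally important. The weighting scheme and the predict interface are assumptions for illustration only and are not part of the published method.

```python
import numpy as np

def weighted_vote(models, ascans, val_accuracies, threshold=0.5):
    """Combine per-model GA predictions with accuracy-proportional weights."""
    w = np.asarray(val_accuracies, dtype=float)
    w = w / w.sum()                                           # normalize the importance weights
    votes = np.stack([m.predict(ascans) for m in models])     # (n_models, n_ascans) of 0/1 labels
    score = (w[:, None] * votes).sum(axis=0)                  # weighted fraction voting "GA"
    return (score >= threshold).astype(int)
```

When all validation accuracies are equal, this reduces to the plain majority vote used in the current model.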
Acknowledgments
Supported by the National Natural Science Foundation of China under Grants No. 61401209, 61671242, 61701192, and 61672291; the Natural Science Foundation of Jiangsu Province, China (Youth Fund Project) under Grant No. BK20140790; the Natural Science Foundation of Shandong Province, China (Youth Fund Project) under Grant No. ZR2017QF004; the Fundamental Research Funds for the Central Universities under Grants No. 30916011324 and 30920140111004; the China Postdoctoral Science Foundation under Grants No. 2014T70525, 2013M531364, and 2017M612178; and Research to Prevent Blindness. 
Disclosure: Z. Ji, None; Q. Chen, None; S. Niu, None; T. Leng, None; D.L. Rubin, None 
References
Klein R, Klein BE, Knudtson MD, Meuer SM, Swift M, Gangnon RE. Fifteen-year cumulative incidence of age-related macular degeneration: the Beaver Dam Eye Study. Ophthalmology. 2007; 114: 253–262.
Sunness JS. The natural history of geographic atrophy, the advanced atrophic form of age-related macular degeneration. Mol Vis. 1999; 5: 25.
Ying GS, Kim BJ, Maguire MG, et al. Sustained visual acuity loss in the comparison of age-related macular degeneration treatments trials. JAMA Ophthalmol. 2014; 132: 915–921.
Nunes RP, Gregori G, Yehoshua Z, et al. Predicting the progression of geographic atrophy in age-related macular degeneration with SD-OCT en face imaging of the outer retina. Ophthalmic Surg Lasers Imaging Retina. 2013; 44: 344–359.
Tolentino MJ, Dennrick A, John E, Tolentino MS. Drugs in Phase II clinical trials for the treatment of age-related macular degeneration. Expert Opin Invest Drugs. 2015; 24: 183–199.
Chaikitmongkol V, Tadarati M, Bressler NM. Recent approaches to evaluating and monitoring geographic atrophy. Curr Opin Ophthalmol. 2016; 27: 217–223.
Schmitz-Valckenberg S, Sadda S, Staurenghi G, Chew EY, Fleckenstein M, Holz FG. Geographic atrophy: semantic considerations and literature review. Retina. 2016; 36: 2250–2264.
Abramoff M, Garvin M, Sonka M. Retinal imaging and image analysis. IEEE Rev Biomed Eng. 2010; 3: 169–208.
Feeny AK, Tadarati M, Freund DE, Bressler NM, Burlina P. Automated segmentation of geographic atrophy of the retinal epithelium via random forests in AREDS color fundus images. Comput Biol Med. 2015; 65: 124–136.
Panthier C, Querques G, Puche N, et al. Evaluation of semiautomated measurement of geographic atrophy in age-related macular degeneration by fundus autofluorescence in clinical setting. Retina. 2014; 34: 576–582.
Lee N, Laine AF, Barbazetto I, Busuoic M, Smith R. Level set segmentation of geographic atrophy in macular autofluorescence images. Invest Ophthalmol Vis Sci. 2006; 47: 2125.
Lee N, Smith RT, Laine AF. Interactive segmentation for geographic atrophy in retinal fundus images. Conf Rec Asilomar Conf Signals Syst Comput. 2008; 42: 655–658.
Deckert A, Schmitz-Valckenberg S, Jorzik J, Bindewald A, Holz FG, Mansmann U. Automated analysis of digital fundus autofluorescence images of geographic atrophy in advanced age-related macular degeneration using confocal scanning laser ophthalmoscopy (cSLO). BMC Ophthalmol. 2005; 5: 8.
Hu Z, Medioni GG, Hernandez M, Sadda SR. Automated segmentation of geographic atrophy in fundus autofluorescence images using supervised pixel classification. J Med Imaging. 2015; 2: 014501.
Ramsey DJ, Sunness JS, Malviya P, Applegate C, Hager GD, Handa JT. Automated image alignment and segmentation to follow progression of geographic atrophy in age-related macular degeneration. Retina. 2014; 34: 1296–1307.
Yehoshua Z, Rosenfeld PJ, Gregori G, Feuer WJ. Progression of geographic atrophy in age-related macular degeneration imaged with spectral-domain optical coherence tomography. Ophthalmology. 2011; 118: 679–686.
De Niro JE, McDonald HR, Johnson RN. Sensitivity of fluid detection in patients with neovascular AMD using spectral domain optical coherence tomography high-definition line scans. Retina. 2014; 34: 1163–1166.
Shuang L, Wang B, Yin B. Retinal nerve fiber layer reflectance for early glaucoma diagnosis. J Glaucoma. 2014; 23: e45–e52.
Folgar FA, Yuan EL, Sevilla MB, et al.; for the Age Related Eye Disease Study 2 Ancillary Spectral-Domain Optical Coherence Tomography Study Group. Drusen volume and retinal pigment epithelium abnormal thinning volume predict 2-year progression of age-related macular degeneration. Ophthalmology. 2016; 123: 39–50.
Chen Q, de Sisternes L, Leng T, Zheng L, Kutzscher L, Rubin DL. Semi-automatic geographic atrophy segmentation for SD-OCT images. Biomed Opt Express. 2013; 4: 2729–2750.
Hu Z, Medioni GG, Hernandez M, Hariri A, Wu X, Sadda SR. Segmentation of the geographic atrophy in spectral-domain optical coherence tomography and fundus autofluorescence images. Invest Ophthalmol Vis Sci. 2013; 54: 8375–8383.
Niu S, de Sisternes L, Chen Q, Leng T, Rubin DL. Automated geographic atrophy segmentation for SD-OCT images using region-based CV model via local similarity factor. Biomed Opt Express. 2016; 7: 581–600.
Niu S, de Sisternes L, Chen Q, Rubin DL, Leng T. Fully automated prediction of geographic atrophy growth using quantitative spectral-domain optical coherence tomography biomarkers. Ophthalmology. 2016; 123: 1737–1750.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521: 436–444.
Shen DG, Wu GR, Suk HI. Deep learning in medical image analysis. Annu Rev Biomed Eng. 2017; 19: 221–248.
Kleesiek J, Urban G, Hubert A, et al. Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. NeuroImage. 2016; 129: 460–469.
Wu G, Kim M, Wang Q, Munsell BC, Shen D. Scalable high-performance image registration framework by unsupervised deep feature representations learning. IEEE Trans Biomed Eng. 2016; 63: 1505–1516.
Suk HI, Lee SW, Shen DG. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage. 2014; 101: 569–582.
Suk HI, Shen D. Deep learning in diagnosis of brain disorders. In: Lee SW, Bulthoff HH, Muller KR, eds. Recent Progress in Brain and Cognitive Engineering. Dordrecht, the Netherlands: Springer; 2015: 203–213.
Dou Q, Chen H, Yu L, et al. Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks. IEEE Trans Med Imaging. 2016; 35: 1182–1195.
Abràmoff MD, Lou Y, Erginay A, Clarida W, Amelon R, Folk JC, Niemeijer M. Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci. 2016; 57: 5200–5206.
Asaoka R, Murata H, Iwase A, Araie M. Detecting preperimetric glaucoma with standard automated perimetry using a deep learning classifier. Ophthalmology. 2016; 123: 1974–1980.
Gao X, Lin S, Wong TY. Automatic feature learning to grade nuclear cataracts based on deep learning. IEEE Trans Biomed Eng. 2015; 62: 2693–2701.
Prentašic P, Heisler M, Mammo Z, et al. Segmentation of the foveal microvasculature using deep learning networks. J Biomed Opt. 2016; 21: 075008.
Lee CS, Baughman DM, Lee AY. Deep learning is effective for the classification of OCT images of normal versus age-related macular degeneration. Ophthalmology. 2017; 8: 1090–1095.
Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology. 2017; 124: 962–969.
Cameron A, Lui D, Boroomand A, Glaister J, Wong A, Bizheva K. Stochastic speckle noise compensation in optical coherence tomography using non-stationary spline-based speckle noise modelling. Biomed Opt Express. 2013; 4: 1769–1785.
Maggioni M, Katkovnik V, Egiazarian K, Foi A. A nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Trans Image Process. 2013; 22: 119–133.
Ng A. Sparse autoencoder. CS294A Lecture notes. 2011; 72: 1–19.
Bottou L. Large-scale machine learning with stochastic gradient descent. In: Lechevallier Y, Saporta G, eds. Proceedings of COMPSTAT'2010. Heidelberg: Physica-Verlag HD; 2010: 177–186.
Yegnanarayana B. Artificial Neural Networks. New Delhi: PHI Learning Pvt. Ltd.; 2009.
Figure 1. The example ground truths for the two data sets. (a) One example study case with manual segmentations by two different experts during two different sessions, all outlined on the full projection image. (b) The registration ground truth outlined on the full projection image.
Figure 2. The pipeline of the proposed automatic GA segmentation method.
Figure 3. One example showing the varying structural differences in OCT data. (a) A full projection image of one study case with GA, where the ground truth is overlaid with a red line. (b) Three B-scan images of the cross sections selected from (a), highlighted with blue lines, where the GA lesions are overlaid with blue regions. (c) The intensity profiles of the selected A-scans, where A-scans with GA and normal A-scans (A-scans without GA) are highlighted with red and green lines, respectively.
Figure 4. The voting decision strategy.
Figure 5. Segmentation results overlaid on full projection images for eight example cases selected from eight patients in Dataset 1, where the average ground truths are overlaid with a red line and the segmentations obtained with the proposed model are overlaid with a blue line.
Figure 6. Comparison of segmentation results overlaid on full projection images for six example cases selected from six patients in Dataset 1, where the average ground truths are overlaid with a white line, and the segmentations obtained with the proposed model, Chen et al.'s20 method, and Niu et al.'s22 method are overlaid with red, green, and blue lines, respectively. In each subfigure, the top image shows the segmentation results overlaid on the full projection image, and the bottom image shows an enlarged view of the rectangular region marked by an orange box.
Figure 7. The quantitative comparisons between the segmentations and the average expert segmentations on all the cases in Dataset 1.
Figure 8. Segmentation results overlaid on full projection images for eight example cases selected from eight patients in Dataset 2, where the average ground truths are overlaid with a red line and the segmentations obtained with the proposed model are overlaid with a blue line.
Figure 9. Comparison of segmentation results overlaid on full projection images for six example cases selected from six patients in Dataset 2, where the average ground truths are overlaid with a white line, and the segmentations obtained with the proposed model, the Chen et al.20 method, and the Niu et al.22 method are overlaid with red, green, and blue lines, respectively. In each subfigure, the top image shows the segmentation results overlaid on the full projection image, and the bottom image shows an enlarged view of the rectangular region marked by an orange box.
Figure 10. The quantitative comparisons between the segmentations and the average expert segmentations on all the cases in Dataset 2.
Figure 11. Segmentation results overlaid on full projection images for four cases selected from Dataset 1 (top row) and four cases selected from Dataset 2 (bottom row), where the average ground truths, the segmentations obtained with the proposed model, and the cross-segmentation results are overlaid with red, green, and blue lines, respectively.
Table 1. The Parameter Settings in the Proposed Model
Table 2. Intraobserver and Interobserver CC, AAD and OR Evaluations20,22
Table 3. The Summarizations of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards (Individual Reader Segmentations and the Average Expert Segmentations) on Dataset 1
Table 4. The Summarizations of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards on Dataset 2
Table 5. The Summarizations of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards on Two Data Sets for Patient-Independent Testing
Table 6. The Summarizations of the Quantitative Results (Mean ± Standard Deviation) Between the Segmentations and Manual Gold Standards on Two Data Sets