Open Access
Artificial Intelligence  |   March 2023
AxoNet 2.0: A Deep Learning-Based Tool for Morphometric Analysis of Retinal Ganglion Cell Axons
Author Affiliations
  • Vidisha Goyal
    School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
  • A. Thomas Read
    Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  • Matthew D. Ritch
    Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  • Bailey G. Hannon
    George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA, USA
  • Gabriela Sanchez Rodriguez
    Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  • Dillon M. Brown
    Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  • Andrew J. Feola
    Center for Visual and Neurocognitive Rehabilitation, Atlanta VA Healthcare System, Decatur, GA, USA
    Department of Ophthalmology, Emory University, Atlanta, GA, USA
  • Adam Hedberg-Buenz
    Department of Molecular Physiology and Biophysics, University of Iowa, Iowa City, IA, USA
    Iowa City VA Health Care System and Iowa City VA Center for the Prevention and Treatment of Visual Loss, Iowa City, IA, USA
  • Grant A. Cull
    Devers Eye Institute, Legacy Research Institute, Portland, OR, USA
  • Juan Reynaud
    Devers Eye Institute, Legacy Research Institute, Portland, OR, USA
  • Mona K. Garvin
    Devers Eye Institute, Legacy Research Institute, Portland, OR, USA
    Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, USA
  • Michael G. Anderson
    Department of Molecular Physiology and Biophysics, University of Iowa, Iowa City, IA, USA
    Iowa City VA Health Care System and Iowa City VA Center for the Prevention and Treatment of Visual Loss, Iowa City, IA, USA
  • Claude F. Burgoyne
    Devers Eye Institute, Legacy Research Institute, Portland, OR, USA
  • C. Ross Ethier
    Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
    George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA, USA
    Department of Ophthalmology, Emory University, Atlanta, GA, USA
Translational Vision Science & Technology March 2023, Vol.12, 9. doi:https://doi.org/10.1167/tvst.12.3.9
Abstract

Purpose: Assessment of glaucomatous damage in animal models is facilitated by rapid and accurate quantification of retinal ganglion cell (RGC) axonal loss and morphologic change. However, manual assessment is extremely time- and labor-intensive. Here, we developed AxoNet 2.0, an automated deep learning (DL) tool that (i) counts normal-appearing RGC axons and (ii) quantifies their morphometry from light micrographs.

Methods: A DL algorithm was trained to segment the axoplasm and myelin sheath of normal-appearing axons using manually annotated rat optic nerve (ON) cross-sectional micrographs. Performance was quantified by various metrics (e.g., soft-Dice coefficient between predicted and ground-truth segmentations). We also quantified axon counts, axon density, and axon size distributions in hypertensive and control eyes and compared these outcomes to literature reports.

Results: AxoNet 2.0 performed very well when compared to manual annotations of rat ON (R2 = 0.92 for automated vs. manual counts, soft-Dice coefficient = 0.81 ± 0.02, mean absolute percentage error in axonal morphometric outcomes < 15%). AxoNet 2.0 also showed promise for generalization, performing well on other animal models (R2 = 0.97 for automated vs. manual counts in mice and 0.98 in non-human primates). As expected, the algorithm detected decreased axon density in hypertensive rat eyes (P ≪ 0.001), with preferential loss of large axons (P < 0.001).

Conclusions: AxoNet 2.0 provides a fast and nonsubjective tool to quantify both RGC axon counts and morphological features, thus assisting with assessing axonal damage in animal models of glaucomatous optic neuropathy.

Translational Relevance: This deep learning approach will increase the rigor of basic science studies designed to investigate RGC axon protection and regeneration.

Introduction
Glaucoma is the leading cause of global irreversible blindness, estimated to affect nearly 80 million people worldwide.1 In this disorder, retinal ganglion cells (RGCs) progressively degenerate and undergo apoptosis. Therefore, the number and appearance of RGC axons in micrographs of optic nerve (ON) cross-sections have been used to assess glaucomatous damage in animal models of the disease. 
The benchmark approach for quantifying RGC axonal loss is expert manual assessment of electron micrographs of the ON. However, this is extremely time- and labor-intensive.2–4 Additionally, normal axon appearance is highly variable, depending on the extent of nerve damage and on tissue and image processing methodologies, which hinders comparison of results between laboratories. 
To reduce effort in quantifying RGC axons, previous studies have used various methods (Supplementary Table S1), including semi-quantitative grading of overall tissue damage,5,6 and sub-sampling7–11 using both semiautomated8,12,13 and automated approaches.14–19 However, these techniques suffer from several limitations (Supplementary Material S1), including the following: 
  • Sensitivity to tissue processing and imaging artifacts, a particular concern for automated techniques using manually selected parameter values.8,12–15
  • Inability to accurately analyze micrographs of damaged ON cross-sections, particularly for deep learning-based algorithms that are trained with datasets that do not incorporate sufficiently damaged nerves.16,17
  • Inability to assess light micrographic montages, which are preferred because of cost and time constraints. This is especially true for most deep learning–based automated techniques, which have only been trained to process electron micrographs,16,17 with several recent exceptions.18,19
Here, we expand on our previous research18 to develop and train a deep learning algorithm to segment the axoplasm and myelin sheath of normal-appearing axons from light micrographs of healthy and damaged ON cross-sections from Brown-Norway rats, a widely used animal model of glaucoma. This algorithm allows us to both count and morphometrically characterize normal-appearing axons from optic nerves. 
Methods
In this study we used several cohorts of eyes and their associated optic nerve images, as follows. 
  • Cohort I consisted of 27 eyes of 14 Brown Norway rats with unilateral ocular hypertension. Randomly selected sub-images from light micrographic montages of optic nerve cross-sections were manually annotated and used to train and test the deep learning algorithm. These images were previously used by our group to train AxoNet 1.0,18 but here they were more extensively annotated to train AxoNet 2.0, as described below.
  • Cohort II consisted of 22 mice and 25 non-human primates (NHP). Randomly selected sub-images from light micrographic montages of optic nerve cross-sections that had been previously manually annotated were used to test the deep learning algorithm.
  • Cohort III consisted of both eyes of 23 Brown Norway rats with unilateral ocular hypertension. Optic nerve cross-sectional light micrographic montages were created and used to characterize axonal changes in ocular hypertension.
Cohort I: Rat ON Training and Testing Dataset
Animals, Tissue Preparation and Imaging
All procedures were approved by the Institutional Animal Care and Use Committee at the Atlanta Veterans Affairs Medical Center and Georgia Institute of Technology and conformed to the Animal Research: Reporting of In Vivo Experiments (ARRIVE) and to the ARVO Statement for the Use of Animals in Ophthalmic and Vision Research guidelines. ONs from 14 Brown Norway rats (12 male/2 female, ranging in age from 3–13 months old; Charles River Labs, Wilmington, MA, USA) were used. Unilateral ocular hypertension was induced by anterior chamber microbead injection (12 animals) or episcleral vein hypertonic saline injection (two animals). Intraocular pressures (IOPs) were measured twice/week by calibrated rebound tonometry (iCare TONOLAB; Colonial Medical Supply, Londonderry, NH, USA). IOP burdens, the cumulative IOP difference between the hypertensive and control eye, were computed by calculating the area between the hypertensive and normotensive eyes on IOP versus time (days) plots. The cohort of animals used to develop the model had IOP burdens ranging from 36 to 505 mm Hg ∙ day in the hypertensive eye, expected to correspond to a wide range of glaucomatous damage. ON tissues from 27 of the 28 ONs were used for this study; one eye was deemed unsuitable for experimental glaucoma studies because it exhibited abnormally high IOP elevation, which led to extreme damage. This is the same cohort used to train our previous software, AxoNet 1.0.18 
Eyes were enucleated immediately after euthanizing animals with CO2. The ONs were transected with micro scissors less than 1 mm posterior to the sclera and immediately placed in Karnovsky's fixative, composed of electron microscopy (EM) grade 2% paraformaldehyde (Electron Microscopy Sciences, Hatfield, PA, USA), EM grade 2.5% glutaraldehyde (Electron Microscopy Sciences), 0.1 M Sorensen's buffer (Electron Microscopy Sciences) in double distilled deionized water, for 60 minutes at room temperature, followed by post-fixation in 1% osmium tetroxide (Electron Microscopy Sciences) for 60 minutes at room temperature. The tissue was dehydrated in a 70% to 100% ethanol series (Electron Microscopy Sciences) and transferred into propylene oxide (Electron Microscopy Sciences). The ONs were then infiltrated and embedded in Araldite 502/Embed 812 resin (Electron Microscopy Sciences). Sections of 0.5 µm thickness were cut using a Leica UC7 Ultramicrotome (Leica Microsystems, Buffalo Grove, IL, USA) and stained with 1% toluidine blue (Sigma-Aldrich, St. Louis, MO, USA). This protocol yields control optic nerves indistinguishable from nerves from perfusion-fixed animals (i.e., in our hands, tissue integrity was not affected by using CO2 euthanasia). 
A tile scan for each semithin section was obtained using a Leica DM6 B microscope (Leica Microsystems), using a 63 × lens and 1.6 × multiplier to yield 100 × magnification. The microscope was set to acquire multiple images for each tile by varying the distance between the stage and the objective lens in an automated manner. Then, the optimally focused image for each tile was selected using the “find best focus” feature and contrast was adjusted by maximizing gray-value variance in the LAS-X software (Leica Microsystems) for each tile. The best focused individual tiles were then stitched together using the LAS-X software (Leica Microsystems) to create a high-resolution montage of the entire ON cross-section (Fig. 1A). 
Figure 1.
 
ON sampling and annotation. ON sub-images (B) were randomly selected from tiled micrographs of the entire ON cross-section (A). Axoplasm and myelin sheaths in each sub-image were manually annotated by two masked, independent annotators to produce a “ground truth” annotation (C), in which pink and yellow represent the manually annotated axoplasm and myelin sheath of the normal-appearing axons, respectively.
Dataset Curation and Division
To create a database of images to develop the deep learning model, 1421 grayscale sub-images, each 12 × 12 µm2 (188 × 188 pixels) at a resolution of 15.7 pixels/µm, were randomly selected from the Cohort I cross-sectional micrographs (Figs. 1A, 1B). This is a curated subset of the images that we previously used.18 At least 20 nonoverlapping sub-images were obtained from each ON tile-scan. These sub-images exhibited a wide range of glaucomatous axon degeneration and image quality issues, including blurring and stitching defects. 
To manually annotate these images, 14 individuals were trained to identify and label the axoplasm and myelin sheath of “normal-appearing” RGC axons in each sub-image of Brown Norway rat ON cross-sections using ITK-Snap (v3.6.0) (Figs. 1C, 2). See Supplementary Material S2 for more details regarding the training process. We defined a “normal-appearing axon” as a structure with a continuous and uniformly thick myelin sheath, a homogeneous light axoplasm with occasional localized staining, and an absence of obvious swelling or shrinkage. 
Figure 2.
 
Pictorial representation of axoplasm and myelin sheath ground-truth probability map creation and structure of ground-truth (or, manually annotated) RGC axon probability map in PNG format. The grayscale bar displays the probability that a given pixel belongs to a given class (i.e., axoplasm or myelin sheath).
To account for inter-individual differences between annotators, each pixel of each image was assigned a probability of belonging to the axoplasm and a separate probability of belonging to the myelin sheath. These probabilities were obtained by averaging the annotations as follows: if all annotators agreed that a given pixel was axoplasm, that pixel was assigned a probability of 1 for being axoplasm; if one of two annotators marked the pixel as axoplasm and the other did not, the pixel was assigned a probability of 0.5; etc. (Fig. 2). Pairwise inter-individual annotation similarity was quantified via the paired soft-Dice coefficient, defined as:  
\begin{eqnarray*} && {s_{paired\ Dice}} =\frac{1}{{num\ of\ classes}} \nonumber \\ && \mathop \sum \limits_{classes} \frac{{2\ \times \mathop \sum \nolimits_{pixels} \ {y_{annotation\ 1}} \times {y_{annotation\ 2}}}}{{\mathop \sum \nolimits_{pixels} \ y_{annotation\ 1}^2 + \mathop \sum \nolimits_{pixels} \ y_{annotation\ 2}^2}}\end{eqnarray*}
where y represents the class probability value of a pixel in the sub-image; annotation 1 and annotation 2 refer to the annotations produced by two independent annotators; and the number of classes is 2, representing the axoplasm and myelin sheath of normal-appearing RGC axons. 
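For illustration, a minimal Python sketch of this metric (not the released AxoNet 2.0 code) is shown below; it assumes each annotation is stored as an (H, W, 2) array of per-pixel class probabilities, with channel 0 for axoplasm and channel 1 for myelin sheath, and adds a small epsilon for numerical stability.

```python
# Illustrative computation of the paired soft-Dice coefficient (not the released
# code). Each annotation is assumed to be an (H, W, 2) array of per-pixel class
# probabilities: channel 0 = axoplasm, channel 1 = myelin sheath.
import numpy as np

def paired_soft_dice(annot1: np.ndarray, annot2: np.ndarray, eps: float = 1e-7) -> float:
    dice_per_class = []
    for c in range(annot1.shape[-1]):                    # axoplasm, then myelin sheath
        y1, y2 = annot1[..., c], annot2[..., c]
        num = 2.0 * np.sum(y1 * y2)
        den = np.sum(y1 ** 2) + np.sum(y2 ** 2) + eps    # eps avoids division by zero
        dice_per_class.append(num / den)
    return float(np.mean(dice_per_class))                # average over the two classes
```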
The annotation files were aligned to the sub-image orientation as detailed in Supplementary Material S3. These sub-images were randomly divided into three datasets: training (80%), validation (10%), and testing (10%) sets, ensuring that sub-images selected from each ON were present only in one of the sets to prevent leakage of information between the datasets, because data leakage can lead to over-optimistic validation and testing results. This 80:10:10 split was chosen for several reasons. First, a large fraction of the images was allocated to the training dataset, as is typical when training machine learning algorithms, especially with such a small (<10,000 images) dataset. Second, we wanted to ensure variability in image quality, contrast, and tissue health in the validation and testing sets; hence, we could not enlarge the training dataset further. 
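A simple sketch of such a nerve-wise split is shown below (illustrative only; nerve_ids, with one entry per sub-image, is a hypothetical list identifying the ON from which each sub-image was taken, and whole nerves are partitioned so that the sub-image fractions are approximately 80:10:10).

```python
# Illustrative nerve-wise 80:10:10 split (not the authors' code); nerve_ids is a
# hypothetical list with one entry per sub-image naming its source optic nerve.
import random

def split_by_nerve(nerve_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    nerves = sorted(set(nerve_ids))
    random.Random(seed).shuffle(nerves)
    n_train = round(fractions[0] * len(nerves))
    n_val = round(fractions[1] * len(nerves))
    subsets = {
        "train": set(nerves[:n_train]),
        "validation": set(nerves[n_train:n_train + n_val]),
        "test": set(nerves[n_train + n_val:]),
    }
    # Assign each sub-image index to the subset containing its source nerve, so
    # no nerve contributes sub-images to more than one subset.
    return {name: [i for i, nid in enumerate(nerve_ids) if nid in members]
            for name, members in subsets.items()}
```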
Network Architecture and Training
Inspired by the original U-Net architecture,20 an encoder-decoder style convolutional neural network was trained to perform semantic segmentation using the Cohort I dataset. 
Model Architecture
The architectural parameters of the model were based on the VGG16 model,21 a convolutional neural network trained for image classification. Figure 3 illustrates the model architecture. Initially, we implemented the original VGG16 architecture, but the model was large enough to memorize the training dataset, which led to poor generalization performance. We therefore incorporated only half the number of filters. As in VGG16, consecutive convolutional layers were padded so that the convolution output maintained the same dimensions as the input for both the contracting and expanding paths. Each output was fed through a rectified linear unit (ReLU) activation function. At the final layer, a 1 × 1 convolution was used to map feature vectors to the axoplasm and myelin sheath classes. The model output was then fed into a softmax activation function to transform the class scores to pixelwise class probabilities for comparison with the manual (ground-truth) annotations. The convolutions were performed by filters of kernel size 3 × 3. This network was implemented in Python (v3.7.4) using Keras and TensorFlow. 
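As a rough sketch of this type of encoder-decoder, the following Keras code builds a U-Net-style network with VGG16-like 3 × 3 convolution blocks at half of VGG16's filter counts; the exact layer counts, filter numbers, and decoder details of AxoNet 2.0 may differ, so this is illustrative rather than a reproduction of the released model.

```python
# Illustrative U-Net-style encoder-decoder with halved VGG16-like filter counts.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two padded 3x3 convolutions with ReLU activations, as in VGG16 blocks.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_model(input_shape=(224, 224, 1), num_classes=2):
    inputs = layers.Input(shape=input_shape)
    skips, x = [], inputs
    for filters in (32, 64, 128, 256):          # roughly half of VGG16's filter counts
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)           # four 2x down-sampling steps
    x = conv_block(x, 256)                      # bottleneck
    for filters, skip in zip((256, 128, 64, 32), reversed(skips)):
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])     # U-Net-style skip connection
        x = conv_block(x, filters)
    # A 1x1 convolution maps features to the two classes; softmax yields
    # pixel-wise class probabilities for comparison with the annotations.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(inputs, outputs)
```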
Figure 3.
 
Overview of the AxoNet 2.0 image processing workflow, including the deep learning model architecture, based on the U-Net architecture.20 Layer arrays and operations are shown by colored boxes and arrows. Overlapping consecutive sub-images from full ON images were fed into the deep learning model. The cropped model output for each sub-image (the part of the image enclosed in the red dashed box in “model output for sub-image”) was stitched to obtain an axoplasm probability map for the entire ON, after which the maps were post-processed. Morphometric properties of axons were computed from the post-processed output. Note that the color of the axoplasm in axons detected by the algorithm is arbitrary and simply serves to aid in visualization of the output.
Model Training
The model's contracting path consisted of four max pooling layers that down-sampled the convolutional output size by a factor of 2 at each step. Therefore, the input sub-images had to be resized so that their dimensions (in pixels) were evenly divisible by 16. We also had to consider issues that could arise from cropping artifacts at sub-image edges, such as axons that intersected the edges of the sub-images because of the random selection of sub-images from full ONs. Therefore, input sub-images were symmetrically padded to prevent the propagation of influence from edge artifacts and to ensure seamless model segmentations.20 In brief, the sub-images and ground-truth annotations were bicubically interpolated to 188 × 188 pixels and then symmetrically padded by reflecting the outermost 18 pixels at each border across the image borders to give a final size of 224 × 224 pixels. The input images were then z-normalized and clipped such that pixel intensities lay between −1 and 1. The last channel of zeros from the PNG-formatted probability maps was not used during training. A stochastic gradient descent optimizer was used to minimize the soft-Dice loss function, defined as follows:  
\begin{eqnarray*} && {s_{loss}} = 1 - \frac{1}{{num\ of\ classes}} \nonumber \\ && \mathop \sum \limits_{classes} \frac{{2\ \times \mathop \sum \nolimits_{pixels} {y_{model}} \times {y_{annotation}}}}{{\mathop \sum \nolimits_{pixels} y_{model}^2 + \mathop \sum \nolimits_{pixels} y_{annotation}^2}}\end{eqnarray*}
where y represents the class probability value of a pixel in the sub-image, and annotation and model refer to the ground-truth annotation and the prediction produced by the deep learning model, respectively. As before, the number of classes is 2, denoting the axoplasm and myelin sheath of the normal-appearing RGC axons. 
Mirrored pixels were not included when evaluating the soft-Dice loss during training. As recommended by Ronneberger et al.,20 the input sub-images and annotations were extensively augmented to counter the small size of the dataset. Data augmentation included affine transformations such as horizontal and vertical flipping and shifting (along the height and width by 12 pixels), rotation by small angles (±5°), and shearing (shear angle range ±5°). Image interpolation during augmentation was performed by reflecting the outermost pixels outside the image boundaries (i.e., into the empty part of the image created by the abovementioned transformations). The network was trained for 100 epochs with a batch size of 1 and a learning rate of 0.001. The training set sub-images were shuffled after each epoch. Training was stopped early to prevent overfitting if the soft-Dice loss on the validation set did not decrease for 10 epochs. 
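A hedged Keras/TensorFlow sketch of this loss and training setup is given below (illustrative only; it assumes y_true and y_pred are batch × 224 × 224 × 2 probability maps, shows the exclusion of the 18 mirrored border pixels explicitly, and uses placeholder variable names for the data, all of which may differ from the authors' exact implementation).

```python
# Illustrative soft-Dice loss for Keras/TensorFlow (not the authors' exact code).
import tensorflow as tf

PAD = 18  # width of the mirrored border added to each 188 x 188 sub-image

def soft_dice_loss(y_true, y_pred, eps=1e-7):
    # Exclude the reflected border so mirrored pixels do not contribute to the loss.
    y_true = y_true[:, PAD:-PAD, PAD:-PAD, :]
    y_pred = y_pred[:, PAD:-PAD, PAD:-PAD, :]
    num = 2.0 * tf.reduce_sum(y_true * y_pred, axis=[1, 2])
    den = (tf.reduce_sum(tf.square(y_true), axis=[1, 2])
           + tf.reduce_sum(tf.square(y_pred), axis=[1, 2]) + eps)
    dice = tf.reduce_mean(num / den, axis=-1)   # average over the two classes
    return 1.0 - dice                           # soft-Dice loss per sample

# Training configuration sketch matching the description above (placeholder data names):
# model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001), loss=soft_dice_loss)
# early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
# model.fit(train_images, train_maps, batch_size=1, epochs=100, shuffle=True,
#           validation_data=(val_images, val_maps), callbacks=[early_stop])
```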
ON Micrograph Processing Pipeline
The trained deep learning model was incorporated into a pipeline to analyze light micrographs of entire ONs (Fig. 3). To prevent cropping of the edges of the ON micrograph because of sub-image overlap, the micrograph of the entire ON cross-section was zero-padded by 18 pixels across its border (Fig. 4A). Then, overlapping consecutive sub-images (224 × 224 pixels) were fed into the deep learning model, with ∼16% overlap (i.e., 36 pixels between neighboring sub-images, equal to twice the padding size used during training [Fig. 4B]). The amount of overlap was chosen to balance processing speed against accuracy of the deep learning model on a laboratory-grade personal computer. There was some degree of arbitrariness in this choice, which was ultimately driven by manual inspection of stitched images and comparison to what an expert human annotator would have chosen. 
Figure 4.
 
ON micrograph padding and consecutive sub-image cropping. (A) Zero-padding of ON micrograph around border to prevent cropping of edges (largest yellow box). (B) A zoom of the region enclosed in the red box in (A), displaying the 36 pixel overlap between the sub-image that will be fed to the deep learning model (blue box) and neighboring sub-images (dashed yellow boxes). The green arrow in (A) and (B) shows the direction in which the micrograph will be scanned. According to this scanning scheme, the next sub-image that will be provided to the deep learning model will be the dashed yellow box to the right of the blue box.
The output of the deep learning model was a probability map for the axoplasm and myelin sheaths of normal-appearing axons (Fig. 5). Because the probability space for RGC axon segmentation consisted of only two outcomes (i.e., axoplasm and myelin sheath), the two probability maps were complements of each other; therefore, only the axoplasm probability map was used to identify individual axons. The output of the deep learning model was cropped by 18 pixels around the boundary (Fig. 3, cropped sub-image, dashed red box). Outputs from each individual sub-image were concatenated to produce a final axoplasm probability map for the entire ON cross-sectional image. 
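The following Python sketch illustrates this tiled inference scheme (not the released pipeline; predict_tile is a hypothetical callable wrapping the trained model, and edge handling details may differ): the full montage is zero-padded by 18 pixels, 224 × 224 windows are taken with a 188-pixel stride (36-pixel overlap), 18 pixels are cropped from each prediction, and the central 188 × 188 regions are stitched back together.

```python
# Illustrative tiled-inference sketch (not the released pipeline); predict_tile is
# a hypothetical callable that maps a (224, 224) sub-image to a (224, 224)
# axoplasm probability map using the trained model.
import numpy as np

PAD, TILE = 18, 224
CORE = TILE - 2 * PAD   # 188-pixel central region kept from each prediction

def predict_montage(image: np.ndarray, predict_tile) -> np.ndarray:
    h, w = image.shape
    padded = np.pad(image, PAD, mode="constant")           # zero-pad the border
    out = np.zeros((h, w), dtype=np.float32)
    for top in range(0, h, CORE):                           # 188-px stride -> 36-px overlap
        for left in range(0, w, CORE):
            tile = padded[top:top + TILE, left:left + TILE]
            # Pad the final row/column of tiles if they fall short of TILE pixels.
            ph, pw = TILE - tile.shape[0], TILE - tile.shape[1]
            tile = np.pad(tile, ((0, ph), (0, pw)), mode="constant")
            pred = predict_tile(tile)[PAD:PAD + CORE, PAD:PAD + CORE]  # crop 18 px
            hh, ww = min(CORE, h - top), min(CORE, w - left)
            out[top:top + hh, left:left + ww] = pred[:hh, :ww]
    return out
```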
Figure 5.
 
Example of deep learning model output. The axoplasm and myelin sheath probability map predicted by the AxoNet 2.0 software for the ON sub-image (A) is shown in (B) and (C), respectively. The grayscale bar displays pixel-wise axoplasm and myelin sheath probability.
It was necessary to carry out several operations on the axoplasm probability maps to reliably identify the axoplasm of all normal-appearing axons. These operations, which were carried out on the axoplasm probability map of the entire ON cross-section in Python (v3.7.4) using scikit-image (v0.18.0) and OpenCV (v4.5.1), were as follows: 
Elimination of isolated regions classified as having high probability of being axoplasm but lying outside myelin sheaths
Probability maps were modified via a morphological operation called reconstruction (Supplementary Material S4). For our purpose, the intensity values in the images were the axoplasm probability values, the mask image was the original axoplasm probability map, and the marker image was formed by subtracting a constant value, chosen to be 0.6, from the axoplasm probability map. This approach caused pixels with axoplasm probability ≥0.6 lying outside objects in the mask image to not be transformed by reconstruction. In this way, morphological reconstruction helped disregard any pixels with spuriously high probabilities between axoplasm regions which could result from the deep learning algorithm classifying pixels as axoplasm in areas between axons with very thin myelin sheaths. This in turn allowed easier distinction of the axoplasm regions of two neighboring axons in subsequent steps. 
Axoplasm classification
All pixels with a probability ≥0.6 of being axoplasm were classified as axoplasm. This excluded most extra-axonal structures, which typically had a probability close to 0.5, and myelin sheaths which typically had a probability close to 0.2. 
Axoplasm dilation
The axoplasm regions were increased by 1 pixel around the boundaries using morphological dilation with a unit disk as the structuring element. This slight increase in axoplasm size included boundary pixels that should have been marked as axoplasm with a higher probability by the deep learning algorithm according to our guidelines (Supplementary Fig. S1) and helped visualize the axoplasm of smaller axons in the final segmentation output. Dilating the axoplasmic region by 1 pixel increased the axoplasm diameter by 0.13 µm, giving an average axon diameter in control Brown-Norway rats from our study of 0.68 ± 0.06 µm, which is well within the normative range in various strains of rats of 0.2–3.0 µm, with an average of ∼0.70 µm.22 
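A hedged scikit-image sketch of these three steps is shown below; it follows the description above (marker = probability map minus 0.6, mask = probability map, threshold at 0.6, 1-pixel dilation with a unit disk), but the precise way the reconstructed map modifies the probabilities, together with the remaining filtering steps and their tuned parameters, is detailed in Supplementary Materials S4 and S5 and in the released code.

```python
# Illustrative post-processing sketch (not the released AxoNet 2.0 pipeline).
import numpy as np
from skimage.morphology import reconstruction, binary_dilation, disk
from skimage.measure import label

def postprocess_axoplasm(prob_map: np.ndarray) -> np.ndarray:
    """Return a labeled image of candidate axoplasm regions from a probability map."""
    # Step 1: morphological reconstruction with marker = map - 0.6 and
    # mask = original map; how the reconstructed map is used to suppress
    # spuriously bright pixels between axons follows Supplementary Material S4.
    marker = prob_map - 0.6
    _reconstructed = reconstruction(marker, prob_map, method="dilation")
    # Step 2: classify pixels with axoplasm probability >= 0.6 as axoplasm.
    axoplasm = prob_map >= 0.6
    # Step 3: grow each region by 1 pixel with a unit-disk structuring element.
    axoplasm = binary_dilation(axoplasm, disk(1))
    # Label connected components so each putative axon receives an integer ID.
    return label(axoplasm)
```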
The output of the deep learning model was further processed via the post-processing pipeline described in Supplementary Material S5. The automated axon inclusion scheme for optic nerve sub-images is described in Supplementary Material S6. The axon and myelin sheath morphological properties extracted are defined in Supplementary Material S7. 
Axon Detection Metrics
To evaluate the deep learning model's capability to detect RGC axons, the soft-Dice coefficient comparing model predictions and manual annotations was computed, and the sensitivity and precision of the model were assessed. The soft-Dice coefficient was computed as follows:  
\begin{eqnarray*} && {s_{Dice}} = \frac{1}{{num\ of\ classes}} \nonumber \\ && \mathop \sum \limits_{classes} \frac{{2\ \times \mathop \sum \nolimits_{pixels} \ {y_{model\ prediction}} \times {y_{manual\ annotation\ }}}}{{\mathop \sum \nolimits_{pixels} \ y_{model\ prediction}^2 + \mathop \sum \nolimits_{pixels} \ y_{manual\ annotation}^2}}\end{eqnarray*}
where y represents the class probability value of a pixel in the sub-image; model prediction and manual annotation refer to the segmentation produced by the deep learning model (algorithm) and the ground-truth annotation, respectively; and the number of classes is 2 (axoplasm and myelin sheath of normal-appearing RGC axons) for the full output, or 1 when only the axoplasm or the myelin sheath was considered. 
Sensitivity
Sensitivity, also known as the true-positive rate, was computed as  
\begin{eqnarray*}Sensitivity = \ \frac{{TP}}{{TP + FN}}\end{eqnarray*}
where TP is the number of axons present in the ground-truth annotations correctly identified by the algorithm (true-positives), and FN is the number of axons present in the ground-truth annotations not identified by the algorithm (false-negatives). Precision, also known as the positive predictive value, was computed as  
\begin{eqnarray*}Precision = \ \frac{{TP}}{{TP + FP}}\end{eqnarray*}
where FP is the number of axons not present in the ground-truth annotations incorrectly identified by the algorithm as axons (false-positives). 
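As an illustration of how these counts might be obtained from labeled segmentations, the sketch below uses a simple overlap-based matching rule (a ground-truth axon counts as detected if any predicted axoplasm region overlaps it); the matching criterion actually used for AxoNet 2.0 may differ, so this is an assumption for demonstration purposes.

```python
# Illustrative sensitivity/precision computation with an assumed overlap-based
# matching rule between labeled ground-truth and predicted axoplasm regions
# (0 = background in both label images).
import numpy as np

def detection_metrics(gt_labels: np.ndarray, pred_labels: np.ndarray):
    gt_ids = set(np.unique(gt_labels)) - {0}
    pred_ids = set(np.unique(pred_labels)) - {0}
    # Ground-truth axons overlapped by at least one detection are true positives.
    tp = sum(1 for g in gt_ids if np.any(pred_labels[gt_labels == g] > 0))
    fn = len(gt_ids) - tp
    # Detections overlapping no ground-truth axon are false positives.
    fp = sum(1 for p in pred_ids if not np.any(gt_labels[pred_labels == p] > 0))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision
```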
Cohort II: Mouse and Non-Human Primate ON Dataset
AxoNet 2.0’s counting performance was also evaluated on sub-images randomly selected from light micrographs of ON cross-sections taken from DBA/2J mice (22 animals) and non-human primates (25 animals), obtained from the Anderson Lab (University of Iowa) and the Burgoyne Lab (Optic Nerve Head Research Laboratory, Devers Eye Institute, Portland, OR, USA), respectively. These images were from eyes exhibiting varying degrees of nerve damage, including control eyes. The non-human primates had experimental glaucoma induced unilaterally by laser treatment of the trabecular meshwork,14,23 and the mice had inherited glaucoma and blast wave (positive overpressure wave) induced traumatic optic neuropathy.24,25 
The non-human primate dataset was previously used to validate the performance of an existing axon counting package, AxonMaster.14 The non-human primate and mice datasets consisted of 50 and 22 sub-images, respectively; in each sub-image, the axons had previously been manually counted. The non-human primate and mouse dataset curation processes have been previously reported.14,19 
Cohort III: Rat Whole ON Dataset
It was useful to assess the deep learning algorithm's performance against previous research. To do so, we used Brown Norway rats in which unilateral ocular hypertension was induced by magnetic microbead injection into the anterior chamber. These animals were a subset of those used in previous publications.26–28 It is important to note that some of the hypertensive eyes in these previous publications were also exposed to a scleral stiffening agent delivered by retrobulbar injection, namely genipin (50 eyes) or methylene blue plus targeted light irradiation (49 eyes). Methylene blue plus light irradiation led to appreciable RGC axonal loss even in the absence of ocular hypertension,29,30 and genipin also led to axonal loss, albeit of much smaller magnitude.27,29,30 Therefore, to avoid such confounding effects, in this cohort we focused only on eyes that received a retrobulbar injection of Hanks' balanced salt solution (55 eyes).27,29,30 We further restricted analysis to pairs of eyes where the hypertensive eye had an IOP burden > 100 mm Hg ∙ day, yielding a final cohort of n = 23 pairs. Tissue was processed as for Cohort I, and light micrographic montages of the entire optic nerve cross-section were created as for Cohort I. 
From these images, several outcomes were obtained. First, we determined axon count and density in both hypertensive and control eyes. The number of normal-appearing axons was counted in the entire ON cross-section, and axon density was computed by dividing the total axon count by the ON cross-sectional area. The ON boundary was defined as the edge of the nerve bundles closest to the pia, excluding the pia. The ON boundary was manually outlined in ImageJ (v1.8.0) to aid boundary location in Python, and the ON cross-sectional area was then determined using custom Python code (v3.7.4) with OpenCV (v4.5.1). 
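For illustration, a minimal sketch of this area and density calculation is given below (not the released code; the file name, helper signature, and use of the largest external contour are assumptions), using the 15.7 pixels/µm image resolution noted above to convert pixel area to µm2.

```python
# Illustrative axon density calculation (hypothetical file name and helper).
import cv2
import numpy as np

PIXELS_PER_UM = 15.7  # image resolution reported for the light micrographs

def axon_density(boundary_mask_png: str, axon_count: int) -> float:
    """Axons per square micron, given a binary mask of the manually outlined ON boundary."""
    mask = cv2.imread(boundary_mask_png, cv2.IMREAD_GRAYSCALE)
    contours, _ = cv2.findContours((mask > 0).astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    outline = max(contours, key=cv2.contourArea)   # largest contour = ON cross-section
    area_um2 = cv2.contourArea(outline) / (PIXELS_PER_UM ** 2)
    return axon_count / area_um2
```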
Large axons are preferentially lost in glaucoma, and this effect has been quantified in an NHP model of ocular hypertension by Quigley et al.31 To compare with this data set, we also evaluated the axon size distribution in our rat eyes, following the procedure of Quigley et al.31 Specifically, axons were grouped by axoplasm diameter into 17 bins, from 0.25 to 1.85 µm. Then, for each axon size bin, i, the normalized ratio of the hypertensive eye's ON axon density to the contralateral control eye's ON axon density, Ri, was computed, adapted from Quigley et al.31 Essentially, an Ri value greater than 1 means that axons in size bin i are relatively preserved in hypertensive eyes (compared to the axon loss averaged over all axon sizes), whereas values less than 1 indicate greater relative loss in that size bin. In more detail, we defined the following:  
\begin{eqnarray*}{R_i} = \frac{1}{N}\Sigma _{k = 1}^N{S_{ik}}\end{eqnarray*}
where N denotes the number of rats, Sik is the value of Si for the kth rat,  
\begin{eqnarray*}{S_i} = \ \frac{{{A_i}}}{{\frac{1}{n}\mathop \sum \nolimits_{j = 1}^n {A_j}}}\end{eqnarray*}
and  
\begin{eqnarray*} && {A_i} = \nonumber \\ && \left\{ \begin{array}{@{}l@{}} \dfrac{{\left| {\left\{ {axon \in {a_e}\ \mathrm{s.t.}\ \Delta \cdot \left( {i - 1} \right) + LB \le diameter\left( {axon} \right) < \Delta \cdot i + LB} \right\}} \right|}}{{\left| {\left\{ {axon \in {a_c}\ \mathrm{s.t.}\ \Delta \cdot \left( {i - 1} \right) + LB \le diameter\left( {axon} \right) < \Delta \cdot i + LB} \right\}} \right|}},\quad i < n\\[10pt] \dfrac{{\left| {\left\{ {axon \in {a_e}\ \mathrm{s.t.}\ \Delta \cdot \left( {i - 1} \right) + LB \le diameter\left( {axon} \right)} \right\}} \right|}}{{\left| {\left\{ {axon \in {a_c}\ \mathrm{s.t.}\ \Delta \cdot \left( {i - 1} \right) + LB \le diameter\left( {axon} \right)} \right\}} \right|}},\quad i = n \end{array} \right.\end{eqnarray*}
where n = 17 is the number of size bins, Δ = 0.1 µm is the bin width, ae represents all axons in the experimental eye, ac represents all axons in the control eye, and LB is the lower bound on axon diameter, taken as 0.25 µm. 
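A short NumPy sketch of this binned ratio is shown below (illustrative only; diam_exp and diam_ctl are hypothetical arrays of axoplasm diameters, in µm, for one animal's hypertensive and control eyes, and the last bin is open-ended, matching the i = n case above).

```python
# Illustrative computation of the binned ratios A_i, S_i, and R_i defined above.
import numpy as np

N_BINS, WIDTH, LB = 17, 0.1, 0.25   # 17 bins of 0.1 um starting at 0.25 um

def size_ratio_per_animal(diam_exp, diam_ctl):
    """S_i for one animal from axoplasm diameters (um) in experimental and control eyes."""
    diam_exp, diam_ctl = np.asarray(diam_exp), np.asarray(diam_ctl)
    lower_edges = LB + WIDTH * np.arange(N_BINS)
    a = np.empty(N_BINS)
    for i, lo in enumerate(lower_edges):
        hi = np.inf if i == N_BINS - 1 else lo + WIDTH    # last bin is open-ended
        n_exp = np.count_nonzero((diam_exp >= lo) & (diam_exp < hi))
        n_ctl = np.count_nonzero((diam_ctl >= lo) & (diam_ctl < hi))
        a[i] = n_exp / n_ctl                              # A_i (assumes n_ctl > 0)
    return a / a.mean()                                   # S_i = A_i / mean over bins

def size_ratio(all_exp, all_ctl):
    """R_i: S_i averaged over animals (lists of per-animal diameter arrays)."""
    return np.mean([size_ratio_per_animal(e, c) for e, c in zip(all_exp, all_ctl)], axis=0)
```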
Results
We first discuss the model's learning performance, demonstrate the model's axon segmentation accuracy, and report the computational performance of the algorithm. Details regarding the model's training performance, inter-individual annotation similarity, and effects of post-processing can be found in Supplementary Materials S8, S9, and S10. 
Evaluation of Axon Segmentation and Counting Accuracy
Model Segmentation Agreement with Annotations
To evaluate the model's segmentation performance, the deep learning model axon segmentations (Fig. 6) were compared with manual annotations using the soft-Dice coefficient, sDice (Table). The deep learning model was able to segment the axoplasm and myelin sheath in fairly good agreement with the manual annotations.17,32 Also, the average values for the testing dataset were similar to, although slightly less than, the agreement between two trained annotators (Table, Supplementary Fig. S15), indicating that the performance of the deep learning model was comparable to, albeit very slightly inferior to, human annotators, which we judged to be acceptable. 
Figure 6.
 
Examples of typical deep learning model segmentation results from rat optic nerves. Detected axoplasm color is arbitrary.
Table.
 
Mean Soft-Dice Coefficients and 95% Confidence Intervals for Comparisons Between AxoNet 2.0 and Human Annotations for the Training, Validation, and Testing Datasets (Rows 1-3) and Between Human Annotators (Last Row)
Quantification of Axon Counting Performance in Rats
Next, to analyze AxoNet 2.0’s performance in counting normal-appearing axons, we computed the sensitivity and precision of the algorithm's output on the testing dataset, finding a sensitivity of 0.92 and a precision of 0.83. Counting performance was also compared with that of AxoNet 1.018 by regressing the automated counts against the ground-truth manual counts for all three datasets (i.e., training, validation, and testing sets) for both AxoNet 1.0 and AxoNet 2.0. AxoNet 2.0 outperformed AxoNet 1.0 as judged by agreement with ground-truth manual counts and lower root mean square error (Fig. 7). We also observed that both tools had a statistically significant bias (P < 0.001, one-sample t-test) toward counting more axons, but AxoNet 2.0’s bias was smaller. Additionally, AxoNet 2.0’s limits of agreement were tighter than those of AxoNet 1.0 (Fig. 8). It has been demonstrated that automated methods typically fail to accurately count axons in moderately damaged optic nerve micrographs. To assess AxoNet 2.0’s performance on such nerves, we counted axons in testing-set sub-images from eyes with Morrison damage scores of 2-4. From regression and Bland-Altman analyses (Supplementary Figs. S17, S18), we conclude that AxoNet 2.0 performs well on these moderately damaged nerves. 
Figure 7.
 
Comparison between automated and manual counts for the rat ON dataset, including training (A), validation (B), and testing (C) subsets. Note the closer overall agreement between automated counts and manual counts (higher R2 values) for AxoNet 2.0 (red) versus AxoNet 1.0 (black). Testing set root mean square error was 6.18 and 8.62 for AxoNet 2.0 and AxoNet 1.0, respectively.
Figure 8.
 
Bland-Altman plots showing agreement between automated counts and manual (ground-truth) axon counts for AxoNet 2.0 (A) and AxoNet 1.0 (B), showing that AxoNet 2.0 had less bias than AxoNet 1.0. The 95% limits of agreement, represented by the dashed lines, were (−6.84, 13.57) and (−7.4, 18.5) for AxoNet 2.0 and AxoNet 1.0, respectively.
Axon Counting Performance in Other Animal Models
We next assessed the axon counting performance of AxoNet 2.0 on the mouse and non-human primate datasets and compared it with two existing deep learning-based segmentation codes: AxoNet 1.0 and AxonDeep.19 Visual inspection of segmentations by AxoNet 2.0 showed good qualitative performance in both mice and NHPs (Figs. 9 and 10). Quantitatively, when comparing AxoNet 1.0 with AxoNet 2.0, we observed similar or better correlations between AxoNet 2.0’s automated counts and manual counts for both the non-human primate (AxoNet 2.0: R2 = 0.97 vs. AxoNet 1.0: R2 = 0.98, Supplementary Fig. S19) and the mouse (AxoNet 2.0: R2 = 0.98 vs. AxoNet 1.0: R2 = 0.94, Supplementary Fig. S20) datasets. Bland-Altman analysis showed that both algorithms tended to count appreciably more axons in the non-human primate dataset versus manual counts (∼70 and 110 axons, i.e., an average percentage difference of ∼36% and 74% by AxoNet 2.0 and 1.0, respectively, Supplementary Fig. S21) and to count fewer axons in the mouse dataset versus manual counts (∼50 axons, i.e., an average percentage difference of ∼25% by AxoNet 2.0 and 1.0, Supplementary Fig. S22). Mouse optic nerves (n = 22) used in this article were obtained from eyes that had been immersion fixed or from animals that had been perfusion fixed.19 Specifically, a slight majority of the nerves were collected from mice undergoing perfusion fixation (n = 12 nerves), whereas the remainder were from eyes undergoing immersion fixation after immediate enucleation (n = 10 nerves). Importantly, the percent error in axon counts was essentially the same for the two fixation techniques, indicating that AxoNet 2.0 performed comparably well on tissues fixed using both approaches. For the non-human primate dataset, even though AxoNet 2.0 produced larger limits of agreement, it had a relatively lower counting bias in comparison to AxoNet 1.0. For the mouse dataset, although we observed similar counting bias, AxoNet 2.0’s limits of agreement were tighter. 
Figure 9.
 
Visualization of AxoNet 2.0's segmentation performance on a non-human primate ON dataset. Panel A represents a fairly healthy nerve region, whereas Panel B shows a sub-image of a severely damaged ON. Detected axoplasm color is arbitrary.
Figure 10.
 
Visualization of AxoNet 2.0's segmentation performance on a mouse ON dataset. Panel A shows a fairly healthy nerve region whereas Panels B and C show slightly damaged and very damaged ONs, respectively. Images A and C were obtained from immersion-fixed globes while Panel B was obtained from a perfusion-fixed mouse. Detected axoplasm color is arbitrary.
Turning now to a comparison of AxoNet 2.0 and AxonDeep, these two algorithms had similar performance; specifically, when regressing automated versus manual axon counts for their respective data sets, AxonDeep showed R2 = 0.97,19 whereas AxoNet 2.0 showed R2 = 0.98. Similarly, the mean soft-Dice coefficient for segmenting axoplasm was 0.81 for AxonDeep in a mouse test data set and 0.83 for AxoNet 2.0 in a rat data set. However, there was a difference in the total number of counted axons as compared to manual annotations: as noted above, AxoNet 2.0 showed a mean ∼25% difference in axon count (automated analysis of mouse ON sub-images versus manual counts of mouse ON sub-images), whereas AxonDeep showed a ∼4% mean difference. Therefore, although AxonDeep and AxoNet 2.0 both exhibited strong correlation with manual counts, they consistently differed in the number of axons they counted. Further manual inspection of segmentation results showed that these quantitative discrepancies were due to subtly differing criteria for manually assessing a normal-appearing axon between the Ethier laboratory and the laboratory contributing the mouse data set. A similar inter-lab discrepancy was found in the NHP data set, reinforcing the fact that comparison of axon counting results between labs can be challenging and dependent on precise definitions of a “normal-appearing” axon. 
Evaluation of Axonal Morphology
We also compared the manual and deep learning segmentations by extracting the mean and median of various axonal morphological properties from all testing set images, and computed the mean absolute percentage error (MAPE) between manual and automated segmentations (Supplementary Table S4). Generally, there was good agreement; for example, mean and median axoplasmic and myelin sheath cross-sectional areas agreed well between manual and deep learning segmentations (MAPE < 11.7% and < 3.2%, respectively), as did axoplasm eccentricity and myelin sheath thickness (MAPE < 4.2% and < 6%, respectively). 
Axon Density and Size Distribution in Glaucomatous Eyes
Finally, to assess the deep learning algorithm's performance against previous research, the axon density and axon size distributions in entire optic nerves from hypertensive and normotensive Brown Norway rat eyes were determined. As expected,33–35 hypertensive eyes showed lower axon counts (count in control eyes: 82,301 ± 16,431 vs. count in hypertensive eyes: 24,780 ± 23,730; P = 1.24 × 10−9, two-sided paired t-test; Fig. 11) and an approximate 50% decrease in axon density (density in control eyes: 0.30 ± 0.06 axons/µm2 vs. density in hypertensive eyes: 0.10 ± 0.08 axons/µm2; P = 1.63 × 10−9, two-sided paired t-test). 
Figure 11.
 
Comparison of RGC axon count (A) and axon density (B) between control and contralateral hypertensive eyes for 23 Brown-Norway rats. As expected, we observe a decrease in axon density in the hypertensive eye (P = 1.63 × 10−9). Outlined points represent outliers, defined as values ≥ 1.5 times the interquartile range above and below the third and first quartile, respectively.
The preferential loss of large axons has been previously characterized in a NHP model of ocular hypertension.31 Using a metric identical to that of Quigley and colleagues31 to describe the relative loss of large and small axons (see Methods), we also observed a preferential loss of large axons with ocular hypertension in rats (Fig. 12), although regression showed only a modest dependence of survival on axon diameter (regressing normalized axonal density against axon diameter, slope = −0.46, r = −0.603, P < 0.001). 
Figure 12.
 
Normalized ratio of RGC axon density (hypertensive/control) vs. axoplasm diameter. Panel A shows data for Brown Norway rats (n = 23 pairs of eyes), whereas Panel B is a similar graph for non-human primates (n = 10 pairs of eyes; replot of data from Quigley et al.31). The curve is the mean ratio of axon number in glaucomatous nerves compared to their contralateral control eyes; dashed lines show ± 1 SD. Axons in a particular diameter bin are relatively more preserved than other axon sizes if the ratio is greater than 1. Preferential loss of large axons is seen. The largest bin contained all axons with diameter ≥ 1.85 µm.
Computational Performance
The deep learning algorithm required an average of 50 ± 15 minutes to provide the RGC axon segmentations for an entire ON micrograph (∼10,000 × 10,000 pixels) on a Windows 10 system (Intel Core i7-3770 CPU @ 3.4 GHz, 16GB RAM). The post-processing pipelines required an average of 30 ± 15 minutes per ON axon segmentation. The code is available at https://github.com/VidishaGoyal/AxoNet-2.0
Discussion
In this study, we developed and evaluated the performance of a deep learning-based algorithm, AxoNet 2.0, to segment “normal-appearing” RGC axons from light micrographs of healthy and damaged ON cross-sections. The algorithm was shown to be effective both for counting and carrying out morphometric analysis of normal-appearing axons in both healthy and damaged ONs, showing acceptable error when compared to human experts. AxoNet 2.0 also had similar or better performance than our previous axon counting software, AxoNet 1.0, which outperformed two automated axon counting algorithms, AxonMaster and AxonJ. A detailed example of the type of morphometric analysis that can be performed with data obtained from AxoNet 2.0 can be found in our companion paper.36 Such an automated RGC axon evaluation tool could facilitate future glaucoma and neuro-ophthalmic disease studies because of its ability to analyze a range of image qualities and ON damage levels in an unbiased manner. 
The complexity of the annotation process requires the specification of precise attributes to define normal-appearing RGC axons. As can be seen from Figure 2, there was some variability in annotations between individuals even after training. Accordingly, the identification of an RGC axon as “normal-appearing” is somewhat subjective, and, consistent with this observation, there can be (and were) differences between our laboratory’s annotations and those of other laboratories. This in turn led to discrepancies between our software's output and the output expected by other labs. Although this ambiguity is somewhat inherent in light micrographic imaging of RGC axons, we have tried to mitigate it by adopting a consistent approach in which we carefully defined attributes of “normal-appearing” RGC axons and used a cross-validation approach in our manual annotations. We also compared our full ON axon counts to those reported previously in rat optic nerves,3,9,12,37–39 where there exists a wide range of axon counts reported for normotensive eyes (i.e., 77,000-119,000 axons [Supplementary Table S5]), reinforcing the need to carefully define normal-appearing axons and to exercise caution in comparing axon counts between laboratories. 
Ideally, we would have evaluated the performance of AxoNet 2.0 against whole nerve axon counts in rats, mice, and monkeys, considering a range of damage extents. Unfortunately, full optic nerve counts are so time-consuming that it was infeasible to carry out such counts for all nerves considered in this work. Additionally, AxoNet 2.0 not only counts axons, but also assesses their morphological properties, and the creation of a whole optic nerve dataset with manual assessment of both axon counts and morphometry would be completely infeasible. Thus we instead adopted the following approach: 
  • We compared automated counts to manual counts in sub-images. This performance was very good (see Results—Quantification of Axon Counting Performance in Rats)
  • We carefully assessed the performance of the sub-image stitching algorithm. For example, we initially observed stitching issues for oblique-cut sections (Supplementary Material S5), which we corrected in the post-processing pipeline. Additionally, AxoNet 2.0 also provides a solution for considering edge artifacts when an optic nerve sub-image is provided as an input (Supplementary Material S6).
  • We compared our automated full optic nerve counts to previously published full nerve manual counts, as well as sub-sampled counts, observing generally good agreement (Supplementary Table S5).
There are some limitations in this study and areas for improvement. First, the algorithm was trained on images obtained from a bright-field microscope; therefore axons with diameter smaller than the resolving power of light microscopes will go undetected. This is an unavoidable limitation of light microscopy, which will affect the accuracy of any software algorithm to detect small “normal-appearing” axons,6 as well as manual counting schemes. However, we believe this to be an acceptable constraint because of the low cost and speed for tissue preparation and imaging via light microscopy, as well as the wide accessibility of light microscopy hardware. 
Second, as explained above, we defined only two classes in the RGC axon probability space (i.e., the axoplasm and the myelin sheath). This led the algorithm to optimize such that the probability maps of the axoplasm and the myelin sheath were complements of each other. It is likely that defining four classes (i.e., axoplasm, myelin sheath, glia and background) could improve segmentation performance. Third, our software analyzes full ON images in a tile-based manner, rather than analyzing a single image of the entire ON, which can lead to edge effects, as described above. Although our deep learning architecture can process entire nerves as a single image, doing so is extremely computationally expensive and cannot be handled by typical laboratory-based computer systems. Therefore, to mitigate edge artifacts, sub-images were cropped, as described above, before incorporation into the final output. 
Fourth, the post-processing pipeline that helps filter out extra-axonal structures and locate large axons in the micrographs was created by setting fixed thresholds and limits after analyzing their effects on a small set of entire ON micrographs representing a range of glaucomatous damage. The post-processing pipeline uses image processing tools that include thresholding the micrographs and filtering out structures based on shape and size, which requires tuning of these thresholds and ranges (18 parameters, Supplementary Table S2). Although our parameter values may work on other image datasets, we advise calibrating these values for a particular set of images when possible. Additionally, the post-processing pipeline is not perfect and cannot correct all of the erroneous detections mentioned above. 
Most importantly, our software has only been trained to detect “normal-appearing” axons. Because of animal-to-animal variability, the number of RGC axons varies between even naïve optic nerves of inbred animals of similar age, and thus detection of degenerating axons is in principle a more sensitive metric of glaucomatous damage. However, there may be subtle changes in the appearance of normal appearing axons that could be extremely useful as early markers of damage, and automated axon segmentation tools like AxoNet 2.0 may help identify such features.36 Further complicating this issue, we note that nerve damage/disease may present differently based on the type of injury. For example, axons may not exhibit typical morphological changes reported during axon degeneration in DBA/2J mice or optic nerve crush. Thus damage detection may be animal model-dependent. 
We believe our software could in principle be extended to detect degenerating axons and other structures, such as glial cells. Unfortunately, this is a much harder problem and was not within the scope of this work. However, we would suggest curating a dataset similar to that used to develop AxoNet 1.0, referred to as a “count/regression” dataset (i.e., a dataset in which human annotators count the number of structures, especially degenerating axons, instead of segmenting the structures). This approach is suggested because it is difficult to delineate the boundaries of individual degenerating axons in light micrographs of extremely damaged ONs. Although counting degenerating axons in these nerves is not an easy exercise either, trying to delineate the boundary of the degenerating axons to create a segmentation dataset is nearly impossible. 
An alternative would be to use corresponding electron micrographs to create the dataset, but because the tissue preparation procedures for electron and light microscopy differ, EM images must be procured from sections adjacent to those used for light microscopy. We made initial attempts at such an approach, but because axons do not necessarily run parallel to one another along the ON, we observed that axon-like structures present in a micrograph of one section were not always present in the micrograph of a nearby section. Additionally, creating a segmentation dataset is extremely time-consuming: it took 14 trained individuals approximately four months to create our segmentation dataset of ∼1500 images. An alternate approach could be to create a “count/regression” dataset using micrographs of moderately damaged ONs (e.g., Brown-Norway rat ONs exhibiting a Morrison damage score of 2).5 We suggest this approach for two reasons. First, it is easier to locate individual degenerating axons in micrographs of moderately damaged ONs, where the extent of extra-axonal change is modest. Second, the analysis of degenerating axons in micrographs of severely damaged ONs may not be a sensitive metric for ON damage. 
A concern in the deep learning aspects of this work was the small gap between the validation and training loss curves, which may indicate that the validation set contained images that were simpler and less variable than those in the training set. Another explanation is that data augmentation made the training images more difficult for the model to predict accurately, which would increase training loss without increasing validation loss. Once an appropriate architecture was trained, we explored ways to further narrow this gap (e.g., reshuffling the sub-images among the dataset splits [training, validation, and testing] while respecting the splitting criteria described above, and retraining the model). After multiple attempts, we concluded that our dataset was sufficiently complex that it was extremely difficult to create training and validation subsets that each captured the full variability of our axon segmentation dataset. Moreover, even when reshuffling produced lower validation loss during training, we observed a slight decrease in algorithm performance on the validation and testing subsets. This may be attributed to the small size of the training dataset and the need to increase training dataset variability. By incorporating annotated images from other laboratories, other animal models, and other tissue imaging protocols into our dataset, we may be able to increase the algorithm's generalizability while addressing the above limitations on axon definition and annotation. This further supports our previous comment that widespread collaboration and the availability of more data would allow for model retraining to improve performance. 
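As a simple diagnostic of whether augmentation alone explains such a gap (a sketch under assumed transforms, not the augmentation actually used to train AxoNet 2.0), one can train one model with label-preserving augmentations such as flips and 90° rotations, train another without them, and compare the resulting training and validation loss curves.

import numpy as np

def augment(image, target, rng):
    """Apply identical random flips and 90-degree rotations to a sub-image and
    its ground-truth probability maps. These are generic, label-preserving
    transforms; the transforms used to train AxoNet 2.0 may differ."""
    k = int(rng.integers(0, 4))
    image, target = np.rot90(image, k), np.rot90(target, k)
    if rng.random() < 0.5:
        image, target = np.fliplr(image), np.fliplr(target)
    if rng.random() < 0.5:
        image, target = np.flipud(image), np.flipud(target)
    return image.copy(), target.copy()

# Example usage with placeholder arrays: apply augment() to each training pair in
# one run, omit it in a second run, and compare the loss gaps between the runs.
rng = np.random.default_rng(0)
sub_image = np.zeros((224, 224, 3), dtype=np.float32)   # placeholder sub-image
prob_maps = np.zeros((224, 224, 2), dtype=np.float32)   # placeholder probability maps
aug_image, aug_maps = augment(sub_image, prob_maps, rng)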
Conclusions
In conclusion, the AxoNet 2.0 software successfully applies deep learning and image processing to automate RGC axon quantification, attaining good agreement with expert human evaluations of axons in rat ON cross-sections. Despite the limitations of this software, we believe it offers the opportunity to standardize RGC axon morphometry and perhaps identify subtle morphometric changes as early markers of axonal damage. 
Acknowledgments
The authors thank Alwin Khoja, Amshu Charagiri, Chanhyeok Kim, Daksha Jadhav, Esha Kashyap, Malia Yuhl, Nicholas Lima, Prasnav Naik, Ram Akella, Racheal Onyewuenyi, Tara Kimiavi, and Zahraw Shah for their efforts during the dataset curation process. 
Supported by NIH R01 EY025286 (CRE), 5T32 EY007092-32 (BGH), Department of Veterans Affairs R&D Service Career Development Award (RX002342; AJF), NIH NEI EY030871 (AJF), VA EY025580 (MGA), I50 RX003002 (WD, AHB, MGA, MKG), T32DK112751 (AHB), P30 EY025580, I01 RX001481, and the Georgia Research Alliance (CRE). 
Disclosure: V. Goyal, None; A.T. Read, None; M.D. Ritch, None; B.G. Hannon, None; G. Sanchez Rodriguez, None; D.M. Brown, None; A.J. Feola, None; A. Hedberg-Buenz, None; G.A. Cull, None; J. Reynaud, None; M.K. Garvin, None; M.G. Anderson, None; C.F. Burgoyne, None; C.R. Ethier, None 
References
Tham YC, Li X, Wong TY, Quigley HA, Aung T, Cheng CY. Global prevalence of glaucoma and projections of glaucoma burden through 2040: A systematic review and meta-analysis. Ophthalmology. 2014; 121: 2081–2090. [CrossRef] [PubMed]
Mikelberg FS, Drance SM, Schulzer M, Yidegiligne HM, Weis MM. The normal human optic nerve: Axon count and axon diameter distribution. Ophthalmology. 1989; 96: 1325–1328. [CrossRef] [PubMed]
Reese BE. The distribution of axons according to diameter in the optic nerve and optic tract of the rat. Neuroscience. 1987; 22: 1015–1024. [CrossRef] [PubMed]
Sanchez RM, Dunkelberger GR, Quigley HA. The number and diameter distribution of axons in the monkey optic nerve. Invest Ophthalmol Vis Sci. 1986; 27: 1342–1350. [PubMed]
Jia L, Cepurna WO, Johnson EC, Morrison JC. Patterns of intraocular pressure elevation after aqueous humor outflow obstruction in rats. Invest Ophthalmol Vis Sci. 2000; 41: 1380–1385. [PubMed]
Chauhan BC, LeVatte TL, Garnier KL, et al. Semiquantitative optic nerve grading scheme for determining axonal loss in experimental optic neuropathy. Invest Ophthalmol Vis Sci. 2006; 47: 634–640. [CrossRef] [PubMed]
Chauhan BC, Pan J, Archibald ML, LeVatte TL, Kelly MEM, Tremblay F. Effect of intraocular pressure on optic disc topography, electroretinography, and axonal loss in a chronic pressure-induced rat model of optic nerve damage. Invest Ophthalmol Vis Sci. 2002; 43: 2969–2976. [PubMed]
Ebneter A, Casson RJ, Wood JP, Chidlow G. Estimation of axon counts in a rat model of glaucoma: comparison of fixed-pattern sampling with targeted sampling. Clin Exp Ophthalmol. 2012; 40: 626–633. [CrossRef] [PubMed]
Levkovitch-Verbin H, Quigley HA, Martin KRG, Valenta D, Baumrind LA, Pease ME. Translimbal laser photocoagulation to the trabecular meshwork as a model of glaucoma in rats. Invest Ophthalmol Vis Sci. 2002; 43: 402–410. [PubMed]
Koschade SE, Koch MA, Braunger BM, Tamm ER. Efficient determination of axon number in the optic nerve: A stereological approach. Exp Eye Res. 2019; 186: 107710. [CrossRef] [PubMed]
Mayhew TM, Sharma AK. Sampling schemes for estimating nerve fibre size. II. Methods for unifascicular nerve trunks. J Anat. 1984; 139(Pt 1): 59–66. [PubMed]
Marina N, Bull ND, Martin KR. A semiautomated targeted sampling method to assess optic nerve axonal loss in a rat model of glaucoma. Nature Protocols. 2010; 5: 1642–1651. [CrossRef] [PubMed]
Kim CY, Rho S, Lee N, Lee CK, Sung Y. Semi-automated counting method of axons in transmission electron microscopic images. Vet Ophthalmol. 2016; 19: 29–37. [CrossRef] [PubMed]
Reynaud J, Cull G, Wang L, et al. Automated quantification of optic nerve axons in primate glaucomatous and normal eyes—method and comparison to semi-automated manual quantification. Invest Ophthalmol Vis Sci. 2012; 53: 2951–2959. [CrossRef] [PubMed]
Zarei K, Scheetz TE, Christopher M, et al. Automated axon counting in rodent optic nerve sections with AxonJ. Sci Rep. 2016; 6(1): 26559. [CrossRef] [PubMed]
Zaimi A, Duval T, Gasecka A, Côté D, Stikov N, Cohen-Adad J. AxonSeg: open source software for axon and myelin segmentation and morphometric analysis. Front Neuroinformat. 2016; 10: 37. [CrossRef]
Zaimi A, Wabartha M, Herman V, Antonsanti PL, Perone CS, Cohen-Adad J. AxonDeepSeg: Automatic axon and myelin segmentation from microscopy data using convolutional neural networks. Sci Rep. 2018; 8(1): 3816. [CrossRef] [PubMed]
Ritch MD, Hannon BG, Read AT, et al. AxoNet: A deep learning-based tool to count retinal ganglion cell axons. Sci Rep. 2020; 10(1): 8034. [CrossRef] [PubMed]
Deng W, Hedberg-Buenz A, Soukup DA, et al. AxonDeep: Automated optic nerve axon segmentation in mice with deep learning. Transl Vis Sci Technol. 2021; 10: 22. [CrossRef] [PubMed]
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. New York: Springer International Publishing; 2015: 234–241.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014 Sep 4.
Fukuda Y, Sugimoto T, Shirokawa T. Strain differences in quantitative analysis of the rat optic nerve. Exp Neurol. 1982; 75: 525–532. [CrossRef] [PubMed]
Yang H, Thompson H, Roberts MD, Sigal IA, Downs JC, Burgoyne CF. Deformation of the early glaucomatous monkey optic nerve head connective tissue after acute IOP elevation in 3-D histomorphometric reconstructions. Invest Ophthalmol Vis Sci. 2011; 52: 345–363. [CrossRef] [PubMed]
Mohan K, Kecova H, Hernandez-Merino E, Kardon RH, Harper MM. Retinal ganglion cell damage in an experimental rodent model of blast-mediated traumatic brain injury. Invest Ophthalmol Vis Sci. 2013; 54: 3440–3450. [CrossRef] [PubMed]
Boehme NA, Hedberg-Buenz A, Tatro N, et al. Axonopathy precedes cell death in ocular damage mediated by blast exposure. Sci Rep. 2021; 11(1): 11774. [CrossRef] [PubMed]
Hannon BG, Feola AJ, Gerberich BG, et al. Using retinal function to define ischemic exclusion criteria for animal models of glaucoma. Exp Eye Res. 2021; 202: 108354–108354. [CrossRef] [PubMed]
Hannon BG, Luna C, Feola AJ, et al. Assessment of visual and retinal function following in vivo Genipin-induced scleral crosslinking. Transl Vis Sci Technol. 2020; 9(10): 8. [CrossRef] [PubMed]
Campbell IC, Hannon BG, Read AT, Sherwood JM, Schwaner SA, Ethier CR. Quantification of the efficacy of collagen cross-linking agents to induce stiffening of rat sclera. J R Soc Interface. 2017; 14(129): 20170014. [CrossRef] [PubMed]
Gerberich BG, Hannon BG, Hejri A, et al. Transpupillary collagen photocrosslinking for targeted modulation of ocular biomechanics. Biomaterials. 2021; 271: 120735. [CrossRef] [PubMed]
Gerberich BG, Hannon BG, Brown DM, et al. Evaluation of spatially targeted scleral stiffening on neuroprotection in a rat model of glaucoma. Transl Vis Sci Technol. 2022; 11(5): 7. [CrossRef] [PubMed]
Quigley HA, Sanchez RM, Dunkelberger GR, L'Hernault NL, Baginski TA. Chronic glaucoma selectively damages large optic nerve fibers. Invest Ophthalmol Vis Sci. 1987; 28: 913–920. [PubMed]
Shamir RR, Duchin Y, Kim J, Sapiro G, Harel N. Continuous dice coefficient: A method for evaluating probabilistic segmentations. arXiv preprint arXiv:1906.11031. 2019 Jun 26.
Medeiros FA, Lisboa R, Weinreb RN, Liebmann JM, Girkin C, Zangwill LM. Retinal ganglion cell count estimates associated with early development of visual field defects in glaucoma. Ophthalmology. 2013; 120: 736–744. [CrossRef] [PubMed]
Guedes G, Tsai JC, Loewen NA. Glaucoma and aging. Curr Aging Sci. 2011; 4: 110–117. [CrossRef] [PubMed]
Calkins DJ, Horner PJ. The cell and molecular biology of glaucoma: Axonopathy and the brain. Invest Ophthalmol Vis Sci. 2012; 53: 2482–2484. [CrossRef] [PubMed]
Goyal V, Read AT, Brown DM, et al. Morphometric analysis of retinal ganglion cell axons in normal and glaucomatous Brown Norway rat optic nerves. Under review, TVST.
Forrester J, Peters A. Nerve fibres in optic nerve of rat. Nature. 1967; 214(5085): 245–247. [CrossRef] [PubMed]
Laquis S, Chaudhary P, Sharma SC. The patterns of retinal ganglion cell death in hypertensive eyes. Brain Res. 1998; 784: 100–104. [CrossRef] [PubMed]
De Juan J, Iñiguez C, Carreres J. Number, diameter and distribution of the rat optic nerve fibers. Cells Tissues Organs. 1978; 102: 294–299. [CrossRef]
Figure 1.
 
ON sampling and annotation. ON sub-images (B) were randomly selected from tiled micrographs of the entire ON cross-section (A). Axoplasm and myelin sheaths in each sub-image were manually annotated by two masked, independent annotators to produce a “ground truth” annotation (C), in which pink and yellow represent the manually annotated axoplasm and myelin sheath of the normal-appearing axons, respectively.
Figure 2.
 
Pictorial representation of axoplasm and myelin sheath ground-truth probability map creation and structure of ground-truth (or, manually annotated) RGC axon probability map in PNG format. The grayscale bar displays the probability that a given pixel belongs to a given class (i.e., axoplasm or myelin sheath).
Figure 3.
 
Overview of AxoNet 2.0 image processing workflow, including the deep learning model architecture, which is based on the U-Net architecture.20 Layer arrays and operations are shown by colored boxes and arrows. Overlapping consecutive sub-images from full ON images were fed into the deep learning model. The cropped model output for each sub-image (the part of the image enclosed in the red dashed box in “model output for sub-image”) was stitched to obtain an axoplasm probability map for the entire ON, after which maps were post-processed. Morphometric properties of axons were computed from the post-processed output. Note that the color of the axoplasm in axons detected by the algorithm is arbitrary and simply serves to aid visualization of the output.
Figure 4.
 
ON micrograph padding and consecutive sub-image cropping. (A) Zero-padding of ON micrograph around border to prevent cropping of edges (largest yellow box). (B) A zoom of the region enclosed in the red box in (A), displaying the 36 pixel overlap between the sub-image that will be fed to the deep learning model (blue box) and neighboring sub-images (dashed yellow boxes). The green arrow in (A) and (B) shows the direction in which the micrograph will be scanned. According to this scanning scheme, the next sub-image that will be provided to the deep learning model will be the dashed yellow box to the right of the blue box.
Figure 5.
 
Example of deep learning model output. The axoplasm and myelin sheath probability map predicted by the AxoNet 2.0 software for the ON sub-image (A) is shown in (B) and (C), respectively. The grayscale bar displays pixel-wise axoplasm and myelin sheath probability.
Figure 6.
 
Examples of typical deep learning model segmentation results from rat optic nerves. Detected axoplasm color is arbitrary.
Figure 7.
 
Comparison between automated and manual counts for the rat ON dataset, including training (A), validation (B), and testing (C) subsets. Note the closer overall agreement between automated counts and manual counts (higher R2 values) for AxoNet 2.0 (red) versus AxoNet 1.0 (black). Testing set root mean square error was 6.18 and 8.62 for AxoNet 2.0 and AxoNet 1.0, respectively.
Figure 8.
 
Bland-Altman plots showing agreement between automated counts and manual (ground-truth) axon counts for AxoNet 2.0 (A) and AxoNet 1.0 (B), showing that AxoNet 2.0 had less bias than AxoNet 1.0. The 95% limits of agreement, represented by the dashed lines, were (−6.84, 13.57) and (−7.4, 18.5) for AxoNet 2.0 and AxoNet 1.0, respectively.
Figure 9.
 
Visualization of AxoNet 2.0's segmentation performance on a non-human primate ON dataset. Panel A shows a fairly healthy nerve region, whereas Panel B shows a sub-image of a severely damaged ON. Detected axoplasm color is arbitrary.
Figure 10.
 
Visualization of AxoNet 2.0's segmentation performance on a mouse ON dataset. Panel A shows a fairly healthy nerve region, whereas Panels B and C show slightly damaged and very damaged ONs, respectively. Panels A and C were obtained from immersion-fixed globes, whereas Panel B was obtained from a perfusion-fixed mouse. Detected axoplasm color is arbitrary.
Figure 11.
 
Comparison of RGC axon count (A) and axon density (B) between control and contralateral hypertensive eyes for 23 Brown-Norway rats. As expected, we observe a decrease in axon density in the hypertensive eye (P = 1.63 × 10−9). Outlined points represent outliers, defined as values more than 1.5 times the interquartile range above the third quartile or below the first quartile.
Figure 12.
 
Normalized ratio of RGC axon density (hypertensive/control) vs. axoplasm diameter. Panel A shows data for Brown Norway rats (n = 23 pairs of eyes), whereas Panel B is a similar graph for non-human primates (n = 10 pairs of eyes; replot of data from Quigley et al.31). The curve is the mean ratio of axon number in glaucomatous nerves compared to their contralateral control eyes; dashed lines show ± 1 SD. Axons in a particular diameter bin are relatively more preserved than other axon sizes if the ratio is greater than 1. Preferential loss of large axons is seen. The largest bin contained all axons with diameter ≥ 1.85 µm.
Table.
 
Mean Soft-Dice Coefficients and 95% Confidence Intervals for Comparisons Between AxoNet 2.0 and Human Annotations for the Training, Validation, and Testing Datasets (Rows 1–3) and Between Human Annotators (Last Row)