Development of Cumulative Order-Preserving Image Transformation Based Variational Autoencoder for Anterior Segment Optical Coherence Tomography Images
Author Affiliations & Notes
  • Kilhwan Shon
    Department of Ophthalmology, Gangneung Asan Hospital, Gangneung, Korea
    Asan Artificial Intelligence Institute, Hwaseong-si, Gyeonggi-do, Korea
  • Kyung Rim Sung
    Department of Ophthalmology, College of Medicine, University of Ulsan, Asan Medical Center, Seoul, Korea
  • Jiehoon Kwak
    Department of Ophthalmology, College of Medicine, University of Ulsan, Asan Medical Center, Seoul, Korea
  • Joo Yeon Lee
    Camp 9 Orthopedic Clinic, Hwaseong-si, Gyeonggi-do, Korea
    Asan Artificial Intelligence Institute, Hwaseong-si, Gyeonggi-do, Korea
  • Joong Won Shin
    Department of Ophthalmology, College of Medicine, University of Ulsan, Asan Medical Center, Seoul, Korea
  • Correspondence: Kyung Rim Sung, Department of Ophthalmology, University of Ulsan, College of Medicine, Asan Medical Center, 388-1 Pungnap-2-dong, Songpa-gu, Seoul, Korea. e-mail: sungeye@gmail.com 
Translational Vision Science & Technology August 2022, Vol. 11, 30. doi: https://doi.org/10.1167/tvst.11.8.30
Abstract

Purpose: To develop a variational autoencoder (VAE) suitable for analysis of the latent structure of anterior segment optical coherence tomography (AS-OCT) images and to investigate the feasibility of latent structure analysis of AS-OCT images.

Methods: We retrospectively collected clinical data and AS-OCT images from 2111 eyes of 1261 participants in the ongoing Asan Glaucoma Progression Study. A specifically modified VAE was used to extract six symmetrical latent variables and one asymmetrical latent variable. A total of 1692 eyes of 1007 patients were used to train the model. Conventional measurements and latent variables were compared between 74 primary angle closure (PAC) and 51 primary angle closure glaucoma (PACG) eyes from the validation set (419 eyes of 254 patients), which was not used for training.

Results: Among the symmetrical latent variables, the first three and the last demonstrated easily recognized features: anterior chamber area in η1, corneal curvature in η2, pupil size in η3, and corneal thickness in η6, whereas η4 and η5 aggregated more complex interactions of multiple structures. None of the conventional measurements differed between PAC and PACG eyes. However, values of η4 were significantly different between the two groups, being smaller in the PACG group (P = 0.015).

Conclusions: VAE is a useful framework for analysis of the latent structure of AS-OCT images. Latent structure analysis could be useful in capturing features not readily evident with conventional measures.

Translational Relevance: This study suggests that a deep learning-based latent space model can be applied to the analysis of AS-OCT images to find latent characteristics of the anterior segment of the eye.

Introduction
Although advances have been made in imaging techniques for the anterior segment, such as anterior segment optical coherence tomography (AS-OCT), appropriate analysis of the acquired high-resolution images has been limited by the lack of proper analytical tools. Conventional methods consist of manual measurement of hand-crafted features by the physician. The commonly used parameters are as follows: anterior chamber depth (ACD), width, and area (ACA); lens vault; angle recess area; angle opening distance; trabecular-iris space area; and iris thickness.1 Conventional analysis has been successfully utilized in a variety of tasks, including subclassification, monitoring intervention-induced changes, and describing dynamic and long-term processes in the anterior segment. Still, these parameters cannot inherently represent the whole image and are vulnerable to problems arising from highly correlated parameters.2–7
Recent advances in deep learning technology provide a new approach to image data analysis. Several studies have shown that deep learning techniques are not only suitable for the analysis of AS-OCT images but can also achieve accuracies comparable to human measurements.8–12 However, because these studies usually involve large-scale automated measurement of manually assigned labels, they are affected by all the limitations of manual measurements.
Latent space modeling is a machine learning approach that can be applied to analyze high-dimensional data. An example of a latent space model (or latent variable model) used in a conventional context is factor analysis, which has long been used in psychology.13 In the field of computer vision, a family of latent space models called deep generative models has been intensively developed to analyze various images.14 In our previous study, we showed that a convolutional β-variational autoencoder (VAE) can be applied to AS-OCT images to achieve a good disentangled latent space representation.15 Despite encouraging results, we also found shortcomings of the convolutional VAE framework, which motivated us to improve the model. The previous model had limited power in separating asymmetrical variance from symmetrical variance, which hampered the disentanglement of latent variables.
To overcome problems arising from asymmetricity, we developed a new method inspired by spatial transformer networks.16 In this new model, we preserved the overall framework of the VAE, but instead of convolution, an image warping technique we have named the cumulative order-preserving image transforming network (COPIT) is used to reconstruct images from the latent space. COPIT was specifically developed and tailored for the latent space representation of AS-OCT images in the current article. COPIT has been designed to have several properties: (1) the order of the x and y coordinates is preserved after transformation; (2) each latent variable defines a unique transformation; (3) multiple transformations can be combined into a single new transformation; and (4) each layer can be designed differently depending on its purpose. The entire network is based on the convolutional β-VAE with two modifications: (1) the convolutional decoder is replaced by COPIT, and (2) the loss function is modified to include a cosine similarity.
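The exact construction of COPIT is deferred to the supplementary material, so the parameterization below is an assumption of ours rather than the authors' formulation. One plausible way to realize property (1) is to build each coordinate axis as a cumulative sum of strictly positive increments, which makes monotonicity (and hence order preservation) hold by construction; a minimal PyTorch sketch, with a function name and normalization of our own choosing:

```python
import torch
import torch.nn.functional as F

def order_preserving_coords(raw: torch.Tensor) -> torch.Tensor:
    """Map unconstrained outputs of a fully connected layer to strictly
    increasing sampling coordinates in (-1, 1]. A cumulative sum of
    positive increments guarantees that coordinate order is preserved."""
    increments = F.softplus(raw)               # strictly positive step sizes
    coords = torch.cumsum(increments, dim=-1)  # monotonically increasing
    coords = coords / coords[..., -1:]         # normalize to (0, 1]
    return 2.0 * coords - 1.0                  # rescale to grid_sample's range
```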
Methods
Participants
We selected participants from the ongoing Asan Glaucoma Progression Study who had undergone an AS-OCT examination (Visante OCT, ver. 3.0; Carl Zeiss Meditec, Jena, Germany). From 2111 eyes of 1261 patients, we randomly assigned 80% of the patients to the training set (1692 eyes of 1007 patients) and the remaining 20% to the validation set (419 eyes of 254 patients), ensuring that both eyes of a patient were assigned to the same group. Comparisons between conventional measurements and latent variables were made using patients from the validation set. A more detailed description of the population, including the clinical assessment, inclusion criteria, image acquisition, and demographics, can be found in our previous study.15
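A patient-level split of this kind can be reproduced with standard tooling; the sketch below uses scikit-learn (the authors do not state their tooling, and the file names and IDs are hypothetical) and keeps both eyes of each patient in the same partition by grouping on the patient ID.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical per-eye records; fellow eyes share one patient ID.
image_paths = np.array(["p001_od.png", "p001_os.png", "p002_od.png", "p003_os.png"])
patient_ids = np.array(["p001", "p001", "p002", "p003"])

# Grouping on patient_ids guarantees both eyes of a patient
# land in the same partition.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(image_paths, groups=patient_ids))
print(image_paths[train_idx], image_paths[val_idx])
```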
All procedures conformed to the Declaration of Helsinki, and this study was approved by the institutional review board of the Asan Medical Center, University of Ulsan, Seoul, Korea. 
Image Preparation and Segmentation
Raw AS-OCT images of 1200 × 1500 (H × W) pixels were center cropped to create a grayscale 512 × 1024 image, which was then resized to 256 × 512 pixels. Segmentation was done in four steps: (1) resized images were manually segmented into three segments (the iris, the corneoscleral shell, and the anterior chamber) by an experienced glaucoma specialist (KS); (2) 130 manually segmented images were used to train a modified U-net; (3) the trained modified U-net was used to segment the remaining images; and (4) segmented images were aligned with rotation and translation using a spatial transformer network.16 The modified U-net is structurally identical to the original U-net but has been reduced in depth and adjusted to a different resolution.15,17 Segmented images were cropped to a size of 192 × 448, and left eyes were horizontally flipped.
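The geometric steps (crop, resize, and flip) can be expressed compactly; the following is a minimal sketch collapsing them into one pass, with segmentation and spatial-transformer alignment omitted, and the wrapper function being our own illustration rather than the authors' pipeline:

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

def prepare_image(path: str, left_eye: bool) -> torch.Tensor:
    """Approximate the described preprocessing: center crop the raw
    1200 x 1500 scan to 512 x 1024, resize to 256 x 512, and flip
    left eyes so all images share one orientation."""
    img = Image.open(path).convert("L")      # grayscale
    img = TF.center_crop(img, [512, 1024])   # (height, width)
    img = TF.resize(img, [256, 512])
    if left_eye:
        img = TF.hflip(img)
    return TF.to_tensor(img)                 # (1, 256, 512), floats in [0, 1]
```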
Model Structure
Our model follows the general structure of a VAE, but unlike the typical convolutional autoencoder, the decoder was replaced with an image warping technique inspired by the spatial transformer network.16 Compared to the original spatial transformer network, our model has the following major structural differences: (1) a “reparameterization” step was added; (2) the “grid generator” was replaced with our new COPIT; (3) the sampling technique was replaced with the linearized multi-sampling technique proposed by Jiang et al.;18 and (4) the loss function was modified.16,18,19
The encoder (which corresponds to the “localization net” in the spatial transformer network) is identical to the model described in our previous study but was trained de novo.15 The reparameterization step and the addition of the Kullback-Leibler divergence (KLD) to the loss function are what make our model a VAE.19 The decoder of our preceding work, or the “grid generator” of the spatial transformer network, has been replaced with a COPIT-based decoder specifically developed for the current research. In this decoder, coefficients are generated from the reparameterized latent variables using fully connected layers and then fed into COPIT to generate a sampling grid. A transformed image is then calculated from the standard image using the sampling grid (Fig. 1).
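In skeleton form, this flow can be sketched as follows. The class below is an illustration under stated assumptions, not the authors' implementation: `encoder` and `copit` are opaque stand-ins, and PyTorch's plain bilinear `grid_sample` is substituted for the linearized multi-sampling the authors use.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CopitVAE(nn.Module):
    """Skeleton of the described flow: encoder -> reparameterization ->
    fully connected coefficients -> COPIT sampling grid -> image warping."""

    def __init__(self, encoder: nn.Module, copit: nn.Module,
                 latent_dim: int, n_coeffs: int):
        super().__init__()
        self.encoder = encoder                 # returns (mu, log_var)
        self.fc = nn.Linear(latent_dim, n_coeffs)
        self.copit = copit                     # coefficients -> grid (B, H, W, 2)

    def forward(self, x: torch.Tensor, standard_image: torch.Tensor):
        mu, log_var = self.encoder(x)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterize
        grid = self.copit(self.fc(z))
        # The paper uses linearized multi-sampling; plain bilinear
        # sampling is substituted here for brevity.
        recon = F.grid_sample(standard_image, grid, mode="bilinear",
                              align_corners=False)
        return recon, mu, log_var, z
```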
Figure 1.

Schematic diagram of the convolutional VAE, the spatial transformer, and the COPIT-based VAE. In a convolutional VAE, a convolutional encoder generates means and standard deviations, from which latent variables are sampled, and the output image is then generated from the sampled latent variables through a convolutional decoder (A). In a spatial transformer, a convolutional localization net generates latent variables, which are used as transformation coefficients for generating sampling grids, and the output image is then calculated from a sampling grid and the standard image (B). In our new model, latent variables are generated in the same way as in the convolutional VAE and sampled in a similar way as in the spatial transformer, but the grid generator has been replaced with COPIT, which has two versions of layers: six symmetrical and one asymmetrical. The output image is calculated from the final sampling grid and the standard image using linearized multi-sampling (C).
To separate symmetrical variability from asymmetrical variability, we used two versions of COPIT layers: an asymmetrical layer and a symmetrical layer (in which the left and right sides of the grid are mirror images reflected over the y axis). All variationally inferred latent variables were matched to symmetrical layers, while one additional variable was matched to the asymmetrical layer. Also, because generating grids with a number of points equal to or larger than the number of pixels (in our case, 192 × 448 = 86,016) is not only inefficient but might also cause instability, we generated down-scaled grids from the fully connected layers and upsampled them using bicubic interpolation. A detailed description of COPIT can be found in the supplementary material.
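The mirroring and upsampling might look as follows in PyTorch; the half-grid layout and displacement convention are our assumptions, since the authors' exact construction appears only in the supplementary material.

```python
import torch
import torch.nn.functional as F

def symmetric_upsampled_grid(half_grid: torch.Tensor, out_hw: tuple) -> torch.Tensor:
    """half_grid: (B, 2, h, w/2) coarse grid values for the right half,
    as produced by a fully connected layer. Mirror them over the vertical
    midline (negating the x component) and bicubically upsample to the
    full resolution expected by grid_sample."""
    left = torch.flip(half_grid, dims=[-1])
    left[:, 0] = -left[:, 0]                     # reflect x over the y axis
    full = torch.cat([left, half_grid], dim=-1)  # (B, 2, h, w)
    full = F.interpolate(full, size=out_hw, mode="bicubic", align_corners=True)
    return full.permute(0, 2, 3, 1)              # (B, H, W, 2)
```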
The Loss Function
The latent space z in a VAE is variationally inferred such that the posterior pθ(z|x) is approximated to the prior pθ(z), which is usually defined as a Gaussian distribution N(0, I).20 The distance between the posterior and the prior is measured with the KLD. However, the KLD does not provide information regarding the similarity between latent variables. Hence, we decided to add cosine similarities between all possible combinations of latent variables:
\begin{equation}
Sim(z) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{\left\| z_i \cdot z_j \right\|}{\left\| z_i \right\| \left\| z_j \right\|} \tag{1}
\end{equation}
In the special case where the data are centered at zero, cosine similarity is equivalent to Pearson's correlation coefficient.21
With the additional hyperparameter γ and the similarity function Sim, our modified VAE loss function is given as:
\begin{equation}
L(\theta, \phi, \beta; x) = -\mathbb{E}_{q_\phi(z|x)}\left[ \log p_\theta(x|z) \right] + \beta D_{KL}\left( q_\phi(z|x) \,\|\, p(z) \right) + \gamma\, Sim(z) \tag{2}
\end{equation}
where x is an image in our case, qϕ(z|x) is the estimated distribution of the latent space, pθ(x|z) is the likelihood of generating the true image, and DKL is the KLD.22
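Equations (1) and (2) translate directly into code. In the sketch below, reading the norm of the dot product in Eq. (1) as an absolute value, and the sum reduction of the reconstruction term, are our assumptions; the closed-form Gaussian KLD is the standard VAE expression.

```python
import itertools
import torch
import torch.nn.functional as F

def cosine_similarity_penalty(z: torch.Tensor) -> torch.Tensor:
    """Eq. (1): sum of cosine-similarity magnitudes over all pairs of
    latent variables. z has shape (batch, n_latents); each column is one
    latent variable observed across the batch, so with zero-centered data
    this penalizes pairwise Pearson correlation."""
    n = z.shape[1]
    sim = z.new_zeros(())
    for i, j in itertools.combinations(range(n), 2):
        sim = sim + F.cosine_similarity(z[:, i], z[:, j], dim=0).abs()
    return sim

def copit_vae_loss(recon, x, mu, log_var, z, beta: float, gamma: float):
    """Eq. (2): reconstruction + beta * KLD + gamma * Sim(z). Mean
    squared error is used for reconstruction, as stated in the paper."""
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + beta * kld + gamma * cosine_similarity_penalty(z)
```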
Adjusting Hyperparameters and Training
The number of symmetrical latent variables was set to six based on our previous work, with one additional asymmetrical layer, whereas the values of β and γ were both set at 52, which yielded a KLD comparable to the convolutional VAE model presented in our previous work.15 The grids were scaled down by factors of 16, 16, 16, 8, 4, and 2 for symmetrical layers 1 to 6 and by a factor of 8 for the asymmetrical layer. For the reconstruction loss, mean squared error was used, as it resulted in a shorter training time than binary cross-entropy. The layers were trained in three steps: (1) the six symmetrical layers were trained sequentially (only one layer was trained at a time while the other layers were frozen) for 900 epochs; (2) the asymmetrical layer was trained for 50 epochs; and (3) all layers were trained simultaneously for 50 epochs.
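Step (1) of this schedule can be sketched as follows. The `layers` list is a hypothetical attribute naming the per-latent COPIT submodules, and whether the encoder is frozen during this stage is not specified in the paper, so this sketch leaves it trainable; `make_optimizer` could be, for example, `lambda ps: torch.optim.Adam(ps, lr=1e-4)`.

```python
import torch

def train_symmetric_layers_sequentially(model, layers, make_optimizer,
                                        loader, loss_fn, epochs=900):
    """Train each COPIT layer in turn while all the others are frozen."""
    for active in layers:
        for layer in layers:
            layer.requires_grad_(layer is active)   # freeze all but one layer
        opt = make_optimizer(p for p in model.parameters() if p.requires_grad)
        for _ in range(epochs):
            for x, standard in loader:
                recon, mu, log_var, z = model(x, standard)
                loss = loss_fn(recon, x, mu, log_var, z)
                opt.zero_grad()
                loss.backward()
                opt.step()
```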
Making Conventional Measurements and Calculating Latent Variables in Selected Eyes
Exclusively from patients in the validation set, we collected complete clinical information on patients diagnosed with primary angle closure (PAC) or primary angle closure glaucoma (PACG). All patients had undergone static and dynamic gonioscopy with a Sussman 4-mirror gonioscope (Ocular Instruments, Bellevue, WA, USA) in a darkened room (0.5 cd/m2), performed by an experienced glaucoma specialist (K.R.S.). PAC was diagnosed if the eye had an occludable angle (pigmented posterior trabecular meshwork not visible on nonindentation gonioscopy for at least 180° in the primary position) with signs indicating trabecular obstruction (elevated intraocular pressure, distortion of the radially oriented iris fibers, “glaukomflecken” lens opacities, excessive pigment deposition on the trabecular meshwork, or presence of peripheral anterior synechiae) but without any sign suggestive of glaucoma on optic disc examination and visual field tests. PAC eyes showing glaucomatous optic disc changes (neuroretinal rim thinning, disc excavation, or optic disc hemorrhage attributable to glaucoma) or a glaucomatous visual field change were classified as PACG. Eyes with a history suggestive of an acute angle closure attack were excluded: (1) those presenting with ocular or periocular pain, nausea or vomiting, or intermittent blurred vision with haloes; (2) those with a presenting intraocular pressure of more than 30 mm Hg; and (3) those with at least three of the following: conjunctival injection, corneal epithelial edema, or a mid-dilated unreactive pupil. If both eyes were eligible, we selected the right eye. As a result, 125 eyes of 125 patients, including 74 PAC eyes and 51 PACG eyes, were analyzed.
A single investigator (S.K.), blinded to all information, assigned the scleral spur, defined as the point showing a change in the curvature of the inner surface of the angle wall, and measured the ACD using calipers built into the software provided by the manufacturer.23 Then, the software provided by the manufacturer automatically measured the scleral spur angle, the angle opening distance at 500 µm and 750 µm, the angle recess area at 500 µm and 750 µm, and the trabecular-iris space area at 500 µm and 750 µm. Additionally, the iris thickness at 750 µm from the scleral spur, iris curvature, ACD, anterior chamber width, ACA, and lens vault were measured using Fiji software, and pixel values were converted into real-world units by comparing the pupil diameter measured with Fiji software to that measured with the built-in calipers.24 More detailed descriptions of the measurement methods can be found in our previous works.25–27
Clinical information, including age, gender, axial length, baseline intraocular pressure, manual measurements, and values of the latent variables derived from the neural network trained in the previous steps, was compared between the PAC and PACG eyes with Student's t test for continuous variables and the χ2 test for frequency variables using SAS 9.4 software (SAS Institute Inc., Cary, NC, USA). We did not collect detailed information from the training dataset because of its larger size, but we assume that the proportion of PAC/PACG eyes in the training dataset is not statistically different, because patients were randomly assigned to the training and validation datasets and the demographics do not differ.15
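The same comparisons can be reproduced with open-source tooling; the sketch below swaps SciPy in for SAS and uses fabricated numbers purely to illustrate the calls, not the study's data.

```python
import numpy as np
from scipy import stats

# Purely illustrative values; the real analysis used SAS 9.4.
rng = np.random.default_rng(0)
eta4_pac = rng.normal(0.2, 1.0, 74)    # hypothetical eta_4 values, 74 PAC eyes
eta4_pacg = rng.normal(-0.2, 1.0, 51)  # hypothetical eta_4 values, 51 PACG eyes

# Student's t test for a continuous variable
t_stat, p_value = stats.ttest_ind(eta4_pac, eta4_pacg)

# Chi-square test for a frequency variable
# (hypothetical 2 x 2 counts, e.g., gender by diagnosis)
counts = np.array([[30, 44], [21, 30]])
chi2, p_chi, dof, expected = stats.chi2_contingency(counts)
```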
Results
Exploration of the Latent Space
A satisfactory latent space disentanglement was achieved with our new method, such that the variables were discernible and interpretable on visual analysis. Specifically, η1 seems to represent the overall ACD and ACA, whereas η2 seems to mainly represent the curvature of the cornea. η3 was associated with pupil size changes without any noticeable changes in the corneoscleral or lens contour. η4 was also related to the pupil size, but in contrast to η3, the following differences are worth mentioning: (1) the iris is thicker and more curved with a small pupil size (negative z value); (2) the iris becomes relatively flat when the pupil gets larger (positive z value); and (3) there is an overt increase in the lens vault with a positive z value. Few perceivable changes were seen for η5, but by creating a subtraction image with extreme z values, we can notice some interesting characteristics: (1) the iris is flatter at negative z values and more curved at positive z values; (2) at negative z values, the iris becomes thinner, but the peripheral part remains thick; (3) the anterior chamber gets narrower at negative z values and wider at positive z values; (4) the periphery of the anterior chamber becomes shallower at negative z values and deeper at positive z values; and (5) there is a subtle change in the corneal profile such that, at negative z values, the central portion of the cornea is steeper whereas the peripheral portion is flatter and thicker. η6 seems to be mainly related to corneal thickness, whereas ηA appears to represent the asymmetricity, as intended (Fig. 2 and Fig. 3).
Figure 2.

Visualization of the latent variables of the model. There are six symmetrical layers, denoted η1 to η6, and one asymmetrical layer, denoted ηA. For the symmetrical layers, z values of −2.0, −1.0, 0, 1.0, and 2.0 were used, while for the asymmetrical layer, z values were reduced to −1.0, −0.5, 0, 0.5, and 1.0 for better interpretability.
Figure 3.

Subtraction image of η5 for z values of −4 and 4. Areas that correspond to z = 4 are color-coded in yellow tones, while areas that correspond to z = −4 are color-coded in blue tones. Gray represents the area of the cornea common to both z values, and green represents the area of the iris common to both z values.
Conventional Measurements and Latent Variables for PAC and PACG Eyes
There was no statistically significant difference between PAC and PACG eyes for any of the conventional measurements, with the angle recess area having the lowest P value (P = 0.116). Among the latent variables, PACG eyes had a smaller value of η4 compared to PAC eyes (P = 0.015; Table). Values of the latent variables can be visualized with the network in several ways. When using the mean values of η1 to η6 for PAC and PACG eyes, there is little difference in the cornea or lens between the groups. The size of the pupil and the width of the anterior chamber were slightly smaller in the PACG group than in the PAC group, consistent with the conventional measurements in the Table (top row of Fig. 4). To enhance the difference, we upscaled the mean values of the latent variables for each group by a factor of three. The angle difference was more noticeable, with a narrower angle in the PACG group than in the PAC group (middle row of Fig. 4), although there was little difference in the cornea or lens. Because we can select specific latent variables, we also created a subtraction image using the mean values of η4 for the PAC and PACG eyes multiplied by 3, with all other latent variables kept constant at zero. Still, there is little difference in the angle, corneal contour, or anterior chamber width, although the lens vault is slightly larger in the PAC group (bottom row of Fig. 4).
Table.

Conventional Measurements and Latent Variables for PAC and PACG Eyes
Figure 4.

Reconstructed images representing PAC and PACG eyes from selected mean values of the latent variables of each group. Top row: mean values of η1 to η6; middle row: tripled mean values of η1 to η6; bottom row: tripled mean values of η4. Left column: plain segmented images; middle column: difference map; right column: magnified difference map of the angle, with the approximate AOD500 marked with arrows. PACG eyes seem to have a narrower angle (top row, right column), which is more pronounced when the latent variables are multiplied by 3 (middle row, right column). However, there was no noticeable difference in the width of the angle for η4 alone (bottom row, right column). This implies that, despite the statistically significant difference in η4 between the two disease groups, the narrower angle in PACG is not a direct result of η4 but rather a result of the combination of all latent variables.
Discussion
Our model successfully disentangled the latent space, with readily distinguishable main features for the first three latent variables: anterior chamber area for η1, curvature of the cornea for η2, and pupil size for η3. η4 and η5 are more complex, with η4 associated with at least three features: (1) pupil size, (2) curvature and thickness of the iris, and (3) the lens vault. η5 is the most complex, with combined changes of the iris profile, corneal profile, width of the anterior chamber, and depth of the peripheral anterior chamber. η6 seems to be associated with corneal thickness (Figs. 2 and 3). Given that our model is unsupervised, the good interpretability of certain latent variables (η1, η2, η3, and η6) is encouraging. However, some latent variables (η4, η5) are difficult to interpret, which implies complex interactions between AS-OCT features but also leaves room for further improvement of the model.
Comparing conventional measurements and latent variables of PAC and PACG eyes led to interesting results: there was no statistically significant difference in any of the conventional measurements, but the PACG eyes had smaller values of η4 than the PAC eyes (P = 0.015; Table). On visualization of the latent variables of the two groups, we noticed that PACG eyes seemed to have a narrower angle, which was more pronounced when the latent variables were multiplied by a factor of 3. However, despite the statistically significant difference in η4 between the disease groups, there was no noticeable difference in the width of the angle between groups when only η4 was visualized (Fig. 4). Hence, if there is a difference in the angle between PAC and PACG eyes, it is not a direct result of η4 but rather of the combined action of all latent variables. Although the current study is not sufficient to draw any conclusion regarding morphological differences between PAC and PACG eyes, these results are encouraging in that no disease labels were applied during training. Although we cannot draw any conclusion regarding the association between latent variables and disease mechanisms, we can postulate that there may be complex interactions among anterior segment structures that are not evident in conventional measurements but can be captured with deep learning techniques.
Compared to the convolutional VAE model presented in our previous study, our new COPIT-based model produced similar but less blurred reconstructed images and vastly improved latent space disentanglement, such that (1) every latent variable represents a unique feature, (2) the features are easier to comprehend, and (3) the other components (corneosclera, iris, and lens) remain relatively stable while the feature in focus changes dramatically (Figs. 2 and 3). These encouraging results could be achieved by implementing a design specifically tailored for the application. First, we implemented sequential training, which has several consequences: (1) the latent variables are ordered such that the variance explained can be expected to be largest for the first latent variable and to decrease afterward; (2) the addition of cosine similarity to the loss function encourages the latent factors to be dissimilar; and (3) the resulting latent space is easier for humans to interpret. Second, in our new model, every layer can be configured with a different design. For example, we separated the asymmetrical layer from the symmetrical layers, which can offer a clear advantage given that, physiologically, no eye is symmetrical. Hence, we expect that separating the asymmetricity will reduce confounding and enhance the extraction of clinically meaningful features in the symmetrical layers. Besides horizontal symmetricity, other restrictions and transformations, such as affine transformation or thin-plate spline transformation, are relatively easy to incorporate and can be applied on a per-layer basis if required.
Another useful characteristic of our new model compared to the convolutional VAE is better stability at extreme z values. This can be useful because scaling up the values of the latent variables can enhance subtle changes to improve interpretability. However, after a certain point, the generated image becomes unnatural, limiting the range of usable z values. For example, at z values of −4 and 4, which were used in Figure 3, the latent variables in the convolutional VAE generate somewhat broken images (Fig. 5).
Figure 5.

Latent variables of the convolutional VAE presented in our previous paper at more extreme z values.
We believe that, in the near future, deep learning techniques will be more commonly applied in the field of glaucoma, including AS-OCT image analysis. Many deep learning techniques involve a dimension-reduction stage, which is related to the latent space; for example, deep clustering and longitudinal analyses can make use of the latent space. For such strategies to be more effective, an understanding of the latent structure is essential. We hope our study promotes understanding of the latent structure of AS-OCT images and provides a basis for future deep learning studies.
AS-OCT itself poses an important limitation to our analysis. Inadequate tissue resolution makes it difficult to delineate the exact border between the cornea and the iris and the location of the scleral spur in certain eyes with a closed angle. AS-OCT has limited penetration, restricting visualization of the posterior surface of the iris and the ciliary body, whereas in some eyes without cataracts, the lens is so transparent that parts of the anterior capsule are not visualized. We expect these limitations to be overcome with newer technologies. The model also has limitations inherent in generative models, including the requirement for manual tailoring of the hyperparameters. In the tailoring process, the developers' knowledge of the subject (AS-OCT images, in our case) becomes involved. Because there is no universally accepted way to qualitatively assess the degree of disentanglement for AS-OCT images, the process is highly subjective. The same limitation applies when interpreting the results, especially for latent variables that capture complex interactions of various features and are therefore difficult to interpret.
Both the training and validation data were derived from the same dataset, which does not include various ethnic groups, and the sample size is small for a machine learning study. Hence, we expect the generalizability of our analysis to be limited, and the results should be assumed to depend on the specific dataset we used. Comparisons between PAC and PACG eyes have all the limitations inherent in the retrospective design of the study.
Nonetheless, we have shown that an unsupervised neural network can achieve good results in the analysis of the latent structure of AS-OCT images. Also, our results suggest that latent space analysis can be useful for capturing combinations of features not readily represented by conventional measurements because of their complex interactions.
Acknowledgments
Supported by a grant (2018-0500) from the Asan Institute for Life Sciences, Asan Medical Center, Seoul, South Korea. 
Disclosure: K. Shon, (N); K.R. Sung, (N); J. Kwak, (N); J.Y. Lee, (N); J.W. Shin, (N) 
References
1. Triolo G, Barboni P, Savini G, et al. The use of anterior-segment optical-coherence tomography for the assessment of the iridocorneal angle and its alterations: update and current evidence. J Clin Med. 2021; 10: 231. [CrossRef]
2. Baek S, Sung KR, Sun JH, et al. A hierarchical cluster analysis of primary angle closure classification using anterior segment optical coherence tomography parameters. Invest Ophthalmol Vis Sci. 2013; 54: 848–853. [CrossRef] [PubMed]
3. Kwon J, Sung KR, Han S, et al. Subclassification of primary angle closure using anterior segment optical coherence tomography and ultrasound biomicroscopic parameters. Ophthalmology. 2017; 124: 1039–1047. [CrossRef] [PubMed]
4. Han S, Sung KR, Lee KS, et al. Outcomes of laser peripheral iridotomy in angle closure subgroups according to anterior segment optical coherence tomography parameters. Invest Ophthalmol Vis Sci. 2014; 55: 6795–6801. [CrossRef] [PubMed]
5. Lee Y, Sung KR, Na JH, et al. Dynamic changes in anterior segment (AS) parameters in eyes with primary angle closure (PAC) and PAC glaucoma and open-angle eyes assessed using AS optical coherence tomography. Invest Ophthalmol Vis Sci. 2012; 53: 693–697. [CrossRef] [PubMed]
6. Kwon J, Sung KR, Han S. Long-term changes in anterior segment characteristics of eyes with different primary angle-closure mechanisms. Am J Ophthalmol. 2018; 191: 54–63. [CrossRef] [PubMed]
7. Moghimi S, Torkashvand A, Mohammadi M, et al. Classification of primary angle closure spectrum with hierarchical cluster analysis. PLoS One. 2018; 13(7): e0199157. [CrossRef] [PubMed]
8. Xu BY, Chiang M, Pardeshi AA, et al. Deep neural network for scleral spur detection in anterior segment OCT images: the Chinese American Eye Study. Transl Vis Sci Technol. 2020; 9(2): 1–10. [CrossRef]
9. Wanichwecharungruang B, Kaothanthong N, Pattanapongpaiboon W, et al. Deep learning for anterior segment optical coherence tomography to predict the presence of plateau iris. Transl Vis Sci Technol. 2021; 10(1): 1–10. [CrossRef]
10. Fu H, Baskaran M, Xu Y, et al. A deep learning system for automated angle-closure detection in anterior segment optical coherence tomography images. Am J Ophthalmol. 2019; 203: 37–45. [CrossRef] [PubMed]
11. Hao H, Zhao Y, Yan Q, et al. Angle-closure assessment in anterior segment OCT images via deep learning. Med Image Anal. 2021; 69: 101956. [CrossRef] [PubMed]
12. Pham TH, Devalla SK, Ang A, et al. Deep learning algorithms to isolate and quantify the structures of the anterior segment in optical coherence tomography images. Br J Ophthalmol. 2021; 105: 1231–1237. [CrossRef] [PubMed]
13. Bollen KA. Latent variables in psychology and the social sciences. Annu Rev Psychol. 2002; 53: 605–634. [CrossRef]
14. Turhan CG, Bilge HS. Recent trends in deep generative models: a review. In: 2018 3rd International Conference on Computer Science and Engineering (UBMK). IEEE; 2018: 574–579.
15. Shon K, Sung KR, Kwak J, et al. Development of a β-variational autoencoder for disentangled latent space representation of anterior segment optical coherence tomography images. Transl Vis Sci Technol. 2022; 11(2): 11. [CrossRef] [PubMed]
16. Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks. Adv Neural Inf Process Syst. 2015; 2015: 2017–2025.
17. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015). Lecture Notes in Computer Science, vol. 9351. Springer; 2015: 234–241.
18. Jiang W, Sun W, Tagliasacchi A, et al. Linearized multi-sampling for differentiable image transformation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 2988–2997.
19. Kingma DP, Welling M. Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations (ICLR 2014), Conference Track Proceedings. 2014: 1–14.
20. Kingma DP, Welling M. Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations (ICLR 2014), Conference Track Proceedings. 2014: 1–14.
21. van Dongen S, Enright AJ. Metric distances derived from cosine similarity and Pearson and Spearman correlations. arXiv preprint arXiv:1208.3145; 2012.
22. Higgins I, Matthey L, Pal A, et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In: 5th International Conference on Learning Representations (ICLR 2017). Available at: https://openreview.net/forum?id=Sy2fzU9gl.
23. Sakata LM, Lavanya R, Friedman DS, et al. Assessment of the scleral spur in anterior segment optical coherence tomography images. Arch Ophthalmol. 2008; 126: 181–185. [CrossRef] [PubMed]
24. Schindelin J, Arganda-Carreras I, Frise E, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012; 9: 676–682. [CrossRef] [PubMed]
25. Lee KS, Sung KR, Kang SY, et al. Residual anterior chamber angle closure in narrow-angle eyes following laser peripheral iridotomy: anterior segment optical coherence tomography quantitative study. Jpn J Ophthalmol. 2011; 55: 213–219. [CrossRef] [PubMed]
26. Lee Y, Sung KR, Na JH, et al. Dynamic changes in anterior segment (AS) parameters in eyes with primary angle closure (PAC) and PAC glaucoma and open-angle eyes assessed using AS optical coherence tomography. Invest Ophthalmol Vis Sci. 2012; 53: 693–697. [CrossRef] [PubMed]
27. Bookstein FL. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans Pattern Anal Mach Intell. 1989; 11: 567–585. [CrossRef]