Abstract
Purpose:
Clinical evaluation of eye versions plays an important role in the diagnosis of special strabismus. Despite their importance, versions are not standardized in clinical practice because they are subjective. Assuming that objectivity confers accuracy, this research aims to create an artificial intelligence app that can classify eye versions in the nine positions of gaze.
Methods:
We analyzed photographs of 110 strabismus patients from an outpatient clinic of a tertiary hospital in the nine positions of gaze. For each photo, the gaze position was identified, and the corresponding version was rated by the same examiner during patient evaluation.
Results:
The images were standardized with the OpenCV library in Python so that the patient's eyes were located, regardless of photo orientation, and sent to a multilabel model built with the Keras framework. The model was then trained for each combination of the following groupings: eye (left, right), gaze (1 to 9), and version (−4 to 4). ResNet50 was used as the neural network architecture, and the data augmentation technique was applied. For quick inference via web browser, the Streamlit app framework was employed; the finished model was exported for use on mobile devices through the TensorFlow Lite converter.
Conclusions:
The results showed that the mobile app might be applied to complement evaluation of ocular motility based on objective classification of ocular versions. However, further exploratory research and validations are required.
Translational Relevance:
Beyond the traditional clinical practice method, professionals will be able to rely on an easy-to-apply support app to increase diagnostic accuracy.
The study was approved by the Research Ethics Committee of Hospital das Clínicas of the University of Sao Paulo. In total, 323 patients with strabismus were invited to participate in the study. They had been followed up in a specialized outpatient clinic of a university hospital from 2015 to 2019. Patients who underwent orbit decompression or who had any facial deformity that prevented identification of facial points by the Dlib library were excluded. Patients with corneal disease, such as microcornea and leukoma, or patients for whom identifying the corneoscleral limbus region was difficult were also excluded. Other exclusion criteria included previous strabismus surgery or a version classification less than or equal to −5 or greater than or equal to +5. The resulting sample had 110 participants, and their characteristics are shown in Table 1.
Table 1. Characteristics of the Study Participants
The photographic images of the nine positions of gaze were obtained with the examiner standing 1 m in front of the participant, with a 16.1-megapixel digital camera (COOLPIX S8200; Nikon Inc., Tokyo, Japan) and automatic ISO gain (100–1600). The participant's head was positioned so that it remained stable while the patient looked at a fixation target corresponding to each position of gaze. All images had a resolution of 4608 × 3456 pixels. This technique for obtaining and evaluating gaze versions by photography was validated by Lim et al.17
The patient was instructed to follow an object presented by the examiner, from the primary position to the secondary and tertiary positions of gaze. Each patient was evaluated twice, by two different evaluators present during the consultation. The evaluators were ophthalmologists who had been specialists in the area of strabismus for over a decade. In case of divergent evaluation, the Department Head, who had more than three decades of experience in the area, was consulted. For each muscle involved, versions were graded from −1 to −4 for hypofunction and from +1 to +4 for hyperfunction. The images of all the participants were classified into the nine positions, totaling 990 images.
Initially, the OpenCV library in Python was used to standardize the images. Each eye was located, regardless of photo orientation (portrait or landscape), and a new square image cropped around it was generated. For this purpose, the image dimensions and colors were standardized, and the face inclination was corrected using the 68 landmark points obtained from the Dlib facial landmark extractor, which provided the face center and allowed an image crop to be generated for each eye.
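As an illustration only, the sketch below outlines this localization step with OpenCV and Dlib; the predictor file name, crop size, and margins are assumptions rather than the authors' exact parameters, and the crop is assumed to stay within the frame.

```python
# Minimal sketch of eye localization and cropping, assuming the standard
# Dlib 68-point predictor file; crop size and margins are illustrative.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_eyes(image_path, out_size=224):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    face = detector(gray)[0]                      # assumes one face per photo
    pts = np.array([(p.x, p.y) for p in predictor(gray, face).parts()])

    # Eye centers from the 68-point convention (36-41 right eye, 42-47 left eye).
    right_c = pts[36:42].mean(axis=0)
    left_c = pts[42:48].mean(axis=0)

    # Correct face inclination: rotate so the inter-ocular line is horizontal.
    angle = np.degrees(np.arctan2(left_c[1] - right_c[1], left_c[0] - right_c[0]))
    center = (float((right_c[0] + left_c[0]) / 2), float((right_c[1] + left_c[1]) / 2))
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    aligned = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

    half = int(0.5 * np.linalg.norm(left_c - right_c))    # square half-width
    crops = []
    for c in (right_c, left_c):
        cx, cy = (M[:, :2] @ c + M[:, 2]).astype(int)     # eye center after rotation
        crop = aligned[cy - half:cy + half, cx - half:cx + half]
        crops.append(cv2.resize(crop, (out_size, out_size)))
    return crops                                          # [right eye, left eye]
```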
Then, the images were grouped on the basis of eye (left or right), gaze (1 to 9), and version reference number (1 to 9), as shown in Table 2. The primary position of gaze was classified as 5 and used only as a reference position; it was not classified as a version.
Table 2. Reference Number of the Version and the Version Classification
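Assuming that the reference numbers in Table 2 map linearly onto the version grades, with 5 reserved for the primary position, a possible reconstruction of this mapping is:

```python
# Hypothetical reconstruction of the Table 2 mapping: reference numbers 1-9
# assumed to correspond to version grades -4 to +4, with 5 for the primary
# position (grade 0, used only as a reference).
VERSION_TO_REF = {grade: grade + 5 for grade in range(-4, 5)}
REF_TO_VERSION = {ref: grade for grade, ref in VERSION_TO_REF.items()}

assert VERSION_TO_REF[-4] == 1 and VERSION_TO_REF[0] == 5 and VERSION_TO_REF[4] == 9
```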
The measurements of gaze excursion and the subjective classification of the respective version were used to feed a convolutional neural network that extracted the image attributes and their classification. Next, the angle of excursion of the eye from the primary position to the specified version was used for ground truth labeling during artificial intelligence (AI) training.
ResNet5018 was used as the architecture. The ResNet50 network was imported directly through the TensorFlow Keras applications module. This version was pretrained on ImageNet, which has more than one million images in 1000 categories, providing a vast quantity of learned representations that were used for transfer learning. The last layers were removed, the remaining layers were frozen, and a new fully connected head was added to perform fine-tuning. The layers used were AveragePooling2D (7 × 7), Flatten, Dense (256), Dropout (50%), and Dense (with the number of classes of the gaze versions).
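The layer stack described above can be sketched with the Keras functional API as follows; the input size, optimizer, and loss are assumptions not stated in the text.

```python
# Sketch of the transfer-learning head on a frozen ResNet50 base; the layer
# stack follows the text, while input size and compile settings are assumed.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import AveragePooling2D, Flatten, Dense, Dropout, Input
from tensorflow.keras.models import Model

num_classes = 9  # one class per version reference number (illustrative)

base = ResNet50(weights="imagenet", include_top=False,
                input_tensor=Input(shape=(224, 224, 3)))
base.trainable = False                        # freeze the pre-trained layers

x = AveragePooling2D(pool_size=(7, 7))(base.output)
x = Flatten()(x)
x = Dense(256, activation="relu")(x)
x = Dropout(0.5)(x)
outputs = Dense(num_classes, activation="softmax")(x)

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```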
To increase the database size, the data augmentation technique was used. Each image was selected at random and received transformations including up to 5° of rotation, up to 5% increase in width and height, up to 0.05 perspective distortion, up to 1% zoom, and up to 10% increase or decrease in brightness (Fig. 1). After this stage, the sample set consisted of more than 9600 images, which were separated into training, validation, and test sets.
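A hedged sketch of these augmentation ranges with the Keras ImageDataGenerator is shown below; interpreting the percentages as shift and zoom ranges is an assumption, and the perspective distortion has no direct ImageDataGenerator parameter.

```python
# Illustrative augmentation pipeline approximating the ranges quoted above;
# the 0.05 perspective distortion would require a custom preprocessing
# function (or another library) and is omitted here.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=5,              # up to 5 degrees of rotation
    width_shift_range=0.05,        # up to 5% horizontal shift
    height_shift_range=0.05,       # up to 5% vertical shift
    zoom_range=0.01,               # up to 1% zoom in/out
    brightness_range=(0.9, 1.1),   # +/- 10% brightness
    fill_mode="nearest",
)

# Usage (illustrative): stream augmented eye crops from a class-labeled folder tree.
# train_flow = augmenter.flow_from_directory("data/train", target_size=(224, 224),
#                                            batch_size=64, class_mode="categorical")
```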
For the cross-validation process, the dataset was split into three parts: training, validation, and testing. In the first part, the model was trained to classify the eye versions. On the validation set, the model accuracy was measured according to the chosen metrics. After adjustments to the model, a final version was chosen and evaluated on the test set.
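For illustration, a stratified split of this kind can be produced as below; the arrays are toy placeholders, and the exact proportions used by the authors are not fully specified in the text.

```python
# Toy illustration of a stratified training/validation/test split; the
# fractions shown are assumptions, not the authors' exact proportions.
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(990)                        # stand-in for the 990 image crops
labels = np.random.randint(1, 10, size=990)     # stand-in version reference numbers

train_idx, test_idx, y_train, y_test = train_test_split(
    indices, labels, test_size=0.15, stratify=labels, random_state=42)
train_idx, val_idx, y_train, y_val = train_test_split(
    train_idx, y_train, test_size=0.15, stratify=y_train, random_state=42)
```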
To train the neural network, 150 epochs were used, with a learning rate finder and a batch size of 64 images. For quick inference via the browser, the Streamlit tool was used. The finished model was exported for use on mobile devices through the TensorFlow Lite converter. For the transfer learning step on the pretrained ResNet50, the last layers were removed, and the following layers were added: average pooling, flatten, dense, and dropout.
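The export step for mobile devices can be sketched as follows; the file names are illustrative.

```python
# Sketch of exporting the trained Keras model with the TensorFlow Lite
# converter for use on mobile devices; file names are illustrative.
import tensorflow as tf

model = tf.keras.models.load_model("versions_resnet50.h5")   # fine-tuned model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]          # optional optimization
tflite_model = converter.convert()

with open("versions_classifier.tflite", "wb") as f:
    f.write(tflite_model)
```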
Figure 2 illustrates the flowchart of the sequence of platforms used during development of the application, from photo processing to conversion for mobile devices. From the convolutional neural network creation process, the mobile application could be built.
The classification performance measures recall, precision, F-score, and support were used in the sample to relate the eye position to its version. Recall, or sensitivity, is the proportion of true-positive cases that are correctly predicted as positive. Precision is the proportion of predicted-positive cases that are truly positive. The F-score is the weighted harmonic mean of precision and recall. Support is the number of observations in which a given eye gaze and eye version are combined. For validation purposes, the model accuracy was measured according to the chosen metrics, and 15% of the images in each class were used.
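These per-class metrics can be obtained, for example, with scikit-learn's classification report; the predictions below are toy placeholders.

```python
# Illustrative computation of precision, recall, F1-score, and support;
# y_true and y_pred are toy placeholders, not the study's predictions.
import numpy as np
from sklearn.metrics import classification_report

y_true = np.random.randint(1, 10, size=100)   # true version reference numbers
y_pred = np.random.randint(1, 10, size=100)   # model predictions
print(classification_report(y_true, y_pred, zero_division=0))
```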
Tables 3 and 4 detail the precision, recall, F1-score, and number of patient photos by eye, gaze, and version for the right and left eye, respectively.
Table 4 does not show gaze 9 because there was no patient with this combination.
Table 3. Results for the Right Eye
Tables 3 and 4 contain many missing classes or classes with very few observations, which prevented us from achieving good global quality metrics. We were not able to perform a complete split for some of the classes because there was only one observation for them. The missing classes were due to the absence of patients with such characteristics at the reference hospital and not to an algorithmic reason.
Table 5 summarizes the result of the model by eye and gaze.
Table 5. Global Accuracy, Weighted Precision, Weighted Recall, Weighted F1, and Validation Loss
On the basis of the results, the model overfitted in cases of rare observations, so the sample had to be increased. Gazes 1, 6, and 9 did not provide satisfactory results for the left eye, whereas gazes 1, 3, 6, and 8 did not provide satisfactory results for the right eye.
The mobile app developed herein for classification of ocular versions showed global accuracy ranging from 0.42 to 0.92 and precision ranging from 0.28 to 0.84. This accuracy range showed that the application has good potential for classification of eye versions, especially in some gazes, such as 2, 3, 4, 7, and 8 for the right eye and 2, 4, 7, and 8 for the left eye. Controlling the participants was difficult because the patients came from a university reference hospital that prioritizes the delivery of care to patients with specific pathologies, such as Graves disease, which causes restrictive strabismus.
In developing the app, face positioning determined result accuracy. Kushner22 demonstrated that posture is important when examining ocular motility. He developed a cervical range-of-motion device, an instrument designed to assess the range of motion of the cervical spine for accurate quantification of the magnitude of the patient's abnormal head posture, the limits of diplopia, or the range of single binocular vision at distance fixation. In our study, we used the facial landmark extractor from the Dlib library. This tool allowed us to locate the center of the face effectively and reduced the possibility of posture bias from face rotation.
Photo standardization was crucial because images with the same color saturation and brightness cannot always be obtained. This standardization also contributed to training the application so that it did not consider variables other than eye position. If this step were not performed, all the photos in a certain position rated −2 could, for example, be brighter than those with other ratings, and the application could consider this difference in brightness rather than the eye position.
Table 2 lists the results regarding the accuracy of the application in classifying the gaze and the eye version in a determined gaze. This accuracy showed whether the application can standardize the eye versions. In addition, the practicality of using the application on smartphones confirmed its applicability in the ophthalmologist's routine.
Urist16 evaluated the versions by lateral reflection of light with limbus transillumination from illumination of the space between the eyes. According to this author, in 85% of normal eyes, reflexes located 10 mm from the limbus in the sclera of the abducted eye or at 35° (Hirschberg scale) in the cornea of the adducted eye provide relevant evaluation in surgical cases, but they only discriminate between normal and abnormal muscle action.
Lim et al.17 evaluated version classification when excursion was quantified in degrees and found that the average difference between observers was 0.2°, with 95% confidence limits of 2.6° and 3.1°. However, the authors described a possible selection bias because there were no strabismus patients in the study. Despite the high accuracy, the need to edit the images in Photoshop to identify the limbus was another disadvantage.
Other applications of artificial intelligence in strabismus have been reported. For example, Lu et al.19 described an automatic strabismus detection system for use in telemedicine. In their article, the authors presented a tele-strabismus dataset annotated by ophthalmologists. They then proposed an end-to-end framework called Random Forest – Convolutional Neural Network (RF-CNN) for automated detection of strabismus in the established tele-strabismus dataset. RF-CNN first segments the eye region in each image and then classifies the segmented eye regions with deep neural networks. The experimental results on the established strabismus dataset demonstrated that the proposed RF-CNN performs well in the automated detection of strabismus. In our application, the first step of ocular region detection and eye positioning followed the same proposal, with the difference that, on the basis of this information, we classified the eye version from the angle of eye excursion.
Here, we respected the classical bioethical principles applied in the use of artificial intelligence; nonmaleficence, beneficence, and justice are worth highlighting because the application could safely improve the quality of care, thereby improving patient outcomes.20,21 We also respected responsibility and respect for autonomy by ensuring the participants' responsibility and authorization through informed consent. In keeping with justice and nonmaleficence, we compared this study with validated clinical studies to guarantee the safety, effectiveness, and equity of the intervention.
The app presented in this study is an early prototype that is undergoing further development. Thus, the app and the preliminary evaluation studies presented here had some limitations. First, we were not able to use the application in individuals with facial deformities or corneal changes because recognizing the 68 points with the facial landmark extractor was difficult in these cases. Another limiting factor was that, to avoid confusion bias due to possible conjunctival scars, patients who had already undergone surgery could not be included. Further studies are suggested to broaden the spectrum of use of the application, and studies that consider only the eye position and not variables such as facial, conjunctival, or corneal anomalies are recommended.
The app is a novel tool, and the differences between the app and the traditional semiological measurement of ocular versions were demonstrated. The app could complement ocular motility evaluation on the basis of objective classification of the ocular versions.
The application can potentially be used as an easy-to-apply tool to reduce evaluation time and increase diagnostic accuracy. However, further exploratory research and validation are necessary.
Supported by Xtrabismus, an Innovation Group of Strabismus.
Presented as a video at the ARVO 2020 Annual Meeting (online event) and selected for the Travel Grant Awards. Published as: Figueiredo LA, Debert I, Dias JVP, Polati M. An artificial intelligence app for strabismus. Invest Ophthalmol Vis Sci. 2020;61(7):2129.
Disclosure: L.A. de Figueiredo, None; J.V.P. Dias, None; M. Polati, None; P.C. Carricondo, None; I. Debert, None