Abstract
Purpose:
To create a single predictive model based on a set of demographic, optical, and geometric variables with two objectives: classifying keratoconus (KC) in its earliest clinical stages and estimating the probability that each case has been correctly classified.
Methods:
We selected 178 eyes of 178 subjects (115 males, 64.6%; 63 females, 35.4%). Of these, 74 were healthy control subjects, and 104 had KC according to the RETICS grading system (61 early KC, 43 mild KC). Only one eye per patient was included, and 27 different parameters (demographic, clinical, pachymetric, and geometric) were studied. The data were used to fit an ordinal logistic regression model, which was deployed as a web application capable of making real-time predictions from new patient data.
Results:
EMKLAS, an early and mild KC classifier, showed good training performance, with an overall accuracy of 73% (95% confidence interval, 65%–79%). In validation with an independent sample, the classifier was particularly accurate for the control (79%) and mild KC (80%) groups, whereas accuracy for the early KC group was considerably lower (69%). The variables included in the model were age, gender, corrected distance visual acuity, 8-mm corneal diameter, and posterior minimum thickness point deviation.
Conclusions:
Our web application allows fast, objective, and quantitative detection and classification of early and mild KC and assists ophthalmologists in diagnosing this disease.
Translational Relevance:
No single gold standard exists for detecting and classifying preclinical KC, but our web application and the EMKLAS score may aid clinicians' decision-making.
For this observational comparative study, 178 eyes of 178 subjects between the ages of 15 and 76 years were selected; 115 of the subjects were male (64.6%) and 63 were female (35.4%). Of these, 74 were included in the healthy control group. Their ages ranged from 18 to 63 years (average age, 41 ± 23.7 years), and there were 42 males (56.7%) and 32 females (43.3%). The remaining 104 subjects had been diagnosed with KC; they were 16 to 76 years old, with 73 males (70.1%) and 31 females (29.9%). This second group was further divided into two subgroups according to the RETICS grading system: grade I (early KC) or grade II (mild KC).[18]
The early KC group consisted of 61 subjects aged 15 to 59 years (average age, 36.0 ± 21.0 years), of whom 45 were male (73.8%) and 16 female (26.2%). The mild KC group consisted of 43 subjects aged 17 to 76 years (average age, 46.2 ± 29.2 years), with 28 males (65.1%) and 15 females (34.9%).
A second dataset was obtained 4 months after the first set of individuals had been recruited for training, taking care that no patient from the training group was included in the validation group. This new dataset comprised 41 individuals, of whom 19 were healthy, 14 were classified as RETICS grade I, and eight were classified as RETICS grade II. It was used for an independent validation of the ordinal logistic regression model.
All of the subjects were recruited at Vissum Corporation Alicante (an institution affiliated with Miguel Hernández University, Elche, Spain) and formed part of the official Iberia database of KC cases generated for the National Network for Clinical Research in Ophthalmology RETICS-OFTARED. All of the participants provided written informed consent, and the study, which followed the tenets of the Declaration of Helsinki, was approved by the clinic's Ethics Committee for Clinical Research.
To avoid undesired biases, subjects who had undergone previous ocular surgery, had worn contact lenses in the 4 weeks before the tomographic evaluation, or showed any other ocular comorbidity that could affect the study outcomes were excluded.
The cases in the control group were randomly selected from candidates for refractive surgery, and the study data were acquired during the patients’ presurgical consultations, always by the same experienced technician.
The procedure followed for diagnosing and classifying the KC groups was based on state-of-the-art clinical and topographic evaluations (Fig. 1), including ultrasonic pachymetry, fundus evaluation, manifest refraction (sphere and cylinder), slit-lamp biomicroscopy, uncorrected distance visual acuity (UDVA), corrected distance visual acuity (CDVA), and Goldmann tonometry.[19] For all cases, clinicians searched for presurgical evidence of KC, such as an asymmetric bowtie pattern (with or without skewed axes), Fleischer's ring, Rizzuti's sign, Munson's sign, anterior stromal scarring, or stromal thinning.
Of all the eyes considered, 41.5% were in the control group (74 healthy eyes), 34.3% were in the early KC group (61 eyes), and 24.2% were in the mild KC group (43 eyes).
Table 1 summarizes all of the variables initially measured for all patients, segregated into healthy individuals (control), patients with early KC, and patients with mild KC. The apparent associations between the variables and KC grade were tested with the Kruskal–Wallis test for the quantitative variables and the χ² test for the qualitative variable (gender). A nonparametric test was used because most quantitative variables did not pass the normality test. All of the quantitative variables except age, axis, and spherical aberration (Z40) showed a significant association with grade, whereas no significant difference was found for gender.
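The Kruskal–Wallis test ranks all observations jointly and asks whether the mean ranks differ between groups. The pure-Python sketch below illustrates the statistic for three groups (such as control, early KC, and mild KC), using average ranks for ties and the standard tie correction. It is not the authors' code, and any group values passed to it here are hypothetical; in practice, `scipy.stats.kruskal` computes the same test.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(g1, g2, g3):
    """Tie-corrected Kruskal-Wallis H and its p-value for exactly 3 groups.

    The reference distribution is chi-square with k - 1 = 2 degrees of
    freedom, whose survival function is exactly exp(-H / 2).
    Assumes not all pooled values are identical (else H is undefined).
    """
    groups = [list(g1), list(g2), list(g3)]
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        rank_term += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h /= 1 - ties / (n ** 3 - n)
    return h, math.exp(-h / 2)
```

With perfectly separated hypothetical groups such as `[1, 2, 3]`, `[4, 5, 6]`, and `[7, 8, 9]`, the function returns H ≈ 7.2 and p = exp(−3.6) ≈ 0.027.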
Table 1. Demographic, Clinical, and Morphogeometric Parameters Segregated into Healthy (Control) and Early KC and Mild KC Patients
Figure 2 reveals that all of the variables except age, sphere, axis, spherical aberration (Z40), and anterior/posterior minimum thickness point deviations were strongly correlated with one another and therefore provided very little non-redundant information. This finding suggested that a simple model with a limited number of variables should be used when applying ordinal logistic regression with a variable selection algorithm. A minimum set of predictors providing the most information was selected by a backward stepwise procedure based on the Akaike information criterion (AIC). The final model included the five variables shown in Table 2: age, gender, CDVA, 8-mm corneal diameter (Q8mm), and posterior minimum thickness point deviation. When assessing the goodness of fit of this final model, the likelihood ratio test yielded P < 0.001, and the Hosmer–Lemeshow test also yielded P < 0.001. The McFadden pseudo-R² was 0.507, indicating good predictive power.
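The model-selection and goodness-of-fit quantities named above reduce to simple formulas on log-likelihoods: AIC = 2k − 2 ln L, the likelihood ratio statistic 2(ln L_model − ln L_null), and McFadden's pseudo-R² = 1 − ln L_model / ln L_null. The sketch below implements them together with a backward-only stepwise search. The `fit` callback is a hypothetical hook that would wrap an actual ordinal logistic regression fit (R's `step()` performs an AIC-based search of this kind), so this illustrates the procedure rather than reproducing the authors' implementation.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical fitting hook: given a predictor set, return
# (log-likelihood, number of estimated parameters incl. thresholds).
FitFn = Callable[[FrozenSet[str]], Tuple[float, int]]


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * loglik


def backward_stepwise(predictors: Iterable[str], fit: FitFn):
    """Backward elimination: drop the predictor whose removal lowers AIC most,
    and stop when no single removal lowers it any further."""
    current = frozenset(predictors)
    best = aic(*fit(current))
    while current:
        candidates = [(aic(*fit(current - {p})), p) for p in sorted(current)]
        score, drop = min(candidates)
        if score >= best:
            break
        current, best = current - {drop}, score
    return current, best


def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden pseudo-R^2 = 1 - ln L_model / ln L_null."""
    return 1 - loglik_model / loglik_null


def lr_statistic(loglik_model: float, loglik_null: float) -> float:
    """Likelihood ratio statistic, referred to chi-square with df equal to
    the number of extra parameters in the full model."""
    return 2 * (loglik_model - loglik_null)
```

For example, a model whose log-likelihood is half the null log-likelihood (−50 versus −100) has a McFadden pseudo-R² of 0.5.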
Table 2. Ordinal Logistic Regression Model Summary
Figure 3 shows the effects plot for the included variables. Age and gender contributed very little, whereas CDVA, 8-mm corneal diameter (Q8mm), and posterior minimum thickness point deviation made an important and apparently homogeneous contribution across the groups.
The model passed all of the tests run to check the proportional odds assumption on which ordinal logistic regression relies. The Venables and Ripley test yielded P = 0.180, and the omnibus Brant test yielded P = 0.250. The individual Brant test P values for age, gender, CDVA, Q8mm, and posterior minimum thickness point deviation were 0.307, 0.164, 0.196, 0.393, and 0.575, respectively. Thus, no significant departure from the assumption was detected.
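The proportional odds assumption means that a single linear predictor η = Σ βᵢxᵢ shifts every cumulative logit by the same amount, with only the cut-points θⱼ differing between the control | early KC and early KC | mild KC splits. The sketch below turns η into the three class probabilities using the parameterization logit P(Y ≤ j) = θⱼ − η adopted by R's `MASS::polr`. Any coefficients and thresholds passed to it are hypothetical; the fitted values of Table 2 are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def proportional_odds_probs(x: Dict[str, float], beta: Dict[str, float],
                            thresholds: Sequence[float]) -> List[float]:
    """Class probabilities under logit P(Y <= j | x) = theta_j - eta.

    The same eta = sum(beta_i * x_i) enters every cumulative split, which is
    the proportional odds assumption; only the thresholds theta_j differ.
    With two thresholds this yields three ordered classes
    (e.g. control < early KC < mild KC).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(coef * x[name] for name, coef in beta.items())
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs
```

A larger η moves probability mass toward the higher (more advanced) grades, so increasing a predictor with a positive coefficient raises the mild KC probability at the expense of the control probability.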
Figure 4 shows the distribution of scores for each group. The control prediction score follows a markedly different distribution for true control individuals than for true early KC and mild KC individuals. Similar behavior was observed for the mild KC prediction score in true mild KC patients versus true control and early KC patients. By contrast, the early KC prediction score does not differ as markedly between true early KC patients and true control and mild KC patients, indicating that early KC patients lie in an intermediate zone between the other two groups.
Table 3 shows the corresponding model confusion matrix, where similar results are observed.
Table 3. Training Confusion Matrix Corresponding to the Ordinal Logistic Regression Model
Table 4 shows balanced accuracies of 0.83, 0.70, and 0.83 for the control, early KC, and mild KC groups, respectively, with an overall accuracy of 0.73 (95% confidence interval [CI], 0.65–0.79). McNemar's test yielded a P value of 0.430, indicating homogeneity of the results. Only one true mild KC patient was predicted to be a control, and only one true control was predicted to be a mild KC patient. Misclassification was more frequent between adjacent groups (control vs. early KC, or early KC vs. mild KC), with 47 patients (26.4%) incorrectly classified when false-positives and false-negatives are combined.
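The per-class figures above follow from the confusion matrix by treating each group in turn as "positive" versus the other two; balanced accuracy is the mean of sensitivity and specificity. For a 3 × 3 table, McNemar's test is generally extended as Bowker's test of symmetry, which compares each pair of off-diagonal cells; to our understanding, this is what R's `mcnemar.test` computes for square tables larger than 2 × 2. The sketch below assumes rows are true classes and columns are predictions; any matrix passed to it here is hypothetical, not Table 3.

```python
import math
from typing import Dict, List, Sequence, Tuple


def one_vs_rest_metrics(cm: Sequence[Sequence[int]]) -> Tuple[List[Dict[str, float]], float]:
    """Per-class sensitivity, specificity and balanced accuracy, plus overall
    accuracy. Assumes cm[i][j] = cases with TRUE class i PREDICTED as j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted c, truly otherwise
        tn = total - tp - fn - fp
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        per_class.append({"sensitivity": sens, "specificity": spec,
                          "balanced_accuracy": (sens + spec) / 2})
    accuracy = sum(cm[i][i] for i in range(k)) / total
    return per_class, accuracy


def bowker_symmetry_test_3x3(cm: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Bowker's test of symmetry (McNemar's test generalized to k x k).

    A 3 x 3 table has 3 off-diagonal pairs, so df = 3; the chi-square(3)
    survival function is erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    Pairs with no discordant cases contribute nothing.
    """
    stat = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            discordant = cm[i][j] + cm[j][i]
            if discordant > 0:
                stat += (cm[i][j] - cm[j][i]) ** 2 / discordant
    p = (math.erfc(math.sqrt(stat / 2))
         + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
    return stat, p
```

A nearly symmetric table of errors, like the one reported here (one control predicted as mild KC and one mild KC predicted as control), yields a small statistic and a large P value, which is why the test is read as indicating homogeneity.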
Table 4. Ordinal Logistic Regression Model Training
Table 5 provides the inner validation results, for which 100 bootstrap resamples with replacement were drawn, each containing the same total number of cases as the original dataset (n = 178). For each resample, an equivalent ordinal logistic regression model was fitted using the same predictors listed in Table 2, and this fitted model was used to classify the remaining (out-of-bag) cases, that is, those not drawn in the bootstrap sample. Sensitivity and specificity were averaged across resamples and reported with their corresponding confidence intervals. The results were broadly similar to those of the training stage, with a slight trade-off between sensitivity and specificity and with clearly higher values for the control and mild KC prediction scores than for the early KC score.
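The inner validation can be sketched as an out-of-bag bootstrap: each resample of size n is drawn with replacement, a model is fitted to it, and the model is scored only on the cases never drawn (about 37% of the data on average). The `fit` and `score` callables are hypothetical placeholders for the ordinal model and for a metric such as sensitivity. The simple percentile interval shown is one common choice and is an assumption, since the section does not state how the confidence intervals were computed.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def oob_bootstrap(data: Sequence, fit: Callable, score: Callable,
                  n_boot: int = 100, seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Out-of-bag bootstrap estimate of a performance metric.

    Each round draws n indices with replacement, fits on those cases and
    scores on the cases never drawn. Returns the mean score and a simple
    (nearest-rank, rounded down) 95% percentile interval over rounds.
    """
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [data[i] for i in range(n) if i not in drawn]
        if not oob:  # all cases drawn: nothing left to score, skip round
            continue
        model = fit([data[i] for i in idx])
        scores.append(score(model, oob))
    ordered = sorted(scores)
    lo = ordered[int(0.025 * (len(ordered) - 1))]
    hi = ordered[int(0.975 * (len(ordered) - 1))]
    return statistics.fmean(scores), (lo, hi)
```

Because every score comes from cases the fitted model never saw, the averaged figures estimate performance on new patients more honestly than the training confusion matrix does.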
Table 5. Inner Validation Scores Obtained Using 100 Bootstrap Samples
Table 6 presents the sensitivity, specificity, and balanced accuracy for the independent validation dataset. The figures are slightly lower, but the results generally fall in line with those obtained during the internal bootstrap validation procedure (see Table 5).
Table 6. Ordinal Logistic Regression Model Independent External Validation Scores
The power analysis results (Fig. 5) indicate that statistical power exceeding 0.80 was achieved for CDVA, Q8mm, and posterior minimum thickness point deviation for sample sizes greater than 150 patients, whereas statistical power was around 0.50 for age and gender.
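The section does not describe how power was computed. One common analytic approximation, shown here only as an illustration, treats each coefficient's Wald test as normal, with a standard error that shrinks as 1/√n; simulation-based approaches are an alternative. Under that assumption, the sketch below gives the power of a two-sided test at α = 0.05 for a coefficient β whose standard error was se₀ at sample size n₀; all inputs are hypothetical.

```python
from statistics import NormalDist


def wald_power(beta: float, se_at_n0: float, n0: int, n: int,
               alpha: float = 0.05) -> float:
    """Approximate power of a two-sided Wald test of H0: beta = 0.

    Assumption: the standard error scales as 1/sqrt(n), i.e.
    SE(n) = se_at_n0 * sqrt(n0 / n), and beta_hat is normally distributed.
    """
    std_normal = NormalDist()
    se = se_at_n0 * (n0 / n) ** 0.5
    crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = abs(beta) / se
    # Probability that |beta_hat / se| exceeds the critical value.
    return std_normal.cdf(shift - crit) + std_normal.cdf(-shift - crit)
```

Evaluating this over a grid of sample sizes and reading off where the curve crosses 0.80 produces a power curve of the general kind summarized above; weak predictors such as age and gender plateau much lower because their |β|/SE ratio is small.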
A web application containing the pre-trained model was created to allow users to instantly estimate the probability of an individual belonging to each modeled group from a minimal set of parameters. This application (Fig. 6) was developed with Shiny v1.3.2 (RStudio, Inc.)[24] and was deployed within the institutional intranet using the ShinyAuthr v0.0.99 authentication module (Paul Campbell)[25] to prevent access by unauthorized users.
The landing page of the application (Fig. 6) was originally a log-in form that added a secure authentication layer. Later, once self-registration was disabled, new users were added by the system administrator. After log-in, the application displays a form composed of five text boxes corresponding to the model predictors, each filled in by default with typical values for a healthy individual. After the user enters any new values and presses the “GET SCORE” button, the trained model makes its prediction (Figs. 7–9), providing an early or mild KC classification score (EMKLAS) as a percentage and depicting a typical cornea, including some of the main parameters considered in the prediction.
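The “GET SCORE” flow can be illustrated as a small handler that merges the form fields with healthy-eye defaults, validates them, calls the trained model, and formats the three group probabilities as percentages. The application itself is written in R with Shiny; this Python sketch is not its code. The field names and the 0/1 gender encoding are hypothetical, and the default values are borrowed from the healthy example shown in Figure 7.

```python
from typing import Callable, Dict, List

# Hypothetical defaults taken from the healthy example in Figure 7;
# field names and gender encoding (1 = female) are assumptions.
DEFAULTS = {"age": 43.0, "gender": 1.0, "cdva": 1.0,
            "q8mm": -0.2, "pmtp_dev": 0.1}
GROUPS = ("control", "early KC", "mild KC")


def get_score(form: Dict[str, str],
              predict: Callable[[Dict[str, float]], List[float]]) -> Dict[str, str]:
    """Merge form input with defaults, validate it, and format scores as %."""
    values = dict(DEFAULTS)
    for name, raw in form.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown field: {name}")
        if raw.strip():               # blank box -> keep the default value
            values[name] = float(raw)  # raises ValueError on non-numeric input
    probs = predict(values)           # e.g. a proportional odds model
    return {g: f"{100 * p:.1f}%" for g, p in zip(GROUPS, probs)}
```

Keeping the model behind a `predict` callable separates the user interface from the statistics, so the same handler would work unchanged if the model were retrained on a larger sample.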
Figures 7 to 9 are screen captures for one healthy individual (43-year-old female, oculus dexter [OD], CDVA = 1, Q8mm = –0.2, posterior minimum thickness point deviation = 0.1), one patient with early KC (36-year-old male, OD, CDVA = 0.9, Q8mm = –0.48, posterior minimum thickness point deviation = 0.9), and one patient with mild KC (52-year-old male, OD, CDVA = 0.6, Q8mm = –0.75, posterior minimum thickness point deviation = 0.94), respectively. Each also includes a 3D image of a typical cornea that illustrates how the different predictors are calculated from physical measurements.
Some cases, however, were misclassified by the model behind our graphical user interface (GUI) owing to the particularities that characterize KC.
Figure 10 includes some screenshots from different individuals representing all of the possible model classes. Patients A, B, and C were healthy individuals; patients D, E, and F had early KC; and patients G, H, and I had mild KC. Patient A, a 43-year-old female, was correctly classified as healthy and appears on the lower left end of the score line. Patient B, a 54-year-old male, was incorrectly classified as early KC. Patient C, a 47-year-old male, was incorrectly classified as mild KC. Patient D, a 30-year-old male, was incorrectly classified as healthy. Patient E, a 36-year-old male, was correctly classified as early KC. Patient F, a 28-year-old male, was incorrectly classified as mild KC. Patient G, a 79-year-old male, was incorrectly classified as healthy. Patient H, a 57-year-old male, was incorrectly classified as early KC. Finally, patient I, a 52-year-old male, was correctly classified as mild KC.
In essence, the GUI is an approachable application that can be accessed from any network-connected device, whether a computer, tablet, or smartphone. It works with most widely used operating systems and requires no drivers or additional software, as long as the web browser is up to date. It also automatically adjusts its layout to different screen sizes and orientations, making it more accessible and user friendly.
Given the multifactorial nature of KC, early detection is usually approached through an evaluation of risk factors.[2] However, detecting KC in its earliest preclinical forms remains a clinical challenge, as most published models are based on a wide variety of parameters that strongly depend on the characteristics of the analyzed sample.[4] Several robust predictive models for detecting incipient KC have been published, although the lack of standardization makes them difficult to compare.
One of the main problems that ophthalmologists currently face is that experts have not reached consensus on how early corneal ectasia should be characterized.[5–11] This is due to the ambiguity surrounding the definition of the disease in its preclinical phase,[4,8] the size of the samples used in these studies,[4] and the fact that most of the indices employed for disease detection are technology specific and therefore not interchangeable.[4,28] Hallak and Azar[29] suggested a possible solution to this problem through the use of artificial intelligence (AI): “AI will help with screening patients, improving diagnoses, and suggesting personalized treatments.”
In this study, therefore, we defined a predictive model based on a set of optimal demographic, optical, and geometric factors measured with a single technology. This approach allows us not only to assess the current degree of disease development based on the level of a patient’s visual limitation but also to estimate the probability of having correctly classified each case. To the best of our knowledge, no previous study has combined demographic, optical, pachymetric, and morphogeometric variables in a real-time environment to detect and classify healthy, early KC, and mild KC eyes.
Expressing the probability of correctly classifying a patient as a score offers several benefits. First, condensing many parameters of diverse nature into a single, easy-to-understand value minimizes the risk of overlooking important information. This risk can be quite high, as typical analytical reports often include long lists of parameters spread over several pages that must be read fairly quickly, and they rarely include the associated normality intervals.
Second, this approach allows the joint effect of diverse parameters to be assessed. Detecting a disease when a key parameter shows a markedly high or low value can be simple, but detection becomes more difficult when several key parameters show only minor variations. In such cases, a score may help ophthalmologists make their assessments because it offers an objective, quantitative scale that integrates the combined contribution of the parameters in the model.
Table 7 shows that the results of several studies whose models were based on Scheimpflug technology are in line with ours.[6,8,30] Hwang et al.[8] proposed a detection model that combined five parameters (index height decentration, index vertical asymmetry, pachymetry apex, inferior-superior value, and Ambrosio's relational thickness maximum variability), with an area under the curve (AUC) of 0.86, sensitivity of 83%, and specificity of 83%. Similar results have been obtained by other researchers[6,30] at the model development stage, relying on the limited set of metrics provided by Scheimpflug technology.
Table 7. Comparison of the Current Study with Earlier Studies
Other researchers have relied on multivariate systems that combine two different technologies. Saad and Gatinel[10] created a model with 54 variables and six discriminant functions, achieving 93% sensitivity and 92% specificity. It was validated in a subsequent study,[31] with sensitivity and specificity of 92% and 96%, respectively. These values are slightly better than those we obtained for control (91% sensitivity, 80% specificity) and mild KC (97% sensitivity, 89% specificity) and considerably better than our results for the early KC group (64% sensitivity, 80% specificity). However, that model used two different technologies, whereas ours employs only one.
Other research has proposed combining several different technologies.[32–34] In these cases, however, the authors established a KC-suspect profile corresponding to a later disease stage, as they included subjects with manifest inferior steepening.[4]
The latest trends in KC severity classification point toward machine learning-based approaches. Yousefi et al.[35] used unsupervised machine learning on over 420 parameters to classify 3156 eyes with only two eigenparameters, reporting 97.7% sensitivity and 94.1% specificity. However, these values were obtained against the CASIA ectasia screening index (ESI), so they cannot be generalized to the parameters generated by other technologies such as Sirius or Pentacam. Moreover, clinical diagnosis labels were not available in their study; hence, its accuracy could not be assessed. The same authors recently extended this work and proposed a machine learning model that predicts the likelihood of needing keratoplasty.[36] Lavric and Valentin[37] implemented a convolutional neural network algorithm that detects the presence of KC with an accuracy of 99.33%, but this method uses topographic images of only the anterior corneal surface and, because the device employed was Pentacam, the results are valid only for this technology.
Our classifier uses an ordinal logistic regression model that combines five predictors selected from the 27 parameters studied, obtaining an overall accuracy of 73% (95% CI, 65%–79%) in the training phase. This means that the model correctly classified more than 70% of cases and proved particularly accurate for the control and mild KC groups, with balanced accuracies of 83% to 84%. The early KC group presented the lowest balanced accuracy (70%), with 26 of 61 cases incorrectly classified. This can be explained by the difficulty of detecting KC in its early stages, as corneal thickness remains relatively constant even in the transition from a healthy cornea to mild KC, as the nine examples shown in Figure 10 illustrate.
This early and mild KC classifier was trained using the diagnoses made by ophthalmologists as the gold standard, which unavoidably introduces an undetermined degree of subjectivity. During fitting, the model attempted to find a generalization linking predictors with outcomes while maximizing performance. Nonetheless, given the subjective nature of the training labels, some samples may not match any generalization, making it difficult to establish a clear, well-defined boundary between groups.
Uncertainty has always been considered a given in medical practice.[38] Eyes without a clear EMKLAS value for any of the groups (below 95%) could present clinical peculiarities that make them different. Alternatively, they may correspond to evolving cases in which some parameters change more quickly than others. A prospective study of these cases would be necessary to set an accurate decision threshold for considering a case to be KC suspect. In any case, because misgrading KC is less important than classifying a diseased eye (early or mild KC) as healthy, the probabilities of belonging to KC grades I and II can be summed in those uncertain cases to achieve the best diagnostic accuracy. In line with the clinician's judgment, any case of suspected KC should receive further clinical consultation.
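The pooling rule described above can be written as a small decision function. The 95% certainty cut-off comes from the text; the rule applied to uncertain cases (flag the eye as KC suspect when the summed grade I and grade II probabilities exceed the control probability) is our assumption, and, as noted, a prospective study would be needed to set an accurate threshold.

```python
from typing import Sequence


def emklas_decision(probs: Sequence[float], certainty: float = 0.95) -> str:
    """Label an eye from (control, early KC, mild KC) probabilities.

    If one group reaches `certainty`, report it. Otherwise pool the two KC
    grades (assumed rule) so that a possibly diseased eye is not reported
    as healthy.
    """
    labels = ("control", "early KC", "mild KC")
    if abs(sum(probs) - 1) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    best = max(range(3), key=lambda i: probs[i])
    if probs[best] >= certainty:
        return labels[best]
    kc = probs[1] + probs[2]  # pooled probability of grade I or II
    return "KC suspect" if kc > probs[0] else "control (uncertain)"
```

For example, scores of 40% control, 35% early KC, and 25% mild KC would be read as "KC suspect", because the pooled KC probability (60%) outweighs the control probability even though control is the single most likely group.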
Consequently, a certain lack of accuracy is to be expected and does not necessarily mean that the model fitting has failed. Our model quantitatively confirmed the difficulty of distinguishing between groups, as misclassification between adjacent groups (control vs. early KC; early KC vs. mild KC) reached 26.4% of patients when false-positives and false-negatives were combined, which underscores the need for objective support tools such as ours.
In this research, the AUC, specificity, and sensitivity attained after the inner validation process suggest high performance, with AUC values of 0.87, 0.69, and 0.94 for the control, early KC, and mild KC groups, respectively. These specificity and sensitivity figures are slightly better than those obtained in the training stage in all cases except specificity for the control group, which was slightly lower (see Table 5).
The independent validation of the ordinal logistic regression model showed an overall accuracy of 71% (95% CI, 55%–84%). Although the quality figures were slightly lower (accuracies of 79%, 69%, and 80% for the control, early KC, and mild KC groups, respectively), the results generally fall in line with those obtained in the internal bootstrap validation procedure, as Table 6 shows. This indicates that the validated performance of the model is fairly good; nevertheless, decisions based on the model should be made cautiously, and it would be advisable to repeat the training process with a larger independent sample to validate the results.
Our study also presents some limitations. Apart from the previously mentioned subjectivity introduced by using diagnoses made by ophthalmologists as the gold standard for model training, the sample size was limited by our inclusion criteria, because we preferred to ensure that the evaluated eyes were truly subclinical KC. It should also be taken into account that clinical metrics strongly depend on the technology used to measure them,[28] so our results can be considered valid only for eyes examined with Sirius tomography.
In conclusion, in this work we have developed a GUI based on an ordinal logistic regression model that assesses the current degree of KC development and estimates the probability of having correctly classified each case. Our model correctly classified more than 70% of cases and was particularly accurate for the control (79%) and mild KC (80%) groups, whereas accuracy for the early KC group was considerably lower (69%). Thus, repeating the training process with a larger and different sample should be considered to improve these results. Although ordinal logistic regression is a widely used, well-established tool in biomedical research, other techniques, such as deep learning, could improve the quality of the results.
This work was conducted as part of the Thematic Network for Co-Operative Research in Health (RETICS), reference number RD16/0008/0012, financed by the Carlos III Health Institute–General Subdirection of Networks and Cooperative Investigation Centers (R&D&I National Plan 2013-2016) and European Regional Development Funds (FEDER), as well as by the Results Valorisation Program (PROVALOR-UPCT), financed by the Technical University of Cartagena.
Disclosure: J.S. Velázquez-Blázquez, None; J.M. Bolarín, None; F. Cavas-Martínez, None; J.L. Alió, None