Automated and objective analysis of the glands was performed in batch mode with no user input. Manual adjustment consisting of ROI selection was necessary in nine images (6.04%). In six of those cases, the images were not acquired properly, while in the remaining three, a preprocessing problem was encountered. Acquisition problems can be due to an unfocused image, an off-center image with part of the tarsal conjunctiva out of the frame, or the lower boundary of the upper eyelid being attached to the lower eyelid (see Fig. 6). It is noteworthy that the algorithm never failed in the steps after the ROI selection.
According to grader 1's subjective criteria, the set of subjects comprised 58 subjects with Meiboscore 0, 63 with Meiboscore 1, 22 with Meiboscore 2, and six with Meiboscore 3. For this distribution of subjects, the mean ± 1 SD values of the objectively estimated percentage of DOA were 12.55 ± 9.89, 20.71 ± 9.96, 29.17 ± 13.45, and 54.83 ± 12.30 for Meiboscores 0, 1, 2, and 3, respectively. According to grader 2's subjective criteria, there were 55 subjects with Meiboscore 0, 66 with Meiboscore 1, 19 with Meiboscore 2, and nine with Meiboscore 3. For this distribution of subjects, the mean ± 1 SD values of the objectively estimated percentage of DOA were 12.04 ± 8.84, 19.56 ± 9.31, 30.63 ± 12.24, and 53.00 ± 10.19 for Meiboscores 0, 1, 2, and 3, respectively.
Despite having only four choices, the intergrader variability was high for the subjective Meiboscore, resulting in a Spearman's correlation coefficient of \(r^2 = 0.50\). Both graders coincided in Meiboscore grade in 65% of the cases. To assess the agreement between the two graders, the κ statistic and its statistical significance were computed. The κ statistic, proposed by Landis and Koch,34 is used to assess agreement when the measuring scale is ordinal; it indicates the proportion of agreement taking into account the agreement expected by chance. The agreement between graders was moderate \((\kappa = 0.463,\ P < 0.001)\).
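As a rough plausibility check on the intergrader agreement, κ can be written as \(\kappa = (p_o - p_e)/(1 - p_e)\), where \(p_o\) is the observed proportion of agreement and \(p_e\) the agreement expected by chance from each grader's marginal grade counts. The sketch below assumes an unweighted κ (the text does not state whether weighting was used) and takes the rounded 65% agreement together with the per-grade counts reported above; the function name is illustrative only.

```python
def kappa_from_marginals(p_observed, counts_a, counts_b):
    """Unweighted kappa from the observed agreement and each rater's grade counts."""
    n = sum(counts_a)
    if n != sum(counts_b):
        raise ValueError("both raters must grade the same number of subjects")
    # Chance agreement: probability that both raters pick the same grade
    # if their gradings were independent, given their marginal frequencies.
    p_chance = sum(a * b for a, b in zip(counts_a, counts_b)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


# Per-grade subject counts (Meiboscore 0, 1, 2, 3) reported for each grader.
grader1_counts = [58, 63, 22, 6]
grader2_counts = [55, 66, 19, 9]

# 0.65 is the rounded agreement from the text, so the result is approximate.
kappa = kappa_from_marginals(0.65, grader1_counts, grader2_counts)
print(round(kappa, 2))
```

Under these assumptions the estimate comes out close to 0.46, which is consistent with the reported value; the small difference is expected because the observed agreement is only available to two significant figures.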
It can be seen that the objectively assessed percentage of DOA does not correspond to the limits established for Meiboscore and that it is associated with a high standard deviation. When the Meiboscore percentage limits (i.e., 0, 1–32, 33–65, and 66–100) are applied to the objectively estimated DOA calculated with the proposed automated algorithm, the correlation between the subjective and objective classifications is poor (Spearman's \(r^2 = 0.17\) and \(r^2 = 0.25\) for graders 1 and 2, respectively). The κ statistic showed slight agreement between the subjective and objective classifications when the Meiboscore limit criteria were used (\(\kappa = 0.151,\ P < 0.001\) and \(\kappa = 0.212,\ P < 0.001\) for graders 1 and 2, respectively). The limits established for Meiboscore were set arbitrarily and did not consider the true distribution of the percentage of DOA. Therefore, it becomes necessary to redefine the classification limits when performing automatic assessment of meibography. For this purpose, the percentage of DOA was clustered into four classes with minimal intraclass variance using Otsu's classification algorithm.35 The number of classes was chosen for legacy reasons, to match the number of grades of the conventional grading scales. The new classification resulted in the following intervals: Grade 0, 0% ≤ DOA < 16%; Grade 1, 16% ≤ DOA ≤ 32%; Grade 2, 32% < DOA ≤ 59%; and Grade 3, DOA > 59%.
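The clustering criterion can be illustrated with a minimal sketch of multi-level Otsu thresholding on one-dimensional data: the cut points are chosen so that the total within-class squared deviation (equivalently, the weighted intraclass variance) is smallest. This is an exhaustive-search illustration on hypothetical DOA values, not the implementation used in the study; the function name and the choice to report thresholds as midpoints between neighbouring classes are assumptions.

```python
from itertools import combinations


def otsu_thresholds(values, n_classes=4):
    """Exhaustive multi-level Otsu: cut points minimising within-class variance."""
    xs = sorted(values)
    n = len(xs)
    # Prefix sums of x and x^2 give the squared deviation of any run in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared deviations of xs[i:j] about its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best_cost, best_cuts = float("inf"), None
    # Try every way of splitting the sorted data into n_classes non-empty runs.
    for cuts in combinations(range(1, n), n_classes - 1):
        bounds = (0, *cuts, n)
        cost = sum(sse(a, b) for a, b in zip(bounds, bounds[1:]))
        if cost < best_cost:
            best_cost, best_cuts = cost, cuts
    # Assumption: report each threshold midway between neighbouring classes.
    return [(xs[c - 1] + xs[c]) / 2 for c in best_cuts]


# Hypothetical DOA percentages, for illustration only (not study data).
hypothetical_doa = [2, 4, 6, 20, 22, 24, 40, 42, 44, 68, 70, 72]
thresholds = otsu_thresholds(hypothetical_doa)
print(thresholds)
```

The exhaustive search is adequate for a few hundred values and four classes; a histogram-based variant would be the usual choice for larger data sets.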
For Grades 0, 1, 2, and 3, the mean ± 1 SD values of the objectively estimated percentage of DOA were 8.82 ± 4.70, 22.79 ± 4.11, 40.96 ± 7.01, and 69.50 ± 2.12, respectively. One clear observation is that these results are now associated with lower standard deviations than those encountered when using Meiboscore.
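The two sets of limits can be compared directly by expressing each as a lookup from a DOA percentage to a grade. The sketch below is illustrative: the function names are hypothetical, and the treatment of non-integer values under the Meiboscore limits (anything above 0 and below 33 counted as grade 1, and so on) is an assumption, since those limits are stated only as integer ranges.

```python
def meiboscore_limit_grade(doa):
    """Grade from the Meiboscore percentage limits (0, 1-32, 33-65, 66-100).

    Assumption: fractional values are assigned to the band they fall below.
    """
    if doa <= 0:
        return 0
    if doa < 33:
        return 1
    if doa < 66:
        return 2
    return 3


def objective_grade(doa):
    """Grade from the Otsu-derived intervals reported in the text."""
    if doa < 16:
        return 0  # 0% <= DOA < 16%
    if doa <= 32:
        return 1  # 16% <= DOA <= 32%
    if doa <= 59:
        return 2  # 32% < DOA <= 59%
    return 3      # DOA > 59%


# Mean DOA of the subjects that grader 1 scored as Meiboscore 0.
mean_doa_grade0 = 12.55
print(meiboscore_limit_grade(mean_doa_grade0), objective_grade(mean_doa_grade0))
```

Because the Meiboscore limits reserve grade 0 for a DOA of exactly 0%, the mean DOA of subjects subjectively graded as Meiboscore 0 already falls in the 1–32 band, which may partly explain the poor agreement reported when those limits are applied to the objective measurement.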
Figure 7 shows box plots of the DOA for the four groups under both classifications, that is, the subjective classification for both graders (Figs. 7a, 7b) and the objective classification (Fig. 7c). Although the difference between each pair of groups is statistically significant under both the subjective and objective classifications (analysis of variance [ANOVA] with Tukey post hoc test, P < 0.010, with P < 0.05 considered significant), the groups are better separated, with less overlap between them, under the automatic classification.
The Table shows the mean values and standard deviations of the extracted parameters arranged according to this new objective classification. As expected, the number, length, and width of the glands are inversely related to the DOA. The length and number of glands clearly decrease as the DOA increases, and the groups are statistically significantly different as assessed by the Kruskal-Wallis test (P = 0.046 and P = 0.018 for number of glands and length, respectively). Even though the difference in width values was smaller, the Kruskal-Wallis test also revealed statistically significant differences (P = 0.015).
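For readers unfamiliar with the Kruskal-Wallis test, the following sketch computes the H statistic (with the standard tie correction) for four groups and an approximate P value from the chi-square distribution with three degrees of freedom. The gland-length values are hypothetical and serve only to show the mechanics; the helper names are illustrative, and the closed-form tail probability is valid only for exactly four groups.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n**3 - n))


def chi2_sf_df3(x):
    """Upper-tail probability of chi-square with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical gland lengths per objective grade (0 to 3), not study data.
hypothetical_lengths_by_grade = [
    [5.1, 4.8, 4.9],
    [4.2, 4.0, 4.4],
    [3.1, 3.5, 3.3],
    [2.0, 2.4, 2.2],
]
h_stat = kruskal_wallis_h(hypothetical_lengths_by_grade)
p_value = chi2_sf_df3(h_stat)
print(round(h_stat, 3), p_value < 0.05)
```

The chi-square approximation is coarse for groups this small; with the sample sizes of the study it would be more appropriate, and a statistics package would normally be used instead of hand-written code.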