September 2018
Volume 7, Issue 5
Open Access
Articles  |   October 2018
Validation of an Objective Measure of Dry Eye Severity
Author Affiliations & Notes
  • Sezen Karakus
    The Wilmer Eye Institute, Johns Hopkins University, Baltimore, Maryland, USA
  • Esen K. Akpek
    The Wilmer Eye Institute, Johns Hopkins University, Baltimore, Maryland, USA
  • Devika Agrawal
    The Wilmer Eye Institute, Johns Hopkins University, Baltimore, Maryland, USA
  • Robert W. Massof
    The Wilmer Eye Institute, Johns Hopkins University, Baltimore, Maryland, USA
  • Correspondence: Robert W. Massof, The Wilmer Eye Institute at Johns Hopkins, 600 N Wolfe St, Wilmer B43, Baltimore, MD 21287-2816, USA. e-mail address: bmassof@jhmi.edu 
Translational Vision Science & Technology October 2018, Vol.7, 26. doi:https://doi.org/10.1167/tvst.7.5.26
Abstract

Purpose: We evaluated the validity of a single dry eye severity measure estimated using Rasch analysis from a battery of clinical tests and patient symptoms.

Methods: This study included 203 dry eye patients and 51 controls. Administered tests included the Ocular Surface Disease Index (OSDI), tear osmolarity, Schirmer's test, noninvasive break-up time, and ocular surface staining. Each of the 12 OSDI questions and each clinical test was defined to be a separate indicator to estimate a single dry eye severity measure from Rasch analysis. Measures of severity were estimated for each subject (person measures) and measures of sensitivity to severity were estimated for each sign and symptom (indicator measures).

Results: The average severity measure for dry eye patients was significantly greater than the average severity measure for controls (−0.39 vs. −1.2, P < 0.001). The distribution of indicator measures was well matched to the distribution of person measures. No indicator carried >10% of the total information about dry eye severity carried by all indicators together. However, the most informative indicators were corneal and conjunctival staining.

Conclusions: Our study indicated that there is no single “best” dry eye test. Clinical tests and symptoms should be used in combination to estimate a single dry eye severity measure.

Translational Relevance: There is no single “gold standard” testing method for dry eye that correlates with the severity of disease. We propose that Rasch analysis can be used to calculate an objective dry eye severity score from a battery of clinical indicators.

Introduction
Dry eye is a common condition worldwide affecting up to one-third of individuals older than 50 years.1–5 Dry eye is characterized by loss of homeostasis of the tear film, which results in tear film instability, increased tear osmolarity, ocular surface inflammation and damage, and neurosensory abnormalities.5 These signs are accompanied by symptoms of ocular discomfort and fluctuating vision that impact quality of life.5 A variety of clinical tests as well as symptom questionnaires are available to diagnose dry eye and determine disease severity.5–9 However, there is little correlation among or between physician-observed signs and patient-reported symptoms, which leads to confusion in monitoring changes and managing the disease.10,11 
Even though dry eye is one of the most prevalent ocular conditions, affecting an estimated 25 million individuals in the United States alone (Market Scope. 2011 Comprehensive Report on the Global Dry Eye Products Market. St. Louis, MO: Market Scope, November 2011), surprisingly few approved and effective treatments of dry eye exist.12 One reason may be that investigators tend to seek a single “gold standard” to measure dry eye disease severity or response to treatment, which has eluded the dry eye field.5 The lack of a single “gold standard” sign or symptom that correlates with dry eye state has been noted again in the recently updated Tear Film and Ocular Surface Society (TFOS) Dry Eye WorkShop (DEWS) II Diagnostic and Methodology report.5 A new diagnostic scheme requiring a combination of symptoms and clinical signs has been proposed instead of using a severity grading table. The original and recently updated schemes follow the recommendations of an expert panel of clinicians and scientists.5 The original severity grading scale was modified previously to make it a single explicit continuous variable for the purpose of exploring its functional relationship to measures or ordinal scaling of dry eye signs and symptoms.13 This assumption of a single continuous severity variable for dry eye is implicit in the work of consensus panels,14 estimation of composite scores or measures from dry eye symptoms,15 and the requirement of a “single best test” for evaluating the efficacy of dry eye treatments by regulatory agencies.13,16 
The hypothesis is that the magnitude of a single dry eye severity variable can be estimated from information carried by clinical signs and patient-reported symptoms using the mapping noise measurement model suggested previously.17 The mapping noise model reduces to a form equivalent to the Masters partial credit model18 when the constraints of axiomatic measurement theory (i.e., independent identically distributed random variables) and the assumption of a standardized logistic distribution (μ = 0; σ = π/√3) of the random variables are imposed. Rasch analysis, which uses the Masters partial credit model, assumes that dry eye severity is a latent variable that cannot be observed directly, but has a probabilistic relationship to manifest variables, such as signs and symptoms, that can be observed. In other words, “dry eye severity” is a theoretical construction from observations, and not an observation per se. Rasch models assume that patients differ from each other only in the severity of their dry eye. The magnitude of each observed sign and symptom is reported in its own units (e.g., mm of wetting, elapsed time in seconds, ordinal grade of staining), which the Rasch model assumes can be mapped to a common latent dry eye severity variable and that the unique mapping function for each sign and symptom is the same for every patient. The Rasch model then is used to estimate the most likely value of the patient's dry eye severity from the pattern of observed values of the patient's signs and symptoms.17 
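As an illustration of the probabilistic mapping described above, the following is a minimal sketch of the category probabilities in the Masters partial credit model for a single indicator. It is not the authors' implementation; the function names and the threshold values are hypothetical and chosen only to show the shape of the computation.

```python
import math

def pcm_category_probabilities(theta, thresholds):
    """Partial credit model: P(score = x | theta) for x = 0..m.

    theta      -- person measure (latent severity, in logits)
    thresholds -- the m step thresholds of one indicator (in logits)
    """
    # Cumulative sums of (theta - delta_k); the empty sum for score 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating, for stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, thresholds):
    """Model-expected ordinal score of one indicator at severity theta."""
    probs = pcm_category_probabilities(theta, thresholds)
    return sum(x * p for x, p in enumerate(probs))

# Hypothetical thresholds for a 4-category indicator (not from the study).
example_thresholds = [-1.0, 0.0, 1.0]
milder = expected_score(-1.2, example_thresholds)
worse = expected_score(1.5, example_thresholds)
```

Under this model the expected score of every indicator rises monotonically with the person measure, which is why the polarity of each indicator has to be aligned before estimation.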
We estimated a severity measure using this unique approach and tested its internal and external validity on a sample of physician-diagnosed dry eye patients (including patients with and without Sjögren's syndrome [SS]) and a sample of nonpatient controls (which could include cases of previously undiagnosed dry eye). 
Methods
Patients and Clinical Testing
This prospective clinical observational study was approved by the Johns Hopkins University institutional review board and adhered to the tenets of the Declaration of Helsinki. Dry eye patients with or without SS (previously established based on the American College of Rheumatology [ACR] Criteria19 endorsed in 2012) older than 18 years were recruited from the Wilmer Eye Institute Ocular Surface Diseases and Dry Eye Clinic, Johns Hopkins University (Baltimore, MD). Patients with a prior physician-made diagnosis of dry eye, who were taking over-the-counter or prescription dry eye treatments, and who had dry eye–related procedures (including but not limited to tear duct occlusion, LipiFlow, or intense pulsed light treatment) were included in the dry eye group, independent of their ocular surface or tear film parameters. Consequently, the dry eye group subjects had a broad spectrum of severity of findings, ranging from mild to severe.5 Controls were volunteers older than 18 years who did not carry a physician-made diagnosis of dry eye, blepharitis, allergic conjunctivitis, or other ocular surface disease, and who were not seeking and had not sought eye care for ocular surface disease symptoms before enrollment in the study. Additionally, subjects who had any ocular surgery within 3 months of the study visit were not included. Patients were asked to discontinue use of any eye drops, including artificial tears, 12 hours before examination, and patients and controls were asked to stop wearing contact lenses at least 1 week before examination. 
The number of subjects in our study was chosen to be comparable to that of an earlier study evaluating the validity of the added noise version of the Rasch measurement model.17 Subgroups of subjects were chosen specifically to represent the two extremes of the dry eye severity measure distribution (controls and SS-related dry eye patients) and the broad middle of the distribution (non-SS dry eye patients). 
After obtaining informed consent in accordance with the Health Insurance Portability and Accountability Act (HIPAA), a detailed review of systems was performed followed by an Ocular Surface Disease Index (OSDI) symptom questionnaire (Allergan, Inc., Irvine, CA) completed by each participant.20 The following tests then were performed with 10-minute intervals in between in the order listed: tear osmolarity (TearLab Corporation, Inc., San Diego, CA), Schirmer's test without topical anesthesia (Tear Flo; Sigma Pharmaceuticals, Monticello, IA), automated noninvasive break-up time (NIBUT; Tear Stability Analysis System [TSAS] used on the RT-7000 Auto Refractor-Keratometer; Tomey Corporation, Nagoya, Japan), corneal staining with fluorescein (Ful-Glo; Akorn, Inc., Lake Forest, IL), and conjunctival staining with lissamine green (Green-Glo; HUB Pharmaceuticals, LLC., Rancho Cucamonga, CA). The order of tests was adopted from the Sjögren's International Collaborative Clinical Alliance (SICCA) study to obtain the most accurate results.21 Schirmer's test was recorded at 1 minute and then at 5 minutes for each eye separately. Values of NIBUT greater than 3.0 seconds were considered normal as previously recommended.22 The SICCA grading system21 was used to rate corneal and conjunctival staining. The maximum possible corneal staining score for each cornea was 6. Nasal and temporal conjunctiva were graded separately with a maximum score of 3 for each area. The total possible maximum ocular staining score (OSS) was 12 for each eye. 
Estimating a Dry Eye Severity Measure
A single dry eye severity measure was estimated for each participant from Rasch analysis using a grouped item version of the Masters partial credit model23 of clinical test results and patient responses to each of the 12 items in the OSDI questionnaire. Each of the six clinical tests, including tear osmolarity, 1-minute Schirmer's test, 5-minute Schirmer's test, NIBUT, corneal staining, and conjunctival staining, and each of the 12 OSDI questions were defined to be a separate indicator variable, or “item,” in the partial credit model with five groupings of the indicators that used the same scoring:24 (1) tear osmolarity, (2) 1- and 5-minute Schirmer's tests, (3) NIBUT, (4) corneal and conjunctival staining, and (5) 12 OSDI items. The order of the tests must be the same within the study cohort to obtain a valid severity measure; however, a different order also can be used, if desired, as long as one test does not affect the following test adversely. The mean between the two eyes was calculated for each participant to determine a single variable per person for each clinical test, since the OSDI items do not differentiate between eyes. For the purpose of analysis, continuous clinical test variables (tear osmolarity, 1- and 5-minute Schirmer's tests, and NIBUT) were binned into quintiles and assigned rank scores. Rank scores also were assigned to the ordinal OSDI response categories for each question. Raw ordinal clinician rating scores of corneal and conjunctival staining were accepted at face value for Rasch analysis. The polarities of the indicator scores were adjusted so that in each case a greater score corresponded to greater dry eye severity. Measures of dry eye severity were estimated for each participant (person measures) and measures of sensitivity to dry eye severity were estimated for each clinical test and OSDI question (indicator measures). 
The probability of observing each ordinal category as a function of dry eye severity also was estimated for each indicator. 
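The scoring steps described in this section (averaging the two eyes, binning continuous results into quintiles, and aligning polarity so that a greater score means greater severity) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the Schirmer values are invented, and the quintile cut points use the default method of Python's `statistics.quantiles`, which may differ from the rule the authors applied.

```python
from statistics import mean, quantiles

def quintile_rank(value, cut_points):
    """Rank 0..4: the number of quintile cut points that the value exceeds."""
    return sum(value > cut for cut in cut_points)

def score_continuous_test(per_eye_values, higher_is_worse):
    """Convert (right eye, left eye) results into polarity-aligned rank scores."""
    # One value per person: the mean of the two eyes.
    person_means = [mean(pair) for pair in per_eye_values]
    # Four cut points split the sample distribution into quintiles.
    cut_points = quantiles(person_means, n=5)
    ranks = [quintile_rank(value, cut_points) for value in person_means]
    # Reverse the scale when a low raw value indicates worse dry eye.
    if not higher_is_worse:
        ranks = [4 - rank for rank in ranks]
    return ranks

# Invented 5-minute Schirmer results in mm (right eye, left eye).
schirmer_mm = [(2, 4), (5, 5), (10, 12), (15, 17), (20, 22),
               (25, 27), (30, 30), (8, 6), (12, 14), (18, 18)]
# Less wetting means more severe dry eye, so the polarity is reversed.
schirmer_scores = score_continuous_test(schirmer_mm, higher_is_worse=False)
```

For tear osmolarity the same function would be called with `higher_is_worse=True`, since a higher osmolarity indicates worse dry eye.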
Evaluating Internal Validity of the Measurement Model
The mapping error measurement model assumes there is a single systematic source of true variance (i.e., dry eye severity) and that any other sources of variance in the observed indicators, whether within or between persons, are random. If the estimated dry eye severity measurement model is internally valid (i.e., observations fit the model assumptions), then the ratio of the variance in the mean squared residuals (squared difference between observed and expected indicator rank scores) to the expected variance of the noise should be distributed as χ²/df across persons and across indicators.25 This ratio is equivalent to squared residuals summed across persons, to evaluate indicator fit, or across indicators, to evaluate person fit, multiplied by Fisher information, summed across persons or indicators, respectively (i.e., information-weighted mean square fit statistic or “infit”).26 The cube-root of χ² is approximately normally distributed.27 Therefore, to simplify evaluation, we transformed the infit mean squares to z-scores (i.e., standard normal distribution), which have an expected value of 0 and a standard deviation of 1 (with 95% of the values falling within ±2 standard deviations [SD] of the expected value). 
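The fit statistic described above can be sketched in code. The version below uses the conventional form of the infit mean square (summed squared residuals divided by summed model variance) and the conventional cube-root (Wilson-Hilferty) standardization; whether the authors' software used exactly these expressions is an assumption, and the function names and category probabilities are hypothetical.

```python
import math

def model_moments(probs):
    """Expected score, variance, and fourth central moment of one observation."""
    expected = sum(x * p for x, p in enumerate(probs))
    variance = sum((x - expected) ** 2 * p for x, p in enumerate(probs))
    fourth = sum((x - expected) ** 4 * p for x, p in enumerate(probs))
    return expected, variance, fourth

def infit_mean_square_and_z(observed, category_probs):
    """Infit mean square and its cube-root z-score for one person or indicator.

    observed       -- observed rank scores
    category_probs -- model category probabilities for each observation
    """
    moments = [model_moments(p) for p in category_probs]
    # In a Rasch model the Fisher information of an observation is its variance.
    information = sum(w for _, w, _ in moments)
    squared_residuals = sum((x - e) ** 2
                            for x, (e, _, _) in zip(observed, moments))
    mean_square = squared_residuals / information
    # q is the model standard deviation of the mean square
    # (it must be nonzero for the transformation to be defined).
    q = math.sqrt(sum(c - w ** 2 for _, w, c in moments)) / information
    z = (mean_square ** (1 / 3) - 1) * (3 / q) + q / 3
    return mean_square, z

# Hypothetical model probabilities for 18 three-category observations.
probs = [[0.2, 0.5, 0.3]] * 18
noisy = infit_mean_square_and_z([0, 2] * 9, probs)  # more scatter than expected
muted = infit_mean_square_and_z([1] * 18, probs)    # less scatter than expected
```

A mean square above 1 (positive z) indicates more variance than the model expects, and a mean square below 1 (negative z) indicates less.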
If the estimated measure is a unidimensional variable with a single source of true variance, then the unexplained variance should be random. To test this hypothesis, we evaluated differential person functioning (DPF) and we performed a principal components analysis on the indicator score residuals (difference between the observed indicator value and the value expected by the model) and evaluated the scree plot to determine if there was evidence of nonrandom structure in the variance. 
Evaluating External Validity of the Estimated Measures
If the estimated measures are, indeed, measures of dry eye severity, then the measures should discriminate cases from controls with cases having worse dry eye. This hypothesis was tested by comparing the distributions of case person measures to control person measures and performing a t-test on these two distributions. Also, from clinical experience, we expect that patients with SS will have worse dry eye than do non-SS patients.28 This hypothesis also was tested by comparing distributions of estimated person measures between SS and non-SS cases and performing a t-test. In addition, a linear regression analysis after controlling for age and sex was performed to determine the associations between the dry eye status (SS or non-SS) and the magnitude of the severity measure. 
Results
Among the 254 participants, 203 were dry eye patients (55 had a previously established diagnosis of SS based on 2012 ACR Criteria19) and 51 with no previous diagnosis of dry eye or any ocular surface disease were included as controls. Participant characteristics are summarized in Tables 1 and 2. The dry eye group included a significantly higher proportion of females (P = 0.02). Except for tear osmolarity, dry eye measures were significantly worse in patients with dry eye compared to controls. Similarly, except for the OSDI score and tear osmolarity, dry eye measures were significantly worse for SS than for non-SS dry eye patients. As itemized in Table 3, Spearman correlations between OSDI questions (symptoms) and clinical signs were extremely low (mean = 0.12; range = 0.25–0.02). Correlations among OSDI questions were moderate to high (mean = 0.54; range = 0.36–0.79). The two Schirmer's test measures (at 1 and 5 minutes) were highly correlated (0.90), as were corneal and conjunctival staining scores (0.69). 
Table 1
 
Characteristics of Subjects According to Dry Eye Status
Table 2
 
Characteristics of Subjects With SS-Related Dry Eye Versus Non-SS Dry Eye
Table 3
 
Matrix of Spearman's Inter-Item Correlations Between Dry Eye Indicators
The distribution of indicator sensitivity measures was well matched to the combined distribution of dry eye severity measures for dry eye patients and controls (Fig. 1). Schirmer's tests at 1 and 5 minutes were the most sensitive of these 18 indicators, followed closely by tear osmolarity. OSDI item 9 (watching TV) was the least sensitive indicator, preceded by corneal staining. Overall measurement reliability was 0.82 for persons (i.e., on average, 18% of variance between persons in the observed dry eye severity distribution can be attributed to estimation error) and 0.97 for indicators (i.e., on average, 3% of variance between indicators in the observed indicator sensitivity distribution can be attributed to estimation error). The maximum dry eye severity information carried by each indicator ranged from 1.1 for NIBUT to 3.4 for corneal and conjunctival staining (Table 4, Fig. 2). No single indicator carried >10% of the total dry eye severity information carried by all 18 indicators together (ALL ITEMS). 
Figure 1
 
The distribution of estimated indicator sensitivity measures (gray bars) and the distribution of estimated dry eye severity measures for persons (black bars). Orange bars illustrate the Fisher information (in logit⁻² units) carried by all 18 indicators combined as a function of the measure.
Table 4
 
Item Measures, Maximum Item Information, and Infit Mean Square Fit Statistics for the 18 Indicators
Figure 2
 
Maximum dry eye severity information carried by each indicator and all indicators.
Of the participants, 5% had infit mean squares that exceeded expectations by >2.5 SD and 2% had infit mean squares that fell short of expectations by >2.5 SD (Fig. 3). These outliers represented significant departures from the expected distribution (P = 0.011 for the 2-tail Kolmogorov-Smirnov [K-S] test). When the distribution was truncated to remove these outliers, the distribution was not significantly different from the expected normal distribution (P = 0.16 for the 2-tail K-S test). The indicator infit mean square z-scores had a bimodal distribution with large positive values for all clinical test indicators (average infit mean square in Table 4 is 1.375) and large negative values for all but one of the OSDI items (average infit mean square in Table 4 is 0.818; Fig. 4). The indicator measure is negatively correlated with the infit mean square z-score (Pearson r = −0.60), which suggests nonuniform DPF.29 These results indicated that there must be two sources of noise, with the average variance 1.7 times greater for clinical signs than it is for symptoms. Also, there was a trend for clinical signs to be more sensitive than symptoms to dry eye severity. 
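The 1.7-fold figure appears to be consistent with the ratio of the two average infit mean squares quoted from Table 4. The short check below makes that arithmetic explicit; reading the variance ratio this way is an interpretation of the text, not a statement of how the authors derived it, and the variable names are illustrative.

```python
# Average infit mean squares quoted in the text (from Table 4).
clinical_sign_infit = 1.375
osdi_item_infit = 0.818

# Ratio of average residual variance, clinical signs relative to symptoms.
noise_variance_ratio = clinical_sign_infit / osdi_item_infit
rounded_ratio = round(noise_variance_ratio, 1)
```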
Figure 3
 
The probability mass distribution (black bars) of infit mean square z-scores for the persons compared to the probability mass distribution expected by the measurement model (red curve).
Figure 4
 
Scatter plot of infit mean square z-scores for each indicator (horizontal axis) versus the corresponding estimated indicator measure (vertical axis). The solid vertical line is the expected infit mean square and the dashed vertical lines define the boundaries for ±2 SD from the expected value.
As illustrated by the scatter plots in Figure 5, the OSDI-based person measures and the clinical sign-based person measures are linear with the person measures estimated from all 18 indicators. Consistent with the hypothesis of two different sources of noise with 1.7 times more noise variance for clinical signs, unexplained random variance about the regression line is 2 times greater for clinical sign-based measures (R² = 0.34) than it is for OSDI-based measures (R² = 0.68). The difference in slopes of the regression lines (1.75 for OSDI and 0.87 for clinical signs) also was consistent with differences in noise variance associated with the two sets of indicators, as opposed to uniform DPF, which would suggest the two sets of indicators were sampling two different latent variables in the sample of patients (see the Supplementary Material for the theoretical interpretation of the slopes of the regression lines). 
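The twofold difference in unexplained variance follows from the two coefficients of determination, since the unexplained fraction of variance in a linear regression is one minus the coefficient of determination. A minimal check, using only the values reported above (variable names are illustrative):

```python
# Coefficients of determination reported for the two regressions in Figure 5.
r_squared_clinical_signs = 0.34
r_squared_osdi = 0.68

# Unexplained fraction of variance about each regression line.
unexplained_clinical_signs = 1 - r_squared_clinical_signs
unexplained_osdi = 1 - r_squared_osdi

# How many times more unexplained variance the sign-based measures have.
unexplained_ratio = unexplained_clinical_signs / unexplained_osdi
```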
Figure 5
 
The scatter plots of the OSDI-based person measures and the clinical sign-based person measures.
Principal components analysis (PCA) of response residuals showed that the estimated measures explained only 36% of the observed variance (Fig. 6A). The first two components of the residuals together accounted for 40% of the remaining variance (26% of the total variance). The higher order components accounted for <5% of the total variance and can be considered part of the scree (unstructured background noise). Figure 6B illustrates that the variance of response residuals for clinical sign indicators (black points) was in the direction of the first component and the variance of response residuals for OSDI items (gray points) was in the direction of the second component. Consistent with the interpretation of the bimodal distribution of infit mean square residuals for indicators (Fig. 4), these results confirm that there are two independent sources of noise variance contributing to the distribution of response residuals. 
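The variance partition reported here can be reproduced from the quoted percentages: if the measures explain 36% of the observed variance, the residuals hold the remaining 64%, and 40% of that remainder is about 26% of the total. A small sketch of the arithmetic (variable names are illustrative):

```python
# Share of observed variance explained by the estimated measures.
explained_by_measures = 0.36
residual_share = 1 - explained_by_measures

# Share of the residual variance captured by the first two components.
first_two_of_residual = 0.40

# The same quantity expressed as a share of the total observed variance.
first_two_of_total = first_two_of_residual * residual_share
unstructured_share = residual_share - first_two_of_total
```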
Figure 6
 
(A) PCA of response residuals. (B) The variance of response residuals for clinical sign indicators (black points) and OSDI items (gray points).
As expected, the average severity measure for dry eye patients (−0.39 logit, SD = 0.59) was significantly greater than that for controls (−1.2 logit, SD = 0.83; 2-tailed t-test; P < 0.001; Fig. 7A). Again as expected, the average severity measure of patients with SS-related dry eye (−0.06 logit, SD = 0.50) was significantly greater than that of patients with non-SS dry eye (−0.30 logit, SD = 0.51; 2-tailed t-test; P = 0.003; Fig. 7B). After adjusting for age and sex in linear regression models, having dry eye was significantly associated with greater person measure compared to not having dry eye (0.70 logit, 95% confidence interval [CI] = 0.51–0.89, P < 0.001) and having SS-related dry eye also was significantly associated with greater person measure compared to having non-SS related dry eye (0.23 logit, 95% CI = 0.07–0.39, P = 0.004). 
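The unadjusted group comparisons can be approximately re-derived from the reported summary statistics. The sketch below assumes an unequal-variance (Welch) t statistic and a normal approximation to the t distribution for the P value; the text does not state which form of the t-test was used, so both are assumptions. Group sizes are taken from the Results (203 patients, 51 controls, 55 with SS), with the non-SS count of 148 obtained by subtraction.

```python
from math import sqrt
from statistics import NormalDist

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unequal-variance t statistic from group means, SDs, and sizes."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

def two_tailed_p_normal(t):
    """Two-tailed P value using a normal approximation (large samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Dry eye patients versus controls (person measures in logits).
t_case_control = welch_t(-0.39, 0.59, 203, -1.2, 0.83, 51)

# SS-related versus non-SS dry eye (148 = 203 - 55).
t_ss_non_ss = welch_t(-0.06, 0.50, 55, -0.30, 0.51, 148)
p_ss_non_ss = two_tailed_p_normal(t_ss_non_ss)
```

Under these assumptions the SS versus non-SS comparison gives a P value of the same order as the reported 0.003, and the case versus control statistic is large enough to be consistent with P < 0.001.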
Figure 7
 
(A) The cumulative person measure distribution for dry eye patients (solid line) versus controls (dashed line). (B) The cumulative person measure distribution for patients with SS (solid line) to that of non-SS patients (dashed line).
Discussion
We demonstrated that dry eye signs and symptoms can work together to determine a single dry eye severity variable that is measurable on an interval scale (Fig. 1). Previously, a formal theoretical framework was proposed for estimating and validating a latent dry eye severity measure from clinical signs and patient symptoms.17 The theory assumes that each clinical observation and symptom carries information about dry eye severity, but it is degraded and/or masked by “noise,” which gives rise to random and systematic perturbations in the observations. A similar study aiming to create a single objective dry eye severity index evaluated the use of similar dry eye tests using the original DEWS severity scale; however, usefulness of the severity index was not studied in that report.13 In contrast, we treated each OSDI question as a separate item in our analysis instead of using the total OSDI score and included a slightly different set of clinical tests. Additionally, we used Rasch analysis to estimate the measure, which is a unique approach that assumes the magnitude of each observed symptom and sign can be mapped to a latent dry eye severity variable. Other research studies also used Rasch analysis to investigate the functioning of the dry eye questionnaires.15,30 However, the main purpose of these studies was not to determine a variable to use as a severity measure for dry eye. To our knowledge, this is the first study to combine clinical observations with the symptom questions to estimate a single severity measure using a Rasch model. 
Even though the lack of correlation between dry eye signs and symptoms is well known,10,11 the most commonly used grading systems, such as the original DEWS severity table, require presence of severe signs and symptoms together to diagnose severe dry eye. Recently, the ODISSEY European Consensus Group also proposed an algorithm for evaluating the severity of dry eye.31 However, both previous algorithms are based on consensus methods without any prospective studies. In addition to the overall discordance between signs and symptoms, conflicting signs also are an issue in dry eye severity evaluation.5 For example, a low Schirmer's test can be seen without any significant ocular surface staining or low tear film break-up time in the same patient. The limited use of severity tables due to lack of strong association between the features of dry eye has been noted in the TFOS DEWS II Diagnostic and Methodology report.5 For this reason, the TFOS DEWS II scientific committee offered a new diagnostic scheme that suggests positive symptomatology should be accompanied by significant worsening in one of the clinical signs (NIBUT, osmolarity, or ocular surface staining) for the diagnosis of dry eye.5 However, we know that patients with severe debilitating symptoms with no significant clinical findings also exist. Although neuropathic pain rather than dry eye is suggested to be considered in this situation,5 absence of significant clinical signs may be momentary and should not exclude the diagnosis of dry eye. 
We previously demonstrated that evaluating tear film and ocular surface parameters at rest may miss clinical findings that can be seen in the same patient after 30 minutes of reading, and that baseline symptoms correlate better with signs measured after reading activity.32 This further supports the view that the concurrent presence of symptoms and signs should not be a requirement for dry eye diagnosis and that traditional scoring algorithms remain insufficient.13 The severity measure that we estimated from Rasch analysis uses all of the information that each item provides. Additionally, our method does not ignore the less severe sign or symptom if another one indicates more severity, but rather combines all information available. We showed that no single test carries >10% of the total information carried by all items, and that corneal and conjunctival staining were the most informative, while NIBUT was the least informative (Fig. 2). However, the information value of any indicator alone was small: the 18 indicators working together provided 10 times the information of the single most informative indicator. 
We noted two sources of variability for the dry eye severity measure using the 18 indicators: there was more variance than expected for clinical signs and less variance than expected for patient symptoms, which potentially challenges the internal validity of the item measures because the error variance was not homogeneous across items (Fig. 4). The two sources of error variance do not appear to represent two different dry eye variables as no evidence of DPF is illustrated by the regressions in Figure 5 and their interpretation described in the Supplementary Material. The error variance is high (the measures account for only 36% of the observed variance – Fig. 6). 
There appear to be two independent sources of random error variance (Fig. 6). Potential candidates are that OSDI scores result from patient judgments and clinical signs involve a variety of physical measures, which can be affected by a number of physiologic and methodologic parameters, and clinician judgments. Also, OSDI scores are person level, clinical signs are for each eye separately. Thus, differences between eyes in dry eye severity can contribute to increased error variance among the signs. As expected if the measures are externally valid (i.e., we are measuring what we claim to measure), dry eye severity measures are significantly greater for dry eye cases than for controls (Fig. 7A) and significantly greater for dry eye cases diagnosed as SS than for cases that do not have SS (Fig. 7B). 
An objective single severity measure would be simple and very useful in clinical trials, particularly to determine eligibility, and evaluate the effectiveness of the therapy. Traditional algorithms may cause misclassification of dry eye patients under some circumstances. Some dry eye patients may be missed due to conflicting signs and symptoms. However, these patients exist and they ought to be included in the studies as well. In addition, when evaluating the effectiveness of any therapy, it is difficult to document the progress especially in case of conflicting signs. A therapy can be effective overall; however, this effect cannot be shown adequately if only one of the clinical signs is expected to be improved. Alternatively, the therapy might reduce all of the clinical signs as determined by the physician, but if patient symptoms do not improve, therapy can be deemed ineffective. The difficulties in clinical trials to evaluate the effectiveness of therapies for dry eye has been discussed extensively.6,12,3335 To obtain approval for a dry eye therapy in the United States, it is expected that signs and symptoms are improved by the suggested therapy in clinical trials. The multifactorial nature of dry eye, variability of signs and symptoms, and lack of widely accepted guidelines for the approval of a therapy for dry eye are the main drawbacks resulting in failure of many clinical trials leading to only a few approved treatment options.6,12,33,34 
This report demonstrated that currently available clinical tests for dry eye are insufficient due to high variability. Although the tests were done at 10-minute intervals in our study, the order of tests may have had an effect on variability, such as performing NIBUT after Schirmer's test. More informative tests with less variations are warranted. As new clinical tests are proposed, this method also can be used as a tool to assess their use. In fact, including more tests, such as tests evaluating lipid layer of tear film, meibomian glands, or neurosensory abnormalities, may yield more information on severity of dry eye. This approach currently is limited to be used in clinical trials or research studies as it requires complex calculations to estimate a severity measure for each person. The next level of the study will be including a large sample from multiple centers to create a large enough database to explore the dimensionality of the intrinsic variance. With doing so, we will be able to calibrate item measures for different clinical tests, and an application or a web-based calculator then could be used by clinicians to transform test results to the single dry eye severity measure, which in turn could be used to assess the severity and/or treatment response or progress of dry eye by clinicians in daily practice. 
In conclusion, our study indicated that standard dry eye clinical signs and symptoms can work together to define and measure a latent dry eye disease severity variable. There is no single “best” dry eye severity measure; rather, the most information about dry eye severity is carried by the battery of clinical indicators. We believe that the dry eye severity measure that we estimated from Rasch analysis would be very helpful in overcoming most of the challenges of clinical trials. 
Acknowledgments
Supported in part by research grants provided by the King Khaled Eye Specialist Hospital (KKESH) and Jerome L. Greene Sjögren's Center, Johns Hopkins University. The osmolarity cards were donated by Tearlab (TearLab Corporation, Inc., San Diego, CA). 
Presented at the annual meeting of the Association for Research in Vision and Ophthalmology (ARVO), Denver, CO, May 3–7, 2015. 
Disclosure: S. Karakus, None; E.K. Akpek, None; D. Agrawal, None; R.W. Massof, None 
References
Schaumberg DA, Sullivan DA, Buring JE, Dana MR. Prevalence of dry eye syndrome among US women. Am J Ophthalmol. 2003; 136: 318–326.
Chia EM, Mitchell P, Rochtchina E, Lee AJ, Maroun R, Wang JJ. Prevalence and associations of dry eye syndrome in an older population: the Blue Mountains Eye Study. Clin Experiment Ophthalmol. 2003; 31: 229–232.
Schaumberg DA, Dana R, Buring JE, Sullivan DA. Prevalence of dry eye disease among US men: estimates from the Physicians' Health Studies. Arch Ophthalmol. 2009; 127: 763–768.
Paulsen AJ, Cruickshanks KJ, Fischer ME, et al. Dry eye in the Beaver Dam Offspring Study: prevalence, risk factors, and health-related quality of life. Am J Ophthalmol. 2014; 157: 799–806.
Wolffsohn JS, Arita R, Chalmers R, et al. TFOS DEWS II Diagnostic Methodology report. Ocul Surf. 2017; 15: 539–574.
Savini G, Prabhawasat P, Kojima T, Grueterich M, Espana E, Goto E. The challenge of dry eye diagnosis. Clin Ophthalmol. 2008; 2: 31–55.
Grubbs JR Jr, Tolleson-Rinehart S, Huynh K, Davis RM. A review of quality of life measures in dry eye questionnaires. Cornea. 2014; 33: 215–218.
Schiffman RM, Christianson MD, Jacobsen G, et al. Reliability and validity of the Ocular Surface Disease Index. Arch Ophthalmol. 2000; 118: 615–621.
Ozcura F, Aydin S, Helvaci MR. Ocular Surface Disease Index for the diagnosis of dry eye syndrome. Ocul Immunol Inflamm. 2007; 15: 389–393.
Nichols KK, Nichols JJ, Mitchell GL. The lack of association between signs and symptoms in patients with dry eye disease. Cornea. 2004; 23: 762–770.
Bartlett JD, Keith MS, Sudharshan L, Snedecor SJ. Associations between signs and symptoms of dry eye disease: a systematic review. Clin Ophthalmol. 2015; 9: 1719–1730.
Sullivan DA, Hammitt KM, Schaumberg DA, et al. Report of the TFOS/ARVO Symposium on global treatments for dry eye disease: an unmet need. Ocul Surf. 2012; 10: 108–116.
Sullivan BD, Whitmer D, Nichols KK, et al. An objective approach to dry eye disease severity. Invest Ophthalmol Vis Sci. 2010; 51: 6125–6130.
Behrens A, Doyle JJ, Stern L, et al. Dysfunctional tear syndrome: a Delphi approach to treatment recommendations. Cornea. 2006; 25: 900–907.
Dougherty BE, Nichols JJ, Nichols KK. Rasch analysis of the Ocular Surface Disease Index (OSDI). Invest Ophthalmol Vis Sci. 2011; 52: 8630–8635.
Suzuki M, Massingale ML, Ye F, et al. Tear osmolarity as a biomarker for dry eye disease severity. Invest Ophthalmol Vis Sci. 2010; 51: 4557–4561.
Massof RW, McDonnell PJ. Latent dry eye disease state variable. Invest Ophthalmol Vis Sci. 2012; 53: 1905–1916.
Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982; 47: 149–174.
Shiboski SC, Shiboski CH, Criswell L, et al; Sjögren's International Collaborative Clinical Alliance (SICCA) Research Groups. American College of Rheumatology classification criteria for Sjögren's syndrome: a data-driven, expert consensus approach in the Sjögren's International Collaborative Clinical Alliance cohort. Arthritis Care Res (Hoboken). 2012; 64: 475–487.
Schiffman RM, Christianson MD, Jacobsen G, Hirsch JD, Reis BL. Reliability and validity of the Ocular Surface Disease Index. Arch Ophthalmol. 2000; 118: 615–621.
Whitcher JP, Shiboski CH, Shiboski SC, et al. A simplified quantitative method for assessing keratoconjunctivitis sicca from the Sjögren's Syndrome International Registry. Am J Ophthalmol. 2010; 149: 405–415.
Gumus K, Crockett CH, Rao K, et al. Noninvasive assessment of tear stability with the tear stability analysis system in tear dysfunction patients. Invest Ophthalmol Vis Sci. 2011; 52: 456–461.
Linacre M. Winsteps Rasch Tutorial 3, 2012. Available at: https://www.winsteps.com/a/winsteps-tutorial-3.pdf. Accessed on July 6, 2018.
Partial credit Rasch model: Available at: http://www.winsteps.com/winman/partialcreditmodel.htm. Accessed on September 27, 2017.
Smith RM, Suh KK. Rasch fit statistics as a test of the invariance of item parameter estimates. J Appl Meas. 2003; 4: 153–163.
Massof RW. Understanding Rasch and item response theory models: applications to the estimation and validation of interval latent trait measures from responses to rating scale questionnaires. Ophthalmic Epidemiol. 2011; 18: 1–19.
Wilson EB, Hilferty MM. The distribution of chi-square. Proc Natl Acad Sci U S A. 1931; 17: 684–688.
Akpek EK, Klimava A, Thorne JE, Martin D, Lekhanont K, Ostrovsky A. Evaluation of patients with dry eye for presence of underlying Sjögren syndrome. Cornea. 2009; 28: 493–497.
Johanson G, Alsmadi A. Differential person functioning. Educ Psychol Meas. 2002; 62: 435–443.
Gothwal VK, Pesudovs K, Wright TA, McMonnies CW. McMonnies questionnaire: enhancing screening for dry eye syndromes with Rasch analysis. Invest Ophthalmol Vis Sci. 2010; 51: 1401–1407.
Baudouin C, Aragona P, Van Setten G, et al; ODISSEY European Consensus Group members. Diagnosing the severity of dry eye: a clear and practical algorithm. Br J Ophthalmol. 2014; 98: 1168–1176.
Karakus S, Agrawal D, Hindman HB, et al. Effects of prolonged reading on dry eye. Ophthalmology. 2018; 125: 1500–1505.
Foulks GN. Challenges and pitfalls in clinical trials of treatments for dry eye. Ocul Surf. 2003; 1: 20–30.
Novack GD, Asbell P, Barabino S, et al. TFOS DEWS II Clinical Trial Design Report. Ocul Surf. 2017; 15: 629–649.
Alves M, Fonseca EC, Alves MF, et al. Dry eye disease treatment: a systematic review of published trials and a critical appraisal of therapeutic strategies. Ocul Surf. 2013; 11: 181–192.
Figure 1
 
The distribution of estimated indicator sensitivity measures (gray bars) and the distribution of estimated dry eye severity measures for persons (black bars). Orange bars illustrate the Fisher information (in logit⁻² units) carried by all 18 indicators combined as a function of the measure.
Figure 2
 
Maximum dry eye severity information carried by each indicator and all indicators.
Figure 3
 
The probability mass distribution (black bars) of infit mean square z-scores for the persons compared to the probability mass distribution expected by the measurement model (red curve).
Figure 4
 
Scatter plot of infit mean square z-scores for each indicator (horizontal axis) versus the corresponding estimated indicator measure (vertical axis). The solid vertical line is the expected infit mean square and the dashed vertical lines define the boundaries for ±2 SD from the expected value.
Figure 5
 
The scatter plots of the OSDI-based person measures and the clinical sign-based person measures.
Figure 6
 
(A) PCA of response residuals. (B) The variance of response residuals for clinical sign indicators (black points) and OSDI items (gray points).
Figure 7
 
(A) The cumulative person measure distribution for dry eye patients (solid line) versus controls (dashed line). (B) The cumulative person measure distribution for patients with SS (solid line) versus non-SS patients (dashed line).
Table 1
 
Characteristics of Subjects According to Dry Eye Status
Table 2
 
Characteristics of Subjects With SS-Related Dry Eye Versus Non-SS Dry Eye
Table 3
 
Matrix of Spearman's Inter-Item Correlations Between Dry Eye Indicators
Table 3
 
Extended
Table 4
 
Item Measures, Maximum Item Information, and Infit Mean Square Fit Statistics for the 18 Indicators
Table 4
 
Item Measures, Maximum Item Information, and Infit Mean Square Fit Statistics for the 18 Indicators
Supplement 1