We employed Rasch analysis to estimate person and item measures (i.e., estimates of person ability and item difficulty) from NEI VFQ-25 participant responses on an invariant logit scale where the difference between K and K + 1 represents the same difference in visual function for every real number K.
8,36 With Rasch analysis, missing item responses do not change the measurement scale (Rasch analysis assumes that the underlying latent trait is the same even with a missing response), unlike the recommended scoring strategy of the NEI VFQ-25, where the composite score depends on the number and choice of items rated in each subscale (e.g., rating only easy items changes the raw score). Instead, missing item responses in Rasch analysis change the standard error (i.e., precision of the estimate). A second advantage of Rasch analysis is that a single set of calibrated item measures can be provided for estimating person measures from different studies on the same scale, enabling direct comparisons.
37 This contrasts with item response theory (IRT), where each item has its own item discrimination parameter, effectively enabling each item to measure persons on its own scale. Item discrimination parameters add mathematical flexibility to IRT models and allow them to model the data better, but at the expense of violating a fundamental property of measurement: that all items should measure the latent trait in the same unit of measurement.
38 Thus, when the goal is to “measure” something rather than “model” the data, Rasch models are preferred.
39 Third, Rasch analysis estimates rating category thresholds (boundaries between neighboring rating categories on the real number line) that define the sizes of the intervals representing the rating categories. A very small interval tells us that the rating category is not easy to discriminate from its neighbors, unlike in Likert scales, where every rating category is assumed to be equally discriminable. Fourth, Rasch analysis provides standard errors for the estimated measures, whereas the NEI VFQ-25 composite scoring strategy does not. Finally, statistical power is greater when using Rasch analysis instead of composite scores (which under the best of circumstances should be considered nonparametric data).
40,41
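The two properties described above (an invariant logit scale, and missing responses affecting precision rather than the scale) can be illustrated with a minimal sketch. It assumes the simplest dichotomous Rasch model rather than the polytomous model used in this study, and the function names and item measures are hypothetical values chosen for illustration, not NEI VFQ-25 calibrations.

```python
import math

def rasch_probability(person_measure, item_measure):
    """Probability of the higher response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(person_measure - item_measure)))

def log_odds(p):
    """Convert a probability to log-odds (logits)."""
    return math.log(p / (1.0 - p))

def person_standard_error(person_measure, item_measures):
    """Asymptotic standard error of a person measure, using only answered items."""
    information = 0.0
    for item_measure in item_measures:
        p = rasch_probability(person_measure, item_measure)
        information += p * (1.0 - p)  # Fisher information contributed by one item
    return 1.0 / math.sqrt(information)

# Hypothetical item measures in logits (assumed values for illustration only).
all_items = [-1.5, -0.5, 0.0, 0.5, 1.5]
answered_items = [-1.5, 0.0, 1.5]  # same person, two responses missing

# Moving from K to K + 1 changes the log-odds by exactly 1 logit for any K.
for k in (-2.0, 0.0, 3.0):
    step = log_odds(rasch_probability(k + 1.0, 0.5)) - log_odds(rasch_probability(k, 0.5))
    print(f"K = {k:+.1f}: change in log-odds = {step:.3f}")

# Missing responses leave the scale unchanged but widen the standard error.
se_full = person_standard_error(0.0, all_items)
se_missing = person_standard_error(0.0, answered_items)
print(f"SE with all items: {se_full:.3f}; SE with missing responses: {se_missing:.3f}")
```

In this sketch the person measure is expressed in the same logits whether three or five items are answered; only the information sum, and therefore the standard error, changes.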
Rasch analysis has previously been used to estimate item measures, person measures, and rating category thresholds for the NEI VFQ-25.
8,9,36 However, the Rasch models used (e.g., Andrich rating scale model, Masters’ partial credit model) often estimate disordered rating category thresholds, which is inconsistent with the concept of a rating scale, where ordered rating categories are separated by ordered thresholds.
42,43 To rectify this problem, advocates of the Andrich and Masters models have recommended merging neighboring rating categories as many times as necessary during post hoc analysis until all estimated rating category thresholds are ordered.
44 However, this practice creates a rating scale with fewer rating categories than the one administered in the original questionnaire, reducing the responsiveness of the instrument to potential effects of an intervention or exposure. Petrillo and colleagues
9 pooled six datasets (four of which are represented in this paper) and noted that 15 of the 25 items on the NEI VFQ-25 (plus six supplemental items) show disordered category thresholds when estimated with the partial credit model.
43 Rather than require post hoc manipulation and modification of the data to estimate ordered thresholds, we used the method of successive dichotomizations (MSD), which is a polytomous response model that always estimates ordered rating category thresholds and has been shown to estimate parameters in near perfect agreement with their true values using simulated rating scale data.
43,45
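To make the cost of post hoc category merging concrete, the following sketch checks whether a set of thresholds is ordered and shows how merging two neighboring categories recodes the ratings. The helper names and the threshold values are hypothetical and serve only as an illustration; they are not estimates from any dataset discussed here.

```python
def thresholds_are_ordered(thresholds):
    """Return True if every threshold is strictly below the next one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

def merge_neighboring_categories(ratings, lower):
    """Merge category `lower` with `lower + 1` and renumber the categories above."""
    return [r if r <= lower else r - 1 for r in ratings]

# Hypothetical thresholds for a five-category item; the second and third are disordered.
estimated_thresholds = [-1.2, 0.4, 0.1, 1.3]
print(thresholds_are_ordered(estimated_thresholds))

# Merging categories 1 and 2 leaves four categories (0 to 3) instead of five.
ratings = [0, 1, 2, 3, 4]
merged = merge_neighboring_categories(ratings, lower=1)
print(merged)
```

After the merge, respondents who originally chose categories 1 and 2 are indistinguishable, which is the loss of responsiveness described above.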
MSD extends the dichotomous Rasch model to multiple rating categories by applying the dichotomous Rasch model to every possible dichotomization of response categories. If the response categories are represented as non-negative integers from 0 to M, then MSD applies the dichotomous Rasch model to all M possible dichotomizations: {0} versus {1, 2, …, M}; {0, 1} versus {2, 3, …, M}; {0, 1, 2} versus {3, 4, …, M}; etc. The estimated item and person measures from each of the dichotomizations are then averaged to estimate final MSD item and person measures. The M rating category thresholds are subsequently estimated one threshold at a time. For each dichotomization of response categories, the estimated MSD item and person measures are anchored (i.e., their values are fixed and not estimated), and the remaining parameter in the dichotomous Rasch model (a single threshold) is estimated using maximum likelihood estimation. This mathematical approach toward estimating measures makes MSD a polytomous Rasch model that always estimates ordered rating category thresholds. MSD was implemented using the R package ‘msd.’ Because MSD is applicable only when all items have the same number of rating categories, we required all items in the NEI VFQ-VF, NEI VFQ-SE, and NEI VFQ-25C to have five rating categories.
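The dichotomization step can be sketched as follows. This is only an illustration of how the rating categories are split and how responses are recoded for each split; it does not perform the Rasch estimation, the averaging of measures, or the threshold estimation, and it is not the interface of the ‘msd’ package. The function names are hypothetical, and missing responses are assumed to be coded as `None`.

```python
def dichotomizations(max_category):
    """List every split of categories 0..max_category into a lower and an upper set."""
    categories = list(range(max_category + 1))
    return [(categories[:cut], categories[cut:]) for cut in range(1, max_category + 1)]

def dichotomize(ratings, cut):
    """Recode ratings to 0/1 (1 if rating >= cut); missing responses stay missing."""
    return [None if r is None else int(r >= cut) for r in ratings]

# Five rating categories coded 0 to 4 give four dichotomizations.
for lower, upper in dichotomizations(4):
    print(lower, "versus", upper)

# One respondent's ratings on five items, with one missing response.
ratings = [0, 2, None, 4, 3]
for cut in range(1, 5):
    print(cut, dichotomize(ratings, cut))
```

Each recoded response vector would then be analyzed with the dichotomous Rasch model, and the resulting item and person measures averaged across the splits as described above.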