Open Access
Articles  |   April 2022
Calibration of the Dutch EyeQ to Measure Vision Related Quality of Life in Patients With Exudative Retinal Diseases
Author Affiliations & Notes
  • T. Petra Rausch-Koster
    Amsterdam UMC, Vrije Universiteit Amsterdam, Ophthalmology, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
    Bergman Clinics, Department of Ophthalmology, The Netherlands
  • Michiel A. J. Luijten
    Emma Children's Hospital, Amsterdam UMC, University of Amsterdam, Child and Adolescent Psychiatry & Psychosocial Care, Amsterdam Reproduction and Development, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
    Department of Epidemiology and Data Science, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
  • F. D. Verbraak
    Amsterdam UMC, Vrije Universiteit Amsterdam, Ophthalmology, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
  • Ger H. M. B. van Rens
    Amsterdam UMC, Vrije Universiteit Amsterdam, Ophthalmology, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
  • Ruth M. A. van Nispen
    Amsterdam UMC, Vrije Universiteit Amsterdam, Ophthalmology, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
  • Correspondence: T. Petra Rausch-Koster, Amsterdam UMC, Location VUmc, Ophthalmology PK4X, PO Box 7700, 1000 SN Amsterdam, The Netherlands. e-mail: t.p.rauschkoster@amsterdamumc.nl 
Translational Vision Science & Technology April 2022, Vol.11, 5. doi:https://doi.org/10.1167/tvst.11.4.5
      T. Petra Rausch-Koster, Michiel A. J. Luijten, F. D. Verbraak, Ger H. M. B. van Rens, Ruth M. A. van Nispen; Calibration of the Dutch EyeQ to Measure Vision Related Quality of Life in Patients With Exudative Retinal Diseases. Trans. Vis. Sci. Tech. 2022;11(4):5. https://doi.org/10.1167/tvst.11.4.5.

Abstract

Purpose: This study aims to develop an item-bank to measure vision-related quality of life (Vr-QoL) and subsequently calibrate this set of items.

Methods: Three Vr-QoL instruments were searched for suitable items to add to the EyeQ. Patients who received anti-vascular endothelial growth factor treatment for various retinal diseases involving macular edema were included in the study and completed the 47-item EyeQ. Item response theory (IRT) was used to calibrate the EyeQ items; as a novel approach, the calibration was performed multiple times in subsets, each containing 80% of the data. Differential item functioning (DIF) was evaluated for various variables.

Results: Responses of 704 patients were used in the analysis. One item violated the local independence assumption of IRT and showed a high percentage of missing values, and was therefore deleted from the item-bank. The data of the five subsets fitted the graded response model adequately, and no DIF was detected for items between subsets, after which mean item parameters were calculated. Item fit statistics were found to be good. DIF was detected for gender, age, and administration mode (independently vs. with help); this involved three items, all of which showed negligible impact on total scores.

Conclusions: Because of separate calibrations of the EyeQ in multiple subsets, a high robustness of item parameters is expected.

Translational Relevance: The calibrated EyeQ can now be used for the assessment of Vr-QoL in patients suffering from exudative retinal diseases and is promising for use as a computer adaptive test.

Introduction
The prevalence of nonrefractive vision impairment, according to the WHO's definition of having a best corrected visual acuity < 20/60, in European countries is approximately 1% to 2%.1 The major causes of nonrefractive low vision worldwide are cataract, age-related macular degeneration, glaucoma, and diabetic retinopathy.2 Because of the pathophysiology of these eye diseases, individuals over age 50 years are more frequently affected. For patients suffering from macular edema caused by underlying retinal diseases, such as neovascular age-related macular degeneration (nAMD), retinal vein occlusion, or diabetic retinopathy (DR), intraocular injections with anti-vascular endothelial growth factor (anti-VEGF) are often beneficial: in most treatment-naïve patients, anti-VEGF leads to a stabilization (about 50%) or improvement (>30%; ≥10-letter gain, i.e., two lines on the ETDRS chart) of their vision after three years’ follow-up. However, for about 15% of these patients anti-VEGF is less effective, resulting in reduced vision (≥10-letter loss).3 Eventually, the loss of vision will cause limitations in physical functioning and daily activities and might have an impact on quality of life.4–7 
Evaluating patients’ disabilities in daily activities and vision-related quality of life has become more important in ophthalmology. These outcomes are from the patient's own perspective and therefore of direct relevance to them. Furthermore, patient-reported outcome measures (PROMs) might help clinicians in their communication with patients.8 Additionally, the assessment of quality of life is increasingly introduced because of the interest of the government and health insurance companies in evaluating the quality of care.9 Even though the added value and benefits of measuring and evaluating PROMs are clearly seen by patients, professionals, and health care institutes, implementing the measurement and evaluation of PROMs in clinical practice and supporting patients in periodically filling out the questionnaires remain a major challenge. 
In the past, various PROMs were developed. To solve the measurement and scoring problems of first-generation PROMs (in which equidistance between response categories and equal value of items are assumed), second-generation PROMs were frequently developed, in which Item Response Theory (IRT) is used to calibrate items and respondents on the same scale, providing a better scoring mechanism that takes the psychometric properties of items into account.10 Recently, as a proposed solution for the limitations of first- and second-generation PROMs in clinical practice (e.g., logistical, technical) and to reduce the patient’s burden of filling out a PROM to a minimum, item-banks have been developed, which are collections of items across a disability spectrum.11 These item-banks can be used to apply computerized adaptive testing (CAT), which is increasingly being introduced in health care.12–14 In contrast to long questionnaires that include a broad range of items on the health continuum, all of which must be answered by the patient to accurately measure their ability, a CAT, or tailored test, selects the next question from the item-bank using an algorithm. The selection of the next item is based on the response option that the participant chose on the previous question: after each response the patient’s summary score (“theta”) is recalculated, and the next item is selected by the algorithm. The CAT continues selecting items according to the administration rules that have been determined. Stopping criteria that can be considered are length, precision, classification, or information; combinations of these criteria can also be used to optimize the performance of the CAT.15 A significant advantage of a CAT is that the level of the patient's ability can be estimated very precisely with a considerably smaller number of items. 
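The adaptive loop described above can be illustrated with a minimal sketch. It is not the EyeQ CAT algorithm: the item parameters in `HYPOTHETICAL_BANK`, the selection rule (maximum item information at the current theta), the expected a posteriori scoring with a standard normal prior, and the stopping values `max_items` and `se_target` are all assumptions chosen for illustration. Response categories are coded from 0 in the code.

```python
import math

# Hypothetical item bank: item id -> (discrimination a, ordered thresholds b_k).
# These values are illustrative only and are NOT the calibrated EyeQ parameters.
HYPOTHETICAL_BANK = {
    "item_1": (2.0, [-0.5, 1.0]),
    "item_2": (1.5, [0.0, 1.5]),
    "item_3": (2.5, [0.5, 2.0]),
    "item_4": (1.2, [-1.0, 0.5]),
}
GRID = [-4 + 0.1 * i for i in range(81)]  # theta grid used for scoring


def _cumulative(theta, a, thresholds):
    # P(X >= k) for k = 0..K, bracketed by 1 (lowest) and 0 (beyond highest).
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Graded response model probability of each ordered category."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def item_information(theta, a, thresholds):
    """Fisher information of one graded response item at theta."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 1e-12:
            info += slope ** 2 / p
    return info


def eap_estimate(responses, bank):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for item_id, category in responses.items():
            a, thresholds = bank[item_id]
            w *= grm_category_probs(theta, a, thresholds)[category]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, max_items=10, se_target=0.3):
    """Administer items until the length or precision rule is met."""
    responses, theta, se = {}, 0.0, float("inf")
    while len(responses) < min(max_items, len(bank)) and se > se_target:
        remaining = [i for i in bank if i not in responses]
        next_item = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        responses[next_item] = answer(next_item)  # answer() returns a category index
        theta, se = eap_estimate(responses, bank)
    return theta, se, list(responses)
```

Here `answer` stands in for the patient: it is any function that returns the chosen category for a given item. Combining a length rule with a precision rule, as in `run_cat`, is one of the combinations of stopping criteria mentioned in the text.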
This will considerably reduce time and effort, as well as frustration and careless responses16 caused by administration fatigue. Two recent examples in ophthalmology are the Impact of Vision Impairment–Computer Adaptive Test (IVI-CAT) and the Diabetic Retinopathy and Macular Edema Quality-of-Life (DR/DME QoL) item-banks.17–19 The first study focused on developing a CAT based on the 28-item Impact of Vision Impairment Profile (IVI), whereas the DR/DME QoL item-bank contains 287 items categorized under 10 domains, each forming a separate item-bank. Based on the outcome of interest, a choice can be made regarding which domains are used. The average number of items required to estimate a person's level of disability was approximately seven items per item-bank. 
In previous research we translated the IVI forward from English into Dutch and backward twice and evaluated its content validity by performing cognitive debriefing interviews in Dutch patients (The Netherlands) who receive intraocular injections with anti-VEGF for exudative retinal diseases.20 This led to some adaptations and the necessity to expand the IVI with other relevant items for the development of the Dutch EyeQ item-bank. The purpose of the current study is to calibrate the EyeQ item-bank for measuring vision-related quality of life (Vr-QoL) as an important step in the development of a CAT. 
Methods
The EyeQ
The EyeQ is based on the 28-item Dutch version of the IVI.20 As a broad range of items on the construct continuum is preferred to develop a CAT that can provide precise measurement for the wide range of ability levels,15 we investigated the content of the Dutch versions of the low-vision quality-of-life questionnaire and the National Eye Institute Visual Functioning Questionnaire 25.21–23 Most items appeared to be similar to the items of the IVI, but we found 19 items that had relevant unique content, based on results of our previous research.20 Before adding items, we reformulated the specific items to fit into identical response categories. In total we selected 47 items for the EyeQ. The EyeQ items are scored using a four-point Likert scale with the following response categories: never (1), sometimes (2), often (3), and always (4), because in previous research no comprehensibility problems or other issues arose regarding these response categories.20 The response category “not applicable” was supplemented with “I don't do this for reasons other than my eyesight” and was treated as a missing value. The order of the 47 items was randomized using a random number generator to avoid possible effects of careless behavior or fatigue toward the end of the test, to the detriment of the same items. This resulted in 10 different versions of the EyeQ. 
Study Design and Participants
The study protocol was approved by the Medical Ethics Committee of Amsterdam University Medical Centers and conducted according to the Declaration of Helsinki. The Medical Ethics Committee declared that the protocol did not fall under the scope of the Medical Research Involving Human Subjects Act (Dutch law). 
Adults aged over 18 years who were diagnosed with macular edema caused by nAMD, DR, or retinal vein occlusion and were currently receiving anti-VEGF injections in the Bergman Clinics eye hospitals were invited by letter. We explained the aim, procedure, and duration of the study, and we asked whether they would agree to participate. To create a reliable representation of the clinical variety of patients receiving anti-VEGF treatment in ophthalmic clinical practice, no restrictions for participation were made based on visual acuity or the duration of treatment with intravitreal anti-VEGF. Patients had to have adequate knowledge and understanding of the Dutch language. All patients signed written informed consent and subsequently were included in the study. Participants were given the possibility to fill out the questionnaire via an online form, on a printed copy sent to their address, or by telephone. In addition, participants were asked to complete various socio-demographic questions, questions regarding comorbidities, and a generic health QoL questionnaire, the EuroQol 5 Dimensions (EQ5D-3L). The EQ5D is a commonly used generic health status measurement, and it evaluates five dimensions of functional impairment including mobility, self-care, usual activities, pain/discomfort, and anxiety/depression with a three-level response option.24,25 Clinical characteristics regarding ocular comorbidities, visual acuity, the site treated with anti-VEGF, and the diagnosis for which anti-VEGF treatment was received were manually searched in digital patient records. 
Statistical Analysis
All statistical analyses were performed using SPSS (version 26.0)26 and R using the ltm package.27 Patient characteristics were analyzed using descriptive statistics. Before the calibration of the EyeQ using IRT (i.e., the graded response model [GRM]), we investigated the response percentages on the items. Because items with high missing rates are indicative of less reliable measurement properties, we removed items if the missing proportion was higher than 50%, whereas items with percentages between 30% and 50% were flagged for potential removal; these limits were arbitrarily chosen. Additionally, participants with more than 25% missing responses were removed. After removing items and participants with high proportions of missing values, we checked the distribution of responses over response categories for possible floor and ceiling effects and for the possible need to collapse categories to create a more equal distribution. Item pairs with an inter-item correlation >0.75 were flagged, because this could be a sign of similarity, in which case the items could be considered redundant. 
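The screening rules for missing data can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `screen_missing`, the data layout (a dictionary of participants, with `None` marking a missing or "not applicable" response), and the choice to compute each participant's missing proportion over all items are hypothetical. The thresholds are those given in the text.

```python
def screen_missing(data, item_remove=0.50, item_flag=0.30, person_remove=0.25):
    """Apply the missing-data rules to {participant: {item: response or None}}."""
    items = sorted({item for row in data.values() for item in row})
    n_people = len(data)

    # Proportion of participants with a missing response, per item.
    item_rate = {
        i: sum(row.get(i) is None for row in data.values()) / n_people
        for i in items
    }
    # Proportion of items with a missing response, per participant
    # (assumption: computed over all items, before any item is removed).
    person_rate = {
        p: sum(row.get(i) is None for i in items) / len(items)
        for p, row in data.items()
    }
    return {
        "removed_items": [i for i in items if item_rate[i] > item_remove],
        "flagged_items": [
            i for i in items if item_flag <= item_rate[i] <= item_remove
        ],
        "removed_participants": sorted(
            p for p, rate in person_rate.items() if rate > person_remove
        ),
    }


# Tiny hypothetical data set: four participants, four items.
example = {
    "p1": {"a": 1, "b": None, "c": 2, "d": 1},
    "p2": {"a": 2, "b": None, "c": None, "d": 1},
    "p3": {"a": 1, "b": None, "c": 1, "d": 2},
    "p4": {"a": None, "b": 3, "c": None, "d": None},
}
result = screen_missing(example)
```

In this toy example item `b` is missing for three of four participants and is removed, item `c` is missing for half and is flagged, and participants `p2` and `p4` exceed the 25% limit.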
Important assumptions that are required for IRT modeling were checked: 
Unidimensionality of the Construct
Unidimensionality of the construct (i.e., all items representing a single latent trait) was examined using the output of a confirmatory bi-factor analysis and an exploratory factor analysis (EFA), the principal component analysis. A bi-factor analysis tests the item loadings on other factors in addition to the general factor. If omega hierarchical (ωH) is >0.80, it is accepted to consider total scores as essentially unidimensional. An explained common variance attributable to the general factor of >0.70 indicates that the factor loadings obtained from a unidimensional model might approximate well the factor loadings obtained from a bi-factor model.28 An EFA tests the amount of variance explained by the first factor. EFA was performed for a one-factor and a two-factor model. Thereafter, the ratio of the variance explained by factor one to that explained by factor two was determined; this ratio should be at least four to assume unidimensionality.29 In a subjective approach we examined the item loadings on the first factor by evaluating the eigenvalues in a scree plot. Additionally, we used the acceleration factor as a nongraphical alternative, which determines the coordinate where the slope of the curve changes most abruptly.30,31 
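Two of these checks reduce to simple arithmetic on the eigenvalues, as the sketch below shows. It assumes the common definition of the acceleration factor as the second difference of the scree curve, with the number of retained factors taken as the components located before the sharpest bend; whether this matches the software used by the authors is an assumption. The eigenvalues are hypothetical, whereas the 49% and 4% explained variances are the values reported in the Results.

```python
def explained_variance_ratio(first, second):
    """Ratio of the variance explained by the first factor to the second."""
    return first / second


def n_factors_acceleration(eigenvalues):
    """Number of components located before the sharpest bend of the scree curve."""
    if len(eigenvalues) < 3:
        raise ValueError("need at least three eigenvalues")
    # Second difference of the eigenvalue curve at each interior position.
    second_diff = [
        eigenvalues[i + 1] - 2 * eigenvalues[i] + eigenvalues[i - 1]
        for i in range(1, len(eigenvalues) - 1)
    ]
    # 0-based position of the elbow; every component before it is retained.
    elbow = second_diff.index(max(second_diff)) + 1
    return elbow


# Reported explained variances: 49% (first factor) and 4% (second factor).
ratio = explained_variance_ratio(49, 4)

# Hypothetical eigenvalues with one dominant component, for illustration only.
hypothetical_eigenvalues = [22.5, 1.9, 1.3, 1.1, 0.9, 0.8]
n_factors = n_factors_acceleration(hypothetical_eigenvalues)
```

With the reported values the ratio is 12.25, above the minimum of four, and the hypothetical scree curve bends most sharply after the first component.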
Local Independence
This assumption states that responses to every item on a measure, given a particular latent trait value (theta), are statistically independent of responses to all other items on that measure.31–33 Item pairs with residuals above 0.25 were considered to violate local independence. 
Monotonicity
This assumption implies that as a respondent moves to a higher level of the latent trait (i.e., increased disability), the probability of endorsement of a successive threshold never decreases. We used Mokken scale analysis for the assessment of manifest monotonicity by examining graphs.34,35 Additionally, Loevinger H coefficients of the items were calculated as a function of the Guttman errors between pairs of items to examine their scalability,36 where values below 0.30 were considered as unsatisfactory.35 
As a novel approach in ophthalmology, we used the full dataset and also created five random subsets, each consisting of 80% of the data. In this way, we assessed to what extent the estimates varied across subsets due to a possible selection bias (e.g., for age or gender) and subsequently calculated mean estimates. A random number generator was used to create five reproducible datasets. The GRM estimates a discrimination (α) parameter for each item and location (i.e., threshold) (β) parameters for the response categories of the item. The discrimination parameter reflects how well the item can distinguish differences in patients' level of ability, where a higher discrimination parameter refers to a higher separative power. The item threshold parameters locate the item response categories on the disability continuum. Item parameters were estimated using a marginal maximum likelihood approach as it easily handles perfect response patterns and is applicable in polytomous IRT models.37 Whether the data fit the GRM was assessed by comparing the fit of the full GRM with that of a constrained GRM, using the marginal maximum likelihood estimates, with a Likelihood-Ratio test. The constrained model, which is similar to the Rasch model, does not allow the discrimination parameter to vary between items. This procedure was repeated for all five subsets and the full data. Differential item functioning (DIF) was inspected using iterative hybrid ordinal logistic regression analyses to assess differences in the probabilities of selecting a certain item response between the subsets and the full data. 
The Likelihood-Ratio χ2 test at a level of 0.01 was used as the detection criterion for both uniform DIF (DIF that is proportional across levels of the underlying latent trait) and nonuniform DIF (DIF that is nonproportional across levels of the underlying latent trait).33,38 In case of significant DIF, McFadden’s pseudo R2 was used to measure change in DIF magnitude, where a 2% change was considered as the critical value.39 Finally, mean GRM estimates were calculated from the five subsets to create the pooled dataset. 
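The subset procedure can be sketched in a few lines. The seed value, the function names, and the example estimates are hypothetical, and the calibration step itself (fitting the GRM to each subset) is deliberately left out; the sketch only shows how five reproducible 80% subsets can be drawn and how per-item estimates from the subsets can be averaged.

```python
import random
import statistics


def make_subsets(participant_ids, n_subsets=5, fraction=0.8, seed=2022):
    """Draw reproducible random subsets, each without replacement."""
    rng = random.Random(seed)  # fixed seed (hypothetical value) for reproducibility
    size = round(fraction * len(participant_ids))
    return [sorted(rng.sample(participant_ids, size)) for _ in range(n_subsets)]


def pool_estimates(per_subset_estimates):
    """Mean and standard deviation of each item parameter across subsets."""
    items = list(per_subset_estimates[0])
    return {
        item: (
            statistics.mean(est[item] for est in per_subset_estimates),
            statistics.stdev(est[item] for est in per_subset_estimates),
        )
        for item in items
    }


subsets = make_subsets(list(range(704)))

# Hypothetical discrimination estimates for one item in three subsets.
pooled = pool_estimates([{"item_x": 2.8}, {"item_x": 2.9}, {"item_x": 3.0}])
```

A small standard deviation across subsets is what the text refers to as robustness of the item parameters.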
Subsequently, item goodness-of-fit was evaluated using the generalized S-X2 index, which is used for polytomous items to compare observed and predicted response proportions,40–42 and item and test information were assessed. Item information refers to the information content of an item in relation to the total test information; item information is therefore representative of measurement precision or reliability.31 Items contributing <3/4 of the ideal item information across the disability continuum (based on total test information) were (arbitrarily) considered for elimination. However, we were cautious about actually deleting items that contributed little to the test information, as a balanced item-bank should contain items that cover the whole range of ability levels.15 Items providing little information were identified by evaluating item information curves and category response curves. The range of theta over which the item is most informative is visible in an item information curve. 
DIF was inspected to assess whether participants with different characteristics, having the same level of disability, have equal probabilities of selecting a specific response category.33,38 Again, the Likelihood-Ratio χ2 test and McFadden’s pseudo R2 were used to detect DIF and to measure the change in DIF magnitude, respectively, using the same detection criteria as for the DIF detection between subsets. In addition, the impact of DIF on test scores was inspected by plotting test characteristic curves, which represent the relation between expected test scores on the y-axis and the thetas on the x-axis. DIF was evaluated for gender, nationality, visual acuity, age, diagnosis, civil status, EQ5D score, the number of nonocular comorbidities, administration mode (independently vs. with help), and completion method (paper vs. digital). 
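The two DIF criteria can be made concrete with a small sketch. It assumes that the log-likelihoods of two nested ordinal logistic models (without and with the group term) and of a null model are already available, that the grouping variable is dichotomous so that the comparison has one degree of freedom, and that McFadden's pseudo R2 is defined as one minus the ratio of the model and null log-likelihoods. The function names and the log-likelihood values are hypothetical.

```python
import math


def lr_test_df1(ll_reduced, ll_full):
    """Likelihood-ratio chi-square statistic and p-value for 1 degree of freedom."""
    stat = 2.0 * (ll_full - ll_reduced)
    # For 1 df, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R2 relative to an intercept-only model."""
    return 1.0 - ll_model / ll_null


def evaluate_dif(ll_null, ll_reduced, ll_full, alpha=0.01, r2_change=0.02):
    """Report significance (alpha = 0.01) and pseudo R2 change (critical value 2%)."""
    stat, p_value = lr_test_df1(ll_reduced, ll_full)
    delta = mcfadden_r2(ll_full, ll_null) - mcfadden_r2(ll_reduced, ll_null)
    return {
        "chi2": stat,
        "p": p_value,
        "delta_r2": delta,
        "significant": p_value < alpha,
        "exceeds_r2_change": delta >= r2_change,
    }


# Hypothetical log-likelihoods for one item and one grouping variable.
outcome = evaluate_dif(ll_null=-700.0, ll_reduced=-500.0, ll_full=-495.0)
```

In this hypothetical case the test is significant at the 0.01 level, but the change in pseudo R2 stays well below 2%, which corresponds to statistically detectable DIF of small magnitude.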
Results
Patient Characteristics
In total, 3783 patients were invited, of whom 746 (response rate 19.7%) were willing to participate, met the inclusion criteria, and gave their written informed consent. Seven hundred thirteen participants filled out the EyeQ. Nine patients with an excessive number of missing responses (>25%) were excluded from the analyses. Sociodemographic and clinical characteristics of the remaining 704 participants are summarized in Table 1.
Table 1.
 
Sociodemographic and Clinical Characteristics of Participants (n = 704)
Calibration of the EyeQ and Item Analyses
The confirmatory bi-factor analysis showed an ωH and explained common variance of 0.85 and 0.78, respectively. Both values support the assumption of unidimensionality.28 The principal component analysis showed that 49% of the variance could be explained by the first factor, whereas the second factor contributed 4% of the variance; thus the ratio of the variance explained by the first and second factors is 12.25, which is well above the required minimum of 4.29 The scree plot and acceleration factor also supported unidimensionality (Supplement 1). One item pair, CAT33 “Driving a car during the night” and CAT34 “Driving a car under difficult circumstances (bad weather, rush hour, etc.),” violated the local independence assumption with residuals above 0.25 and showed inter-item correlation >0.75. Mokken analysis showed that all items complied with monotonicity, and all Loevinger H coefficients were above the required 0.3, which indicated sufficient scalability. The internal consistency reliability coefficient (Cronbach's alpha) of the one-factor scale was 0.98. All response categories in all items were endorsed (Table 2); however, response categories “often” and “always” were chosen infrequently by the participants. To create a more equal distribution of the responses, these categories were collapsed. Finally, the already flagged CAT33 item was removed because of the high percentage of missing data (31.0%). 
Table 2.
 
Distribution of Responses Over the Response Categories of the EyeQ Item-Bank
Calibration of the EyeQ Item-Bank
A Likelihood-Ratio test showed that the unconstrained GRM was preferred over the constrained model for the 46 items, which was tested for all five subsets (1 to 5) and the full dataset (1: LRT = 258.9, P < 0.001; 2: LRT = 263.2, P < 0.001; 3: LRT = 286.1, P < 0.001; 4: LRT = 253.0, P < 0.001; 5: LRT = 272.2, P < 0.001; Full data: LRT = 327.8, P < 0.001). The overall fit of the 46 items to the GRM was adequate for all subsets (Table 3). 
Table 3.
 
Overall Fit Indices of 46 Items to the GRM Model of Five Subsets
Differential Item Functioning Between Subsets With Full Data
No item was flagged for DIF in the subsets compared to the full dataset using the Likelihood-Ratio χ2 test at a level of 0.01 (Table 4). Subsequently, item parameter means were calculated from the five subsets to obtain robust estimates. Item discrimination coefficients ranged from 1.17 to 2.86, with CAT3 “Going out, such as seeing cinema films, theater plays or sports events” showing the highest item discrimination and CAT29 “Suffering from glare” the lowest. Item threshold parameters ranged from −1.45 to 4.11. The total test information was 156.55, of which 97.45% fell within the range −4 to 4. The S-X2 goodness-of-fit index of the items ranged from 14.25 to 60.1. No items showed a significant S-X2 value. Item information ranged from 2.11 to 5.05. Four items (CAT19 “Reading large text,” CAT28 “Suffering from tiredness of the eyes,” CAT29 “Suffering from glare,” and CAT45 “Feeling sad”) contributed less than 2.55, that is, less than 75% of the ideal item information (3.40) based on the total test information (156.55). Nevertheless, we decided not to remove these items from the scale, given their locations of item difficulty on the disability continuum and their content. The person-item map shows that the items are distributed over almost the entire disability continuum. The scores of the respondents matched the difficulty of the items reasonably well; however, there are relatively few items at the ends of the continuum (Fig. 1). 
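The cutoff of 2.55 can be reproduced from the reported totals, assuming that the "ideal item information" is the total test information divided equally over the 46 items. This interpretation is an assumption, although it is consistent with the reported values of 3.40 and 2.55. The per-item values in `hypothetical_information` are illustrative only and are not taken from Table 4.

```python
TOTAL_TEST_INFORMATION = 156.55  # reported total test information
N_ITEMS = 46                     # items remaining after removal of CAT33

# Assumption: ideal item information = equal share of the total test information.
ideal_item_information = TOTAL_TEST_INFORMATION / N_ITEMS
cutoff = 0.75 * ideal_item_information


def low_information_items(item_information, threshold):
    """Return the items whose information falls below the threshold."""
    return sorted(i for i, info in item_information.items() if info < threshold)


# Hypothetical per-item information values, for illustration only.
hypothetical_information = {"item_a": 2.11, "item_b": 3.40, "item_c": 5.05}
considered = low_information_items(hypothetical_information, cutoff)
```

Rounded to two decimals, `ideal_item_information` is 3.40 and `cutoff` is 2.55, so only the hypothetical `item_a` would be considered for elimination.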
Table 4.
 
Mean GRM Item Parameters of the EyeQ With Standard Deviation, Item Information, and Fit Statistics of the Full Dataset
Figure 1.
 
Person–item map of the EyeQ item-bank. Respondents and items are calibrated along the same scale (latent trait). The histogram on the left represents the respondents. The histogram on the right represents the item location on the latent trait continuum. The Y-axis represents the theta range of the latent trait continuum where a higher theta represents a higher level of disability.
Differential Item Functioning
The following variables were dichotomized: nationality (Dutch vs. non-Dutch), visual acuity (<0.30 LogMAR vs. ≥0.30 LogMAR), age (<75 years vs. ≥75 years), diagnosis (nAMD vs. other), civil status (living single vs. not single), and EQ5D score (perfect score vs. other scores). A three-level variable was created for the number of nonocular comorbidities (no comorbidities vs. one comorbidity vs. two or more comorbidities) to obtain enough cases in response categories. DIF analysis for nationality was performed for only 27 items because, after collapsing response categories and dichotomizing this variable, the minimum of five cases in each category could not be reached for the other items. 
Table 5 and Figures 2 and 3 show the results of the DIF analyses. Item CAT9 “Looking after appearance” showed uniform DIF for gender, and CAT42 “Feeling embarrassed” showed uniform DIF for age. CAT35 “Feeling worried or concerned about your safety at home” showed nonuniform DIF for administration mode (independently vs. with help [proxy]). For the item “Looking after appearance,” the threshold parameters for females were lower than those for males, indicating that females endorse higher response categories at the same level of Vr-QoL score. For the item “Feeling embarrassed,” the threshold parameter was lower for the younger than for the older age group (dichotomized at 75 years), indicating that younger patients endorse higher response categories at the same level of Vr-QoL score. Furthermore, the item “Feeling worried or concerned about your safety at home” showed a lower threshold for independently completed questionnaires, indicating that patients filling out the Vr-QoL questionnaire independently endorse higher response categories at the same level of Vr-QoL score. This effect was nonproportional across the levels of the trait: the difference between independent completion and completion with help (proxy) widened with higher levels of theta. 
Table 5.
 
McFadden's Pseudo R2 and IRT Parameters for Items Displaying DIF (Likelihood-Ratio χ2 Test Criterion of 0.01)
Figure 2.
 
Impact of DIF on the test characteristic curve (TCC) for the EyeQ. The plots on the left show the impact of differential item functioning (DIF) considering all items; the plots on the right show the DIF impact considering only items displaying DIF.
Figure 3.
 
Category response curves for items displaying DIF.
Discussion
This study describes the development of the EyeQ item-bank and its calibration. In addition, DIF was investigated for several subgroups. The EyeQ is a PROM that aims to measure Vr-QoL in patients with exudative retinal diseases. The content of the EyeQ is based on three instruments measuring Vr-QoL to provide an extensive range of items covering the whole disability trait, which is preferable for a CAT. This new EyeQ item-bank now also covers domains that were reported as under-represented in previous qualitative research on the content validity of the Dutch-IVI.20 Although the EyeQ was developed mainly from items originating from other instruments, we consider the content of this instrument to be new, as various questions have been rewritten to fit the response categories. 
A slightly high inter-item correlation was found between CAT33 “driving a car during the night” and CAT34 “driving a car under difficult circumstances” (0.77), with residuals of 0.33. Although this is not a severe threat, it could influence the IRT parameter estimates and could pose a problem in the construction of the scale.29 Even when the instrument is implemented as a CAT, the estimation of the level of disability will be inaccurate. In addition, the high percentage of missing values of CAT33 warranted removal of this item. 
The final EyeQ contains 46 items with difficulties across the disability trait; however, there are more items that are targeted at patients with a higher level of disability. In our study we included patients with a relatively low level of disability; however, the instrument is also likely to be suitable for patients with a higher level of disability because, at the higher levels of theta, the EyeQ contains several items that are applicable without DIF for visual acuity. On the other hand, possibly because of the relatively low level of disability of participants, it was decided to collapse response categories “often” and “always,” which could lead to a decrease in sensitivity to detect changes in Vr-QoL scores of patients with high levels of disability. 
A strength of this research is the relatively large study sample that made it possible to divide the data in subsets and perform separate calibrations, a novel approach in ophthalmology leading to robust item estimates. After DIF analyses between these subsets and the full data showed no significant differences in item performance because of a possible selection bias, it was possible to calculate mean item estimates. This supports our expectation that the items included in the final EyeQ are robust and stable and therefore suitable for measuring Vr-QoL in patients with exudative retinal diseases. 
A limitation of this study is the relatively small group of patients with diabetic macular edema (8%), which may limit the generalizability of the results for this particular group. Response rates of patients with diabetic macular edema, cystoid macular edema, and age-related macular edema were, respectively, 13%, 24%, and 22%. A possible explanation is the presence of other diabetes-related health concerns, which may carry a heavier burden than the visual impairment and may have discouraged participation in this study. 
We performed DIF analyses for a series of variables; however, we only found three items that showed DIF, which all had a negligible impact on the total score. The impact of DIF when administering the EyeQ as a CAT could be higher, because the algorithm selects items from the item-bank until a specified level of precision is reached or a predetermined number of items is answered, which could possibly be the items displaying DIF. Future CAT simulations will show to what extent the items displaying DIF are actually administered in a CAT, in order to better estimate the impact of DIF on the total scores. Until then, we recommend using the group-specific item parameter estimates for these items in the algorithm. In addition, our future research will involve post hoc CAT simulations to assess how well the EyeQ item-bank performs as a CAT under different administration conditions. In this step, the theta estimated by CAT is compared with the true theta estimated by the full set of items. 
This calibration and assessment of DIF of the EyeQ is a step forward in the development of a new item-bank. However, in future research several psychometric properties need to be investigated (e.g., concurrent and discriminant validity, and test-retest reliability, as well as the responsiveness to detect changes over time).44,45 
Even though the IVI-CAT and the DR/DME QoL item-banks have also been developed recently,17,18 we believe that the EyeQ is a valuable addition. First, because the IVI-CAT is based on the 28-item IVI, the EyeQ is potentially more comprehensive: we added items after searching other Vr-QoL instruments and evaluated the relevance and comprehensibility of the IVI using three-step test-interviewing.20 Second, the DR/DME QoL item-bank still requires items to be completed for each of its 10 item-banks (covering several domains) when all are administered as CAT, although it is possible to evaluate only a selection of domains. The EyeQ could offer the best of both worlds: an in-between version with all-around performance and a wide distribution of items across the disability spectrum, while still measuring a unidimensional construct. The results of this study are promising for the use of the EyeQ as a generic instrument of Vr-QoL in patients with several retinal diseases. However, future research will need to address longitudinal measurement invariance of the EyeQ and post hoc CAT simulations, because the results of these analyses are important for use of the EyeQ in clinical practice. 
Conclusion
In conclusion, this study described the development and calibration of the EyeQ item-bank, a new instrument for the periodic and systematic assessment of Vr-QoL in patients who have exudative retinal diseases and receive treatment in ophthalmic clinical practice. The model and item fit statistics of the EyeQ were satisfactory, and the separate calibrations yielded robust item estimates for 46 items. The calibration of the EyeQ allows use of this instrument in clinical practice. Future research should focus on the impact of DIF items on test scores when the EyeQ is administered as a CAT and should evaluate the longitudinal measurement invariance of the instrument. 
Acknowledgments
Supported by Bayer Healthcare Mijdrecht (Fellowship name/number: IMP20112/VUmc PROM). The sponsor had no role in the design and conduct of the study, the data collection, data analysis, data interpretation, or writing of the report. 
Disclosure: T.P. Rausch-Koster, None; M.A.J. Luijten, None; F.D. Verbraak, None; G.H.M.B. van Rens, None; R.M.A. van Nispen, None 
References
Delcourt C, Le Goff M, Von Hanno T, et al. The decreasing prevalence of nonrefractive visual impairment in older Europeans: a meta-analysis of published and unpublished data. Ophthalmology. 2018; 125: 1149–1159. [CrossRef] [PubMed]
GBD 2019 Blindness and Vision Impairment Collaborators; Vision Loss Expert Group of the Global Burden of Disease Study. Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. Lancet Glob Health. 2021; 9(2): e144–e160. [CrossRef] [PubMed]
Verbraak FD, Ponsioen DL, Tigchelaar‐Besling OA, et al. Real-world treatment outcomes of neovascular age-related macular degeneration in the Netherlands. Acta Ophthalmol. 2021; 99(6): e884–e892. [CrossRef] [PubMed]
Brody BL, Gamst AC, Williams RA, et al. Depression, visual acuity, comorbidity, and disability associated with age-related macular degeneration. Ophthalmology. 2001; 108: 1893–1900; discussion 1900-1901. [CrossRef] [PubMed]
Bookwala J, Lawson B. Poor vision, functioning, and depressive symptoms: a test of the activity restriction model. Gerontologist. 2011; 51: 798–808. [CrossRef] [PubMed]
Hayman KJ, Kerse NM, La Grow SJ, Wouldes T, Robertson MC, Campbell AJ. Depression in older people: visual impairment and subjective ratings of health. Optom Vis Sci. 2007; 84: 1024–1030. [CrossRef] [PubMed]
van Nispen RM, de Boer MR, Hoeijmakers JG, Ringens PJ, van Rens GH. Co-morbidity and visual acuity are risk factors for health-related quality of life decline: five-month follow-up EQ-5D data of visually impaired older patients. Health Qual Life Outcomes. 2009; 7: 18. [CrossRef] [PubMed]
Greenhalgh J, Gooding K, Gibbons E, et al. How do patient reported outcome measures (PROMs) support clinician-patient communication and patient care? A realist synthesis. J Patient Rep Outcomes. 2018; 2(1): 1–28. [CrossRef]
van Nispen RMA, Virgili G, Hoeben M, et al. Low vision rehabilitation for better quality of life in visually impaired adults. Cochrane Database Syst Rev. 2020; 1(1): CD006543. [PubMed]
Pesudovs K. Item banking: a generational change in patient-reported outcome measurement. Optom Vis Sci. 2010; 87: 285–293. [CrossRef] [PubMed]
Braithwaite T, Calvert M, Gray A, Pesudovs K, Denniston AK. The use of patient-reported outcome research in modern ophthalmology: impact on clinical trials and routine clinical practice. Patient Relat Outcome Meas. 2019; 10: 9–24. [CrossRef] [PubMed]
Flens G, Smits N, Terwee CB, Dekker J, Huijbrechts I, de Beurs E. Development of a computer adaptive test for depression based on the Dutch-Flemish version of the PROMIS item bank. Eval Health Prof. 2017; 40: 79–105. [CrossRef] [PubMed]
You DS, Cook KF, Domingue BW, et al. Customizing CAT administration of the PROMIS misuse of prescription pain medication item bank for patients with chronic pain. Pain Med. 2021; 22: 1669–1675. [CrossRef] [PubMed]
Patel RN, Esparza VG, Lai JS, et al. Comparison of PROMIS computerized adaptive testing versus fixed short forms in juvenile myositis [published online ahead of print July 30, 2021]. Arthritis Care Res. 2021, https://doi.org/10.1002/acr.24760.
Magis D, Yan D, von Davier AA. Computerized Adaptive and Multistage Testing with R. Cham: Springer; 2017.
Wainer H, Dorans NJ, Flaugher R, Green BF, Mislevy RJ. Computerized Adaptive Testing: A primer. New York: Routledge; 2000.
Fenwick EK, Khadka J, Pesudovs K, Rees G, Wong TY, Lamoureux EL. Diabetic retinopathy and macular edema quality-of-life item banks: development and initial evaluation using computerized adaptive testing. Invest Ophthalmol Vis Sci. 2017; 58: 6379–6387. [CrossRef] [PubMed]
Fenwick EK, Barnard J, Gan A, et al. Computerized adaptive tests: efficient and precise assessment of the patient-centered impact of diabetic retinopathy. Transl Vis Sci Technol. 2020; 9(7): 3. [CrossRef] [PubMed]
Fenwick EK, Loe BS, Khadka J, Man RE, Rees G, Lamoureux EL. Optimizing measurement of vision-related quality of life: a computerized adaptive test for the impact of vision impairment questionnaire (IVI-CAT). Qual Life Res. 2020; 29: 765–774. [CrossRef] [PubMed]
Rausch-Koster TP, van der Ham AJ, Terwee CB, Verbraak FD, van Rens GH, van Nispen RM. Translation and content validity of the Dutch Impact of Vision Impairment questionnaire assessed by Three-Step Test-Interviewing. J Patient Rep Outcomes. 2021; 5(1): 1. [CrossRef] [PubMed]
Wolffsohn JS, Cochrane AL. Design of the low vision quality-of-life questionnaire (LVQOL) and measuring the outcome of low-vision rehabilitation. Am J Ophthalmol. 2000; 130: 793–802. [CrossRef] [PubMed]
Mangione CM, Lee PP, Gutierrez PR, et al. Development of the 25-item National Eye Institute Visual Function Questionnaire. Arch Ophthalmol. 2001; 119: 1050–1058. [CrossRef] [PubMed]
van Nispen RM, Knol DL, Langelaan M, van Rens GH. Re-evaluating a vision-related quality of life questionnaire with item response theory (IRT) and differential item functioning (DIF) analyses. BMC Med Res Methodol. 2011; 11: 125. [CrossRef] [PubMed]
EuroQol Group. EuroQol—a new facility for the measurement of health-related quality of life. Health Policy. 1990; 16: 199–208. [CrossRef] [PubMed]
Lamers LM, Stalmeier PF, McDonnell J, Krabbe PF, van Busschbach J. [Measuring the quality of life in economic evaluations: the Dutch EQ-5D tariff]. Ned Tijdschr Geneeskd. 2005; 149: 1574–1578. [PubMed]
IBM Corp. IBM SPSS Statistics for Windows 26.0.
Rizopoulos D. ltm: An R package for latent variable modeling and item response theory analyses. J Stat Softw. 2006; 17.
Rodriguez A, Reise SP, Haviland MG. Applying bifactor statistical indices in the evaluation of psychological measures. J Pers Assess. 2016; 98: 223–237. [CrossRef] [PubMed]
Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007; 45(5 Suppl 1): S22–S31. [PubMed]
Courtney M. Determining the number of factors to retain in EFA: Using the SPSS R-Menu v2.0 to make more judicious estimations. Practical Assess Res Eval. 2013; 18: 1–14.
Edelen MO, Reeve BB. Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Qual Life Res. 2007; 16(Suppl 1): 5–18. [PubMed]
Liu Y, Maydeu-Olivares A. Local dependence diagnostics in IRT modeling of binary data. Educ Psychol Measure. 2012; 73: 254–274. [CrossRef]
Nguyen TH, Han HR, Kim MT, Chan KS. An introduction to item response theory for patient-reported outcome measurement. Patient. 2014; 7(1): 23–35. [CrossRef] [PubMed]
Sijtsma K, Meijer RR, van der Ark LA. Mokken scale analysis as time goes by: An update for scaling practitioners. Pers Individual Diff. 2011; 50(1): 31–37. [CrossRef]
Pilkonis PA, Choi SW, Reise SP, et al. Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS): depression, anxiety, and anger. Assessment. 2011; 18: 263–283. [CrossRef] [PubMed]
Loevinger J. The technic of homogeneous tests compared with some aspects of scale analysis and factor analysis. Psychol Bull. 1948; 45: 507–529. [CrossRef] [PubMed]
Magis D, Yan D, von Davier AA. Computerized Adaptive and Multistage Testing with R: using packages catR and mstR. Cham: Springer International Publishing; 2017.
Jones RN. Differential item functioning and its relevance to epidemiology. Curr Epidemiol Rep. 2019; 6: 174–183. [CrossRef] [PubMed]
Choi SW, Gibbons LE, Crane PK. lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw. 2011; 39(8): 1–30. [CrossRef] [PubMed]
Orlando M, Thissen D. Further investigation of the performance of S-X²: an item fit index for use with dichotomous item response theory models. Appl Psychol Measure. 2003; 27: 289–298. [CrossRef]
Kang T, Chen TT. Performance of the Generalized S-X² Item Fit Index for Polytomous IRT Models. J Educ Measure. 2008; 45(4): 391–406. [CrossRef]
Orlando M, Thissen D. Likelihood-based item-fit indices for dichotomous item response theory models. Appl Psychol Measure. 2000; 24: 50–64. [CrossRef]
Hu Lt, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling. 1999; 6: 1–55. [CrossRef]
Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007; 45(5): S22–S31. [CrossRef] [PubMed]
Pellicciari L, Chiarotto A, Giusti E, Crins MH, Roorda LD, Terwee CB. Psychometric properties of the patient-reported outcomes measurement information system scale v1.2: global health (PROMIS-GH) in a Dutch general population. Health Qual Life Outcomes. 2021; 19: 226. [CrossRef] [PubMed]
Figure 1.
 
Person–item map of the EyeQ item-bank. Respondents and items are calibrated along the same scale (latent trait). The histogram on the left represents the respondents. The histogram on the right represents the item location on the latent trait continuum. The Y-axis represents the theta range of the latent trait continuum where a higher theta represents a higher level of disability.
Figure 2.
 
Impact of DIF on the test characteristic curve (TCC) for the EyeQ. The plots on the left show the impact of differential item functioning (DIF) considering all items; the plots on the right show the DIF impact considering only items displaying DIF.
Figure 3.
 
Category response curves for items displaying DIF.
Table 1.
 
Sociodemographic and Clinical Characteristics of Participants (n = 704)
Table 2.
 
Distribution of Responses Over the Response Categories of the EyeQ Item-Bank
Table 3.
 
Overall Fit Indices of 46 Items to the GRM Model of Five Subsets
Table 4.
 
Mean GRM Item Parameters of the EyeQ With Standard Deviation, Item Information, and Fit Statistics of the Full Dataset
Table 5.
 
McFadden's Pseudo R² and IRT Parameters for Items Displaying DIF (Likelihood-Ratio χ² Test Criterion of 0.01)