Clinicians and researchers working with individuals with LHON should be aware of the limitations of the VF-14 in its original format. When used by individuals with LHON, the original version of the VF-14 exhibits several shortcomings that impact its psychometric validity. Key issues identified in this study included problems with the structure of response categories and evidence of scale multidimensionality. Respondents with LHON in this study tended to endorse the categories “unable to do” and “a great deal of difficulty” for most items in the VF-14, resulting in left-skew of the distribution of person scores and a minor floor effect, i.e., a lower limit to the data values that the VF-14 can reliably specify. The response category “a little difficulty” was seemingly underused across almost all items, with respondents always more likely to respond in one of the adjacent response categories. These problems likely arise from differences in the visual abilities of individuals with LHON compared to those with cataracts, for whom the VF-14 was originally designed. The effect of cataracts on vision is highly variable, as cataracts may be unilateral or bilateral and can vary widely in size, morphology, and degree of lens opacification. In contrast, vision loss in LHON is generally bilateral and severe, with the majority of patients significantly visually impaired, with visual acuity of 6/60 or less.18,19
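The underuse of a response category can be illustrated with a minimal sketch. It assumes a partial credit parameterization of the polytomous Rasch model, which may differ from the exact model fitted in this study, and the threshold values are hypothetical numbers chosen only for illustration, not estimates from our data. When two adjacent thresholds are disordered, the category between them is never the most probable response at any level of person ability, which mirrors the pattern seen for “a little difficulty.”

```python
import math


def category_probabilities(theta, thresholds):
    """Category probabilities under a partial credit model (assumed form).

    theta: person ability in logits.
    thresholds: one threshold (in logits) per step between adjacent categories.
    """
    # Exponent for category k is the sum of (theta - tau_j) for j = 1..k;
    # the lowest category has exponent 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def modal_categories(thresholds, lo=-6.0, hi=6.0, steps=241):
    """Return the categories that are most probable somewhere on an ability grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = category_probabilities(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return sorted(modal)


# Hypothetical thresholds for a five-category item, coded
# 0 = "unable to do" ... 3 = "a little difficulty", 4 = "no difficulty".
ORDERED = [-2.0, -1.0, 1.0, 2.0]
DISORDERED = [-2.0, -0.5, 1.5, 0.5]  # third and fourth thresholds reversed

print(modal_categories(ORDERED))
print(modal_categories(DISORDERED))
```

With the ordered thresholds, every category is the most likely response over some range of ability. With the disordered thresholds, category 3 drops out, which is one common motivation for collapsing adjacent response categories.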
In its original format, the VF-14 also demonstrated evidence of multidimensionality. Multidimensional measurement scales are problematic because multidimensionality indicates that different items are measuring different traits, thereby reducing the reliability of the scale.16 Apparent multidimensionality can also be indicated where local dependency is present within a scale.20 We found that by removing the “participation in sports” item and both driving items, along with accounting for the dependency between the “reading small print” and “reading newspaper/book” items, the VF-14R became a unidimensional scale. The “participation in sports” item showed misfit to the Rasch model and underdiscriminated between respondents with differing levels of visual ability, indicating that it was performing poorly as a measure of vision-related activity limitation. This is not surprising, as the amount of vision required for sports participation depends on the type of sport, of which the VF-14 lists “bowling, handball, tennis, and golf” as examples. In contrast, driving is very dependent on visual ability. However, assessing difficulty driving in a vision-related PROM with multiple response categories is problematic, because responses will typically come from those whose vision is good enough to meet the legal requirement for driving and will therefore cluster in the “no difficulty” and “a little difficulty” categories.
The performance of the VF-14 was also impacted by incomplete responses marked with “not applicable.” Nearly one quarter or more of respondents did not respond to at least one of five items (“participation in sports,” “playing games,” “doing fine handwork,” “driving during the day,” and “driving at night”). Differences in the demographic background of individuals with LHON compared to individuals with cataracts likely explain the incomplete responses. The peak age of onset of vision loss in LHON is between 15 and 35 years, with a subset of patients under the age of 12 years developing a form of childhood-onset LHON.21 The examples used for sports (“bowling, handball, tennis, and golf”), games (“bingo, dominos, card games, and mahjong”), and fine handwork (“sewing, knitting, crocheting, and carpentry”) are more relevant to older individuals with cataracts than to younger individuals with LHON. Missing responses, particularly for the driving items, lead to larger standard errors (Table 2) and reduced precision of estimates. Incomplete responses for the driving items may be related to two factors. First, individuals with LHON, especially those with childhood-onset LHON, may have never had the opportunity to learn to drive because of legal requirements regarding age. Second, respondents in this study were recruited from European countries including Germany, the Netherlands, and the United Kingdom, where public transportation is more widely used than in the United States, where the VF-14 was originally developed. Although both driving items were excluded from the VF-14R, they can still be retained as nonscoring items because they provide useful clinical information that serves as a benchmark for the level of visual function legally required to drive.
For PROMs to be effective as outcome measures in clinical trials, they have to capture the disease characteristics that matter to the patient and must be reliable and valid.4 The VF-14 was previously used in the RHODOS trial to measure change in health-related quality of life.13 Only small changes in the VF-14 score were observed over the 24-week study period.22 A small nonsignificant treatment effect was detected between the idebenone- and placebo-treated groups. As we have demonstrated in this study, there are issues with the validity of the VF-14 that limit its usefulness. Furthermore, the VF-14 only measures vision-related activity limitation and does not specifically address other domains of quality of life that may be important to individuals with LHON.23,24 In this study we have demonstrated that the psychometric properties of the VF-14 can be improved by re-engineering the response categories and removing items that do not fully contribute towards the total score. Additional studies could be performed to determine the impact of idebenone on vision-related activity limitation using the VF-14R we have proposed. However, post-hoc revisions to the VF-14 do not address the content of the PROM, which would require additional items to function as a measure of health-related quality of life.23 For these reasons, we have not provided equating scales to convert the original VF-14 to the modified VF-14R score.
A weakness of the VF-14 in its original format is the conversion of raw scores for individual items into an average score out of 100. This assumes that each item contributes equally to the final score, and that the interval between each response category is uniform across all the categories and for all items. A key strength of this study was the use of Rasch analysis to overcome this limitation by calculating item difficulty in relation to person ability and adjusting the overall scores accordingly, allowing comparison of measures and interpretation of changes in scores when the logit scores are used. Previous studies have found that Rasch-informed revisions to the VF-14 are more sensitive to change than the original VF-14 when used in longitudinal studies of patients with cataracts who undergo surgery.25 Another strength of Rasch analysis is its robustness to incomplete data, with all available data being used within the analysis process.26 This is a substantial advantage over Classical Test Theory, a traditional psychometric approach, where item-level missing data can bias test results.27
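The equal-weighting assumption behind the original score can be made concrete with a short sketch. It assumes the conventional VF-14 coding, in which each applicable item is scored from 4 (“no difficulty”) to 0 (“unable to do”), the item scores are averaged, and the mean is multiplied by 25. The function name, the dictionary, and the exact wording of the middle response label are illustrative assumptions rather than part of the instrument's official documentation.

```python
# Assumed conventional coding of VF-14 response categories (4 = best, 0 = worst).
RESPONSE_SCORES = {
    "no difficulty": 4,
    "a little difficulty": 3,
    "a moderate amount of difficulty": 2,  # label wording assumed
    "a great deal of difficulty": 1,
    "unable to do": 0,
}


def vf14_score(responses):
    """Average score out of 100 over applicable items.

    responses: list of response labels, with None for "not applicable".
    Returns None when no item is applicable.
    """
    scored = [RESPONSE_SCORES[r] for r in responses if r is not None]
    if not scored:
        return None
    # Every item and every step between categories is weighted equally.
    return 25.0 * sum(scored) / len(scored)


# Two hypothetical respondents with very different response patterns.
uniform = ["a moderate amount of difficulty"] * 4
mixed = ["no difficulty", "no difficulty", "unable to do", "unable to do"]

print(vf14_score(uniform))
print(vf14_score(mixed))
```

Both hypothetical respondents receive the same score although their response patterns differ, because the raw average takes no account of which items were difficult. A Rasch measure instead places persons and items on a common logit scale.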
Limitations of this study relate to the study participants. Individuals with LHON were recruited from large university clinics in three European countries. Patients seen in these clinics may have more complex or severe disease because of referral bias. Except for one respondent of Asian descent, all respondents were white. The experiences of these respondents residing mainly in Europe may differ from those elsewhere in the world (e.g., in the United States).28,29 Consistent with the epidemiology of LHON, over 70% of respondents in this study were male, and nearly 70% harbored the m.11778G>A LHON mutation. Additional studies would be required to determine whether any items of the VF-14 exhibit differential item functioning, particularly for female participants or those who carry other LHON mutations.
In summary, the VF-14 in its original format exhibits several limitations that undermine its psychometric validity as a PROM for assessing vision-related activity limitation in individuals with LHON. These limitations likely stem from differences in the clinical and demographic characteristics of individuals with LHON compared to those with cataracts, leading to problems with disordered response category thresholds, item misfit, and scale multidimensionality. Rasch-informed post-hoc revisions of the VF-14 can improve the psychometric validity of the PROM. However, these revisions do not overcome issues relating to PROM content. Future studies should evaluate the sensitivity of other established scales developed for populations with clinical and demographic characteristics similar to those of individuals with LHON, such as people with low vision, or focus on the development of a LHON-specific PROM that captures the disease characteristics that matter most to LHON patients.