The main goal of this study was to create shortened versions of the 150-item ULV-VFQ for clinical use that provide, across a wide range of severely reduced visual ability, the most accurate PMs for a given item count, as well as a good representation of items governed by different visual aspects and pertaining to different functional domains. An additional constraint was the retention of all 17 items corresponding to a concurrently developed set of performance measures. To accomplish this we retained evenly spaced items from the 150-item questionnaire, while also preserving aspects and domains as much as possible. From the data presented above it appears that both the 50- and 23-item versions of the instrument meet the desired criteria, although the shorter instrument will yield higher standard errors in the PM estimates, that is, lower precision. The 17-item instrument, containing the items most suited to creating a set of performance measures that can be administered in any setting, spans a shorter range, has less regular spacing, and shows large SEs, especially toward the ends of the range; for these reasons it is less well suited as an outcome measure for clinical studies.
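The trade-off between item count and precision can be made explicit with the standard Rasch expression for the standard error of a person measure, restated here only as a reminder (with \(L\) the number of administered items and \(I_i(\theta)\) the information contributed by item \(i\) at ability \(\theta\)):
\[
\mathrm{SE}(\hat{\theta}_n) = \frac{1}{\sqrt{\sum_{i=1}^{L} I_i(\hat{\theta}_n)}},
\]
so that, for items of comparable information, roughly halving the item count inflates the SE by a factor of about \(\sqrt{2}\).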
From the fit statistics in Figure 3, it is clear that only 5 items were underfitted in the 150-item ULV-VFQ; only 2 of these remained in the 50-item version, and none in the smaller versions. Thus, at most 4% of the items are underfitted in the test population of 80 individuals with ULV. Administering the full questionnaire to additional ULV individuals may slightly increase the number of underfitted items, as can be seen from the change in person fit statistics with item set size: Smaller item sets appear to yield better results, but this is primarily due to the reduced set size and, thus, a lesser degree of oversampling in the middle of the ability/difficulty range. This was confirmed by removing the 5 misfitting items from the 150-item set, which did not appreciably change the person fit. Thus, the PM misfits in the 150-item questionnaire are caused by item redundancy in the middle of the range, rather than by outlying or misfitting items.
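For reference, the fit statistics discussed here are the conventional residual-based mean squares; assuming the usual definitions (with \(x_{ni}\) the observed rating of person \(n\) on item \(i\), \(E_{ni}\) its model expectation, and \(W_{ni}\) its model variance):
\[
z_{ni} = \frac{x_{ni} - E_{ni}}{\sqrt{W_{ni}}}, \qquad
\mathrm{Outfit}_i = \frac{1}{N}\sum_{n} z_{ni}^2, \qquad
\mathrm{Infit}_i = \frac{\sum_{n} W_{ni}\, z_{ni}^2}{\sum_{n} W_{ni}},
\]
with values well above 1 indicating underfit (unmodeled noise) and values well below 1 indicating overfit (redundancy); the person fit statistics are defined analogously, by summing over items rather than over respondents.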
One may wonder whether our results would have been different if we had run the analyses with a different data set from the one used to calibrate the 150-item ULV-VFQ, or if the items had been administered in a different order; a concern about item dependency was raised by one of the reviewers. We have no reason to suspect that this would be the case, as the 80 participants spanned a range of abilities in excess of that covered by the 150 items, and their data provided a consistent fit to the Rasch model. The concept of item dependency describes a situation in which the ratings are determined not only by the latent trait (in our case, functional reserve or visual ability) but also by extraneous factors, such as the order in which items are administered, or respondent fatigue or lack of interest. This is most often encountered as local item dependency, which expresses itself in unexpected relative IM shifts for items that should have similar IMs. In administering the ULV-VFQ we randomized the items and administered them in the same order to all respondents; thus, unless all respondents tired of the questions at the same rate, there is no reason to assume that item dependency played a role, and any such dependency would not be local. We therefore feel confident that only minor adjustments of the item measures will be necessary as additional ULV individuals contribute data in the future.
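That administration order and fatigue have no place in the measurement model itself is easiest to see from the model equation; assuming the Andrich rating-scale form of the polytomous Rasch model (with \(\theta_n\) the person measure, \(\delta_i\) the item measure, \(\tau_j\) the rating category thresholds, and \(\tau_0 \equiv 0\)):
\[
P(X_{ni} = k) = \frac{\exp\left[\sum_{j=0}^{k} (\theta_n - \delta_i - \tau_j)\right]}{\sum_{m=0}^{M} \exp\left[\sum_{j=0}^{m} (\theta_n - \delta_i - \tau_j)\right]}, \qquad k = 0, \dots, M.
\]
Any systematic dependence of the ratings on item position would therefore have to manifest itself as misfit relative to this model, rather than being absorbed by it.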
Respondents in our study were given an “opt-out” choice in addition to the 4 difficulty levels: If they felt that an item did not apply to them they could say so, and this answer was treated as missing data in the Rasch analysis. In the 150-item data set, the median number of items with this response was 2 and the mean 6.26, that is, 4.2% of the items. With only 6 missing items on average, and even with the maximum of 47 (31%) for one respondent in our population, the remaining items provided enough redundancy to obtain a precise PM. For the 50-, 23-, and 17-item versions, a high number of skipped items is obviously a concern. In our data sets the median numbers of skipped items were all 0, and the mean numbers were 1.3 (2.6%), 0.4 (1.7%), and 0.35 (2.1%), respectively; for the respondent with the highest opt-out rate, the numbers were 12 (24%), 6 (25%), and 5 (29%). Thus, if anything, our assessment of the reduced versions of the questionnaire was less affected by “opt-out” answers than the original 150-item analysis, and the numbers of missing items were low enough not to have an appreciable effect on the resulting PMs for all but a few respondents. Hence, even with the “opt-out” choice available to respondents in a clinical setting, we expect few of them to give that answer to more than a few questions. If a high opt-out rate becomes a concern, the use of the adaptive version of the ULV-VFQ (Dagnelie G, et al. IOVS. 2015;56: ARVO E-Abstract 497) should be considered.
In constructing the 50- and 23-item versions of the ULV-VFQ we were concerned that not all visual aspects and functional domains would be represented, in particular since we knew from our previous work6 that certain visual aspects are differentially distributed along the visual demand axis. However, the finding that the data are essentially unidimensional greatly reduces that concern. Not only does it demonstrate that no meaningful subscales can be distinguished in the ULV-VFQ; it also shows that, for the purpose of ability assessment, the choice among items governed by different visual aspects or pertaining to different functional domains is less important than the choice of items that evenly span the range of abilities to be assessed.
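The unidimensionality finding rests on the residuals that remain once the Rasch dimension has been removed; assuming the usual PCA-of-residuals approach, the observed ratings are decomposed as
\[
x_{ni} = E_{ni} + \sqrt{W_{ni}}\, z_{ni},
\]
and a PCA is performed on the matrix of standardized residuals \(z_{ni}\); a meaningful subscale would then surface as a first contrast accounting for appreciably more variance than expected by chance, and the absence of such a contrast is what is meant by the data being essentially unidimensional.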
One may wonder whether, rather than eliminating items on the basis of even spacing, or of visual aspects and domains, it would not be better to retain only the most informative items. As shown in Figure 2, the most informative items are those in the center of the range, but that is also where most of the items are. Thus, once the sparseness of the information is taken into account, retaining items toward the ends of the scale is important, particularly since the PMs of outlying (very able or disabled) individuals can only be estimated by virtue of outlying items.
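This reflects the usual definition of item information in the Rasch family as the model variance of the item score, which peaks when the respondent's ability is near the item measure; in general polytomous form:
\[
I_i(\theta) = \mathrm{Var}(X_{ni} \mid \theta) = \sum_{k=0}^{M} k^2\, P(X_{ni} = k) - \left[\sum_{k=0}^{M} k\, P(X_{ni} = k)\right]^2 .
\]
Items far removed from a respondent's ability therefore contribute little information, which is why outlying items are indispensable for estimating outlying PMs even though they are uninformative for most of the sample.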
We did not present results of a differential person functioning (DPF) analysis here. The reason for this is simple: Whether we grouped our data according to item sets, functional domains, or visual aspects, the DPF values were very small in all cases. This is in line with the high correlations presented above, and with the lack of an appreciable second dimension in the PCA.
One remaining question may be whether there is any need for a calibrated 150-item instrument if the shortened versions are so similar in properties. In our opinion, the merits of such an instrument are twofold: (1) to continue building a large set of activities in the ULV range, as a calibration standard for future instruments, and (2) to provide an item bank for other ULV questionnaires, whether these are adaptive and select the items most appropriate for each individual respondent, or have fixed item sets that concentrate on specific types of activities or on certain visual aspects or domains.
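To illustrate how such a calibrated item bank could serve an adaptive questionnaire, the sketch below selects each next item by maximum information at the current ability estimate. This is a minimal, hypothetical example using a dichotomous Rasch simplification with made-up item labels and measures; it is not the adaptive ULV-VFQ implementation cited above.

```python
# Hypothetical sketch: maximum-information item selection from a calibrated bank.
# Dichotomous Rasch simplification; item labels and measures are illustrative only.
import math

def rasch_p(theta, delta):
    """Probability of the higher response category under a dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def item_information(theta, delta):
    """Fisher information of a dichotomous Rasch item at ability theta: p(1 - p)."""
    p = rasch_p(theta, delta)
    return p * (1.0 - p)

def next_item(theta_estimate, item_bank, administered):
    """Return the unadministered item whose measure maximizes information at theta_estimate."""
    candidates = [label for label in item_bank if label not in administered]
    return max(candidates, key=lambda label: item_information(theta_estimate, item_bank[label]))

# Hypothetical calibrated bank: item label -> item measure (logits).
bank = {"item_A": -2.1, "item_B": -0.5, "item_C": 0.0, "item_D": 1.3, "item_E": 2.4}

# With a provisional ability estimate of 0.2 logits and item_C already administered,
# the closest remaining item (item_B at -0.5 logits) carries the most information.
print(next_item(0.2, bank, {"item_C"}))
```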
In conclusion, we have derived from the 150-item ULV-VFQ two shorter ULV questionnaires for clinical use, the ULV-VFQ-50 and the ULV-VFQ-23, with anchored items, and have studied their psychometric properties; a third version, with 17 items, spans a shorter range and is therefore less appropriate in a population with a wide range of ULV. Which of the two remaining instruments is preferable in a given setting will depend on the willingness of the clinical investigators to spend more time administering the instrument in exchange for greater precision of the PM estimates.