Censoring low sensitivity values at various levels (<8–20 dB) reduced both within- and between-visit variability in SAP testing for RP patients, with a tendency toward a greater reduction in variability when a larger number of thresholds within this range were censored. However, a major consequence of censoring is that substantially fewer data points, eyes, and subjects were included as the censoring level increased. This was a greater concern for RP patients with vision 20/60 or worse, as these patients tend to have lower sensitivity values at many of the test points than patients with better vision (20/20–20/70). The data in Figure 1 and Table 2 suggest that censoring at levels higher than 8 to 13 dB can further reduce variability, but at the expense of fewer test locations contributing to the result. For the cohort 1 data up to 13 dB, the percent reduction in variability relative to the uncensored value exceeded the percent reduction in eyes, but that balance reversed beyond this level. For this particular sample, it would therefore not be advisable to censor beyond 13 dB, and <9 dB may be a more prudent recommended level given the variability of 9 dB demonstrated in our subgroup analysis. For the data from cohort 2, even censoring at <8 dB came at the expense of a greater percent drop in contributing eyes than the percent reduction in variability. For this particular sample, censoring at levels below <8 dB would not make sense, as any lower values fall within one 95% CR of 0 dB and are thus affected by a floor effect. Censoring was therefore not as beneficial in cohort 2 as it was in cohort 1, owing to the much greater loss of test locations and much smaller reduction in variability when censoring at <8 dB.
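The comparison described above, percent reduction in variability against percent reduction in contributing eyes, can be written out as a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the input numbers are invented for the example rather than taken from Figure 1 or Table 2.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Percent drop from baseline to value (positive means a reduction)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - value) / baseline


def censoring_pays_off(cr_uncensored: float, cr_censored: float,
                       eyes_uncensored: int, eyes_censored: int) -> bool:
    """True when variability falls by a larger percentage than the eye count."""
    variability_gain = percent_reduction(cr_uncensored, cr_censored)
    eye_loss = percent_reduction(eyes_uncensored, eyes_censored)
    return variability_gain > eye_loss


# Hypothetical values for illustration only (not study data).
# The 95% CR drops from 12 to 8 dB while 5 of 100 eyes are lost.
favorable = censoring_pays_off(12.0, 8.0, 100, 95)
# The 95% CR drops from 12 to 11 dB while 14 of 40 eyes are lost.
unfavorable = censoring_pays_off(12.0, 11.0, 40, 26)
print(favorable, unfavorable)
```

Under this criterion, a censoring level is retained only while the relative gain in repeatability outweighs the relative loss of eyes.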
Censoring low sensitivity values with high variability can simplify analysis and interpretation, as longitudinal changes can be assessed with global measures, such as the mean sensitivity of all included points above a threshold value, rather than by considering the sensitivity and variability at individual test locations, as is necessary when censoring is not applied. An alternative to censoring might be to use 95% CRs for different baseline sensitivity ranges, as in our subgroup analysis in Figure 2, to determine whether the change at a given test location exceeds the variability threshold and thus reflects significant improvement or worsening. We prefer to advocate for censoring <9 dB values or considering the variability of individual test locations rather than using the average of all (uncensored) test locations. The profile of the hill of vision (i.e., steep with very few locations with low sensitivity, or flat with many areas of low sensitivity) varies across RP patients, and the average of all test locations may not reflect how many locations are in the low sensitivity range that contributes to greater test variability.
For both RP cohorts included in the current analysis, the improvement in variability when censoring at higher values was accompanied by a loss of subjects and eyes; thus, it would not generally be advisable to censor at levels greater than 9 to 10 dB for most RP patients. When we censored values <10 dB, only one eye was excluded from cohort 1 and about a third of cohort 2 eyes were excluded, relative to the original uncensored dataset for all of the clinical trial participants. In contrast, when we censored values <18 to 19 dB, a much larger proportion of eyes was excluded: a fifth of the subjects in cohort 1 and half of the subjects in cohort 2 were no longer included, as all of their test results were within the censored range. If treatment effects or visual changes occur in the range of 10 to 20 dB, censoring at levels within this range would be problematic because improvements or losses in vision would be missed. We therefore recommend restricting censoring to <10 dB, which yields an ∼4-dB improvement in between-visit SAP variability (95% CR) with size III stimuli, compared to no censoring, in people with RP who have visual acuity better than 20/70. For those RP patients, a general rule of thumb for clinicians monitoring for longitudinal changes might be to disregard points with sensitivities that drop below 9 dB and to consider any change that exceeds the typical variability of ∼9 dB for the remaining points to be a real, meaningful change. We recommend more judicious use of censoring for RP patients with vision 20/60 or worse who are tested with the large size V stimulus; in that group, censoring may not be helpful even at <8 dB because it excludes substantial amounts of data for only a modest improvement in test variability.
Clinical trials may consider our recommendations for censoring but should have the flexibility to tailor them to their needs (e.g., for therapies with anticipated effects within a specific range of retinal sensitivities) or to trial participants who differ from our study cohorts. Given the relatively small sample size in cohort 2 and in the subgroup analysis for cohort 1, there is a risk that our findings may not generalize to other cohorts with RP if our subjects differ significantly from RP patients seen clinically or from those who would be enrolled in future clinical trials.
Future clinical trials for RP may need to define criteria for censoring a priori and assess the effects of censoring after baseline testing but prior to intervention to determine whether censoring excludes individuals with severe vision loss, in which case other participants may have to be enrolled or investigators may need to consider other approaches. When monitoring changes in areas with mild to moderate loss of sensitivity, the approach of censoring test points with low sensitivities is ideal; however, censoring would not be appropriate when attempting to detect changes at the edge of a patient’s peripheral field of vision, given the hill of vision typically noted in RP. In such cases, kinetic perimetry would be more valuable to reliably measure changes in viable peripheral retinal area.[14] If using SAP to evaluate changes in very low sensitivity values, it could be valuable to use sensitivity-dependent criteria when monitoring for natural progressive vision loss, or for improvements during clinical trials in which a new treatment may target damaged photoreceptors with sensitivities in the range of 0 to 10 dB. If the intention is to monitor individual SAP test locations with values between 0 and 9.5 dB at the initial baseline test, we recommend using a size V stimulus and, as a rule of thumb for gauging longitudinal improvements in sensitivity, treating subsequent changes ≥8 dB in RP patients as likely to be real and meaningful, as they exceed typical test–retest variability. However, only test locations with initial values of 9 dB can be reliably assessed longitudinally for true loss of sensitivity, as a subsequent loss of 9 dB or more could indicate either a change to the extremely low sensitivity values of 0 or 1 dB or a total loss of sensitivity (<0 dB). Test locations with initial values of 0 to 8 dB cannot be reliably assessed longitudinally for loss of sensitivity, as the typical test–retest variability exceeds the value itself.
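One way to make these sensitivity-dependent criteria concrete is a small classification routine for a single test location. The Python sketch below is only one possible reading of the rule of thumb: the 8-dB and 9-dB thresholds and the 0 to 9.5 dB baseline range come from the text, whereas the function name, the category labels, and the decision to treat any baseline of at least 9 dB as assessable for loss are assumptions made for illustration.

```python
IMPROVEMENT_THRESHOLD_DB = 8.0  # from the text: gains of 8 dB or more are likely real
LOSS_THRESHOLD_DB = 9.0         # from the text: a real loss needs a 9-dB drop
LOW_RANGE_MAX_DB = 9.5          # upper end of the low-sensitivity baseline range


def classify_change(baseline_db: float, followup_db: float) -> str:
    """Classify the change at one size V test location (hypothetical labels)."""
    if not 0.0 <= baseline_db <= LOW_RANGE_MAX_DB:
        raise ValueError("rule of thumb applies to baselines of 0 to 9.5 dB")
    change = followup_db - baseline_db
    if change >= IMPROVEMENT_THRESHOLD_DB:
        return "improvement likely real"
    # Assumption: a loss is assessable only when the baseline leaves room
    # for a 9-dB drop, i.e. the baseline is at least 9 dB.
    if baseline_db >= LOSS_THRESHOLD_DB:
        if change <= -LOSS_THRESHOLD_DB:
            return "loss likely real"
        return "within test-retest variability"
    if change < 0:
        return "loss not assessable (floor effect)"
    return "within test-retest variability"


# Hypothetical follow-up values for illustration only.
examples = [(2.0, 11.0), (9.0, 0.0), (5.0, 0.0), (4.0, 7.0)]
labels = [classify_change(b, f) for b, f in examples]
print(labels)
```

Locations with baselines of 0 to 8 dB can still be flagged for improvement under this reading, but any apparent loss there is reported as not assessable rather than as stable.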
A comparison of the between-visit variability for the two study cohorts revealed that the uncensored test variability was much lower (by ∼3.5 dB) for cohort 2 subjects, who had visual acuity 20/60 or worse and completed the testing with the size V stimulus, than for the subjects in cohort 1, who had better vision and completed the testing with the size III stimulus. For censoring between <8 and <17 dB, the between-visit variabilities of the two cohorts were more similar, differing by 0.3 to 1.5 dB; cohort 2, with worse vision and larger stimulus size, had slightly less variability than cohort 1 across all censoring levels. The difference in uncensored between-visit variability between the cohorts was somewhat unexpected because previous research in glaucoma patients found that the greatest factor for increased test–retest variability was increasing scotoma depth[15] or decreased visual field sensitivity, which explained over half of the total variability.[16]
For our subgroup analysis in cohort 1, test variability was greater for threshold values of 2 to 5.5 dB than for values of 6 to 9.5 dB. That factor, however, was not likely responsible for the difference in uncensored variability between cohorts because both cohorts had approximately the same proportion of test locations with values between 2 and 5.5 dB at the first test (i.e., 2.6% for cohort 1 and 2.0% for cohort 2), as well as about the same proportion of test locations with values between 0 and 9.5 dB at the first test (i.e., 7.2% for cohort 1 and 8.7% for cohort 2). Subjects in cohort 1 had much greater variability on average for the test locations with initial values from 0 to 9.5 dB than cohort 2 subjects did for test locations in the same range. Thus, we hypothesize that the difference in uncensored variability between our RP cohorts was most likely attributable to the difference in the test target size (i.e., retinal areas with low sensitivity, 0–9.5 dB, are more reliably evaluated with the larger size V stimulus), as previous research has demonstrated less test variability with the larger stimulus size V compared to size III in glaucoma patients.[15,17] The larger stimulus used for cohort 2 subjects has another potential advantage of shifting patients’ sensitivity to higher levels, which can allow for more test locations and/or eyes to be included during censoring, in addition to increasing the dynamic range to assess change and providing the potential for less variability at greater sensitivity values.[18]
In cohort 2, the differences between within-visit and between-visit variability were minimal, ranging from 0.02 to 0.62 dB across the censoring levels of <8 to <17 dB, with slightly greater variability within visits than between visits. A previous study in patients with severe vision loss due to ocular diseases such as RP, macular disease, optic nerve disease, or diabetic retinopathy found that between-visit variability for the Humphrey visual field test in scotopic test conditions was greater than or similar to within-session variability,[19] a finding similar to our current findings for censored and uncensored data.
Longitudinal studies involving improvements or loss of sensitivity on SAP in RP are needed to explore the recommendations proposed here for censoring or using sensitivity-specific criteria for monitoring change. In patients with mild to moderate glaucoma, censoring SAP values <20 dB had relatively little impact on the ability to detect progression rates.[20] Future studies using a longitudinal dataset should confirm whether censoring values of <8 dB has any substantial impact on the ability to detect the progression of RP. Finally, for censoring to be implemented in future clinical trials or in clinical practice, it will be imperative to develop software that automates the calculations for each censoring level and applies an optimization criterion to make the process more efficient.
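As a rough indication of what such software might compute, the following Python sketch sweeps a set of censoring levels over paired test data and reports the number of eyes retained and the between-visit 95% CR at each level. It rests on several assumptions that are not taken from the study's methods: a test location is dropped when its sensitivity falls below the censoring level at either visit, variability is summarized from the mean sensitivity of the retained locations, and the 95% CR is taken as 1.96 times the standard deviation of the between-visit differences. The function names and the data are hypothetical.

```python
from statistics import stdev


def censored_means(visit1, visit2, level_db):
    """Mean sensitivity per visit over locations at or above level_db at both visits.

    Assumption: a location is censored if either visit falls below the level.
    Returns None when every location of the eye is censored.
    """
    kept = [(a, b) for a, b in zip(visit1, visit2)
            if a >= level_db and b >= level_db]
    if not kept:
        return None
    n = len(kept)
    return sum(a for a, _ in kept) / n, sum(b for _, b in kept) / n


def coefficient_of_repeatability(differences):
    """Assumed definition: 1.96 x SD of the test-retest differences."""
    return 1.96 * stdev(differences)


def sweep_censoring_levels(eyes, levels_db):
    """Summarize retained eyes and 95% CR for each candidate censoring level."""
    rows = []
    for level in levels_db:
        diffs = []
        for visit1, visit2 in eyes:
            means = censored_means(visit1, visit2, level)
            if means is not None:
                diffs.append(means[1] - means[0])
        # An SD needs at least two eyes; otherwise the CR is left undefined.
        cr = coefficient_of_repeatability(diffs) if len(diffs) >= 2 else None
        rows.append({"level_db": level, "eyes_retained": len(diffs), "cr_db": cr})
    return rows


# Hypothetical pointwise sensitivities in dB: (visit 1, visit 2) for three eyes.
eyes = [
    ([20, 5, 15], [22, 9, 14]),
    ([3, 4, 25], [6, 2, 24]),
    ([2, 6, 7], [4, 5, 3]),
]
results = sweep_censoring_levels(eyes, [0, 8])  # level 0 means no censoring
for row in results:
    print(row)
```

An optimization criterion, such as the comparison of percent reduction in variability with percent reduction in eyes discussed above for cohort 1, could then be applied to the resulting rows to select a censoring level.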