An ideal metric for monitoring glaucoma would be accurate (directly representative of the true disease status), precise (taking one of many different possible values rather than just a small number of possibilities), and repeatable (low test–retest variability). Unfortunately, it is currently impossible to assess test accuracy, since there is no gold standard available for describing the true disease status or its rate of change in any given eye. An assay of the exact number of remaining retinal ganglion cells may one day provide such a gold standard, but even that may not be sufficient if an unknown number of those cells are present but dysfunctional. Precision is easier to assess. For example, frequency-doubling technology (FDT) perimetry using the Matrix perimeter (Carl Zeiss Meditec Inc.) has only 15 possible final threshold estimates available to describe the functional status at any given visual field location, whereas OCT reports RNFLT on a continuous scale.
Repeatability has most frequently been assessed using test–retest variability,
7,12,15,16 with as short a period of time as possible between the repeated testing in order to minimize the possibility of true change having occurred during that time period. In this study, repeatability is assessed using the residuals from the trend over time, which is essentially the same as assessing test–retest variability using tests that are up to several years apart, but accounting for possible change that may have taken place over that period. This enables a larger dataset to be used without requiring impractical amounts of short-term data collection, improving estimates of the amount of variability present. The estimates of noise derived here are similar to those from the literature using short duration test–retest studies.
12,13
Perhaps a larger advantage of using longitudinal data, rather than short duration test–retest data, is that it allows the amount of change over time to be used as a measure of signal. A test whose outcome displays very little change over time or has a narrow dynamic range is of limited use for a progressive neuropathy such as glaucoma, even if the variability is low. As an extreme example, consider multiplying all results from a device by zero; there would be zero variability, but there would also be no remaining signal. Detection of change, and perhaps as importantly the rate of change, is one of the two main priorities for a clinical test used to monitor a chronic disease such as glaucoma. The other priority is its ability to discriminate between healthy and diseased eyes. The cross-sectional technique proposed by Artes et al.
8 can be used to assess signal-to-noise ratios for the ability to discriminate healthy from diseased eyes, but the technique demonstrated in the current study may provide a better framework within which to assess signal-to-noise ratios for detection of progression.
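The idea of a longitudinal signal-to-noise ratio can be illustrated with a minimal sketch. It assumes, for illustration only, that the signal is the slope of an ordinary least-squares line fitted to one eye's series, that the noise is the standard deviation of the residuals about that line, and that the ratio is simply slope divided by noise; the exact definitions used in the study may differ. The function names and the example series are hypothetical and do not come from the study data.

```python
from math import sqrt, isnan
from statistics import mean


def linear_fit(times, values):
    """Ordinary least-squares line through (times, values); returns (slope, intercept)."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_bar - slope * t_bar


def longitudinal_snr(times, values):
    """Return (signal, noise, ratio) for one series.

    Assumed definitions (illustrative, not necessarily the study's):
      signal = fitted rate of change (slope, in units per year)
      noise  = SD of residuals about the fitted line (n - 2 degrees of freedom)
    """
    slope, intercept = linear_fit(times, values)
    residuals = [v - (intercept + slope * t) for t, v in zip(times, values)]
    noise = sqrt(sum(r * r for r in residuals) / (len(times) - 2))
    # With no variability at all the ratio is undefined, so report NaN.
    ratio = slope / noise if noise > 0 else float("nan")
    return slope, noise, ratio


# Hypothetical series: years since baseline and a made-up thickness measure.
years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
series = [90.0, 90.6, 88.1, 88.9, 86.8, 87.1]

signal, noise, snr = longitudinal_snr(years, series)

# Changing the units (multiplying by a positive constant) rescales signal and
# noise together, so the ratio is unchanged.
_, _, snr_rescaled = longitudinal_snr(years, [2.5 * v for v in series])

# The extreme example from the text: multiplying by zero removes the noise,
# but it removes the signal as well.
zero_signal, zero_noise, zero_ratio = longitudinal_snr(years, [0.0 * v for v in series])

print(signal, noise, snr)
print(snr_rescaled)
print(zero_signal, zero_noise, zero_ratio)
```

Because the ratio is unchanged by a change of units, it gives a scale-free quantity on which measurements reported in different units can be placed side by side.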
Choices made in this study may have had a minor effect on the results, even though such effects were minimized wherever possible. Firstly, RNFLT was assessed from OCT scans after manual refinement of the instrument's automated segmentations, to remove obvious errors and inconsistencies. Such refinements might not be routinely performed in the clinical setting. Secondly, linear fits over time were used for both methodologies, despite the fact that SAP sensitivities on a decibel scale have been reported to decline exponentially whereas RNFLT declines linearly,
11,17–19 and so the residuals for MD may be larger (giving a larger estimate of ‘noise’) than they would be with a different longitudinal model. The linear fit was retained principally because it provides a more intuitive measure of signal, since in an exponential longitudinal model the rate of change is not constant. A linear model appears to be adequate for examining changes in MD over time so long as the series length is not excessive.
20 Thirdly, it is known that variability increases with damage for SAP,
12,14,21 and so the resultant estimate of noise may be an overestimate compared with the true amount of noise that is present at the earliest stages of the disease. Fourthly, unreliable test results were excluded from the analysis. Eight and a half percent of SAP visual fields were excluded based on excessive false positives, false negatives, or fixation losses, whereas only a single OCT scan was excluded based on an unacceptable quality score. This, together with potential learning effects (which should not be present in this dataset due to the subjects' prior experience with SAP), could make the signal-to-noise ratio for SAP worse in practice than reported here. Finally, the cohort consisted primarily of subjects with relatively early disease, and it has been suggested that structural testing can detect damage sooner than functional testing. It is possible that the signal-to-noise ratio for SAP could change later in the disease process.
It is perhaps surprising that the signal-to-noise advantage of OCT RNFLT over SAP MD is not greater. OCT is an objective test, inasmuch as it does not rely on inherently variable psychophysical responses, which are probabilistic due to the nature of the psychometric function. It has also been suggested that RNFLT from OCT may provide a better metric for change than SAP MD in early stages of the disease,
22 which is the dominant severity in this cohort, as evidenced by the average MD of −0.72 dB. It can be concluded that the variability in OCT RNFLT is still considerable, and further advances are needed in both image quality and the segmentation of anatomical layers before its full potential is realized.
When comparing testing modalities that not only use different units but also measure completely different targets, it is imperative to have a common framework within which they may be compared. Signal-to-noise ratios may provide such a conceptual framework, and could usefully be reported by studies of methods for assessing the rate of change in glaucoma. In the current study, based on estimated rates of change, RNFLT as measured by OCT had a better longitudinal signal-to-noise ratio than MD from SAP.