Abstract
Purpose:
The photopic negative response (PhNR) may be useful as a tool to monitor longitudinal change in retinal ganglion cell (RGC) function. The goal was to assess PhNR test–retest reliability, and to estimate the amount of change between tests that is likely to be statistically significant for an individual test subject.
Methods:
Photopic electroretinograms (ERGs) were recorded from 49 visually normal subjects (mean age, 38.9 years; range, 21–72 years). Signals were acquired using Dawson-Trick-Litzkow (DTL) electrodes in response to a red stimulus at four flash energies (0.5, 1, 2.25, and 3 cd·s/m²) on a blue background (10 cd/m²). The PhNR amplitude was measured from the prestimulus baseline to the trough (BT), from the prestimulus baseline to a fixed time point (BF), and from the b-wave peak to the trough (PT). The ratio of the baseline-to-trough PhNR amplitude to the b-wave amplitude (BT/b-wave) was calculated. Reliability was assessed using the intraclass correlation coefficient (ICC(2,1)) and the coefficient of repeatability (CoR).
Results:
A flash energy of 1.00 cd·s/m² produced reliable, well-defined traces. At this stimulus, the a- and b-wave amplitudes were reproduced with moderate reliability (ICC, 0.62; CoR%, 90.0%; and ICC, 0.74; CoR%, 54.3%; respectively). For the PhNR, the order from most to least reliable measurement was PT (ICC, 0.64; CoR%, 59.1%), BT (ICC, 0.40; CoR%, 148.3%), and BF (ICC, 0.22; CoR%, 166.1%). The BT/b-wave ratio did not improve reliability (ICC, 0.37; CoR%, 181.5%).
Conclusion:
The b-wave peak-to-PhNR trough amplitude produced the most reliable measurement.
Translational Relevance:
A relatively large change in PhNR amplitude is required before clinical inferences can be made about changes in RGC function. Refinement of the technique of acquisition and/or processing of the PhNR is recommended to improve reliability.
A total of 49 visually normal subjects participated in this study (mean age, 38.9 years; range, 21–72 years). Electrophysiological tests, as detailed below, were performed on two occasions by the same operator (mean interval between test and retest, 7.9 days; range, 6–20 days).
The research was conducted after ethics approval by the Royal Victorian Eye and Ear Hospital Ethics Committee and under the tenets of the Declaration of Helsinki. Informed consent was obtained from all subjects after explanation of the nature and possible consequences of the study.
Pupils were dilated to at least 7 mm using 1% tropicamide (Mydriacyl). Eyes were preadapted to room lighting (0.92 cd/m²) for at least 1 minute. An Espion system (E2/ColorDome; Diagnosys LLC, Lowell, MA, USA) was used for stimulus generation and data acquisition. A brief preadaptation to the blue background, of approximately 1 minute, was performed before the first stimulus.
Brief red flashes (peak wavelength, 635 nm) at four flash energies (0.5, 1, 2.25, and 3 cd·s/m²) were delivered via a Ganzfeld sphere on a blue background (10 cd/m², peak wavelength 465 nm). Flashes were 4 ms in duration and presented at 1 Hz. The response was recorded using a Dawson-Trick-Litzkow (DTL) fiber electrode placed inside the lower lid conjunctival fornix of each eye.[16] The ground electrode was attached to the forehead and the reference electrode to the lateral canthus.
The waveforms were averaged over 10 sweeps at each stimulus level, and signals were bandpass filtered from 0.15 to 100 Hz. An automatic rejection system removed large artefacts caused by blinks and eye movements.
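Point-wise averaging with artefact rejection can be sketched as follows. This is a minimal illustration assuming a simple peak-to-peak rejection rule; the criterion used by the Espion's automatic rejection system is not described here, so the function name `average_sweeps` and the `reject_threshold_uv` value are hypothetical. Bandpass filtering is not shown.

```python
from statistics import fmean


def average_sweeps(sweeps, reject_threshold_uv=100.0, n_target=10):
    """Average accepted sweeps point by point (hypothetical sketch).

    A sweep is rejected when its peak-to-peak amplitude exceeds
    reject_threshold_uv, standing in for a blink or eye-movement artefact.
    This threshold rule is an assumption, not the Espion's documented method.
    Averaging stops once n_target sweeps have been accepted.
    Returns (averaged_trace, number_of_sweeps_used).
    """
    accepted = []
    for sweep in sweeps:
        if max(sweep) - min(sweep) > reject_threshold_uv:
            continue  # reject this sweep as an artefact
        accepted.append(sweep)
        if len(accepted) == n_target:
            break
    if not accepted:
        raise ValueError("all sweeps were rejected")
    # zip(*accepted) groups the samples recorded at the same time point
    averaged = [fmean(samples) for samples in zip(*accepted)]
    return averaged, len(accepted)
```

Each sweep is a list of voltage samples aligned to flash onset; averaging only accepted sweeps keeps a single large artefact from dominating a 10-sweep mean.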
Although both eyes were tested, only results from the right eye were included in the statistical analysis, to avoid statistical dependency between fellow eyes. Because ERG amplitudes are commonly positively skewed, the nonparametric Wilcoxon matched-pairs signed-rank test was used to evaluate intersession changes.
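The logic of the Wilcoxon matched-pairs signed-rank test can be illustrated with a short, dependency-free sketch. It assumes the large-sample normal approximation with no tie or continuity correction; statistical packages may apply such corrections, or exact p-values for small samples, so their results can differ slightly. The function name `wilcoxon_signed_rank` is hypothetical.

```python
import math


def wilcoxon_signed_rank(session1, session2):
    """Two-sided Wilcoxon matched-pairs signed-rank test (sketch).

    Zero differences are discarded, tied |d| values receive the average
    of the ranks they span, and the p-value uses the normal approximation
    without tie or continuity correction.
    Returns (W_plus, W_minus, z, p).
    """
    diffs = [b - a for a, b in zip(session1, session2) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("no non-zero differences")

    # Rank absolute differences, assigning average ranks to ties
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under H0 (no systematic intersession change)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return w_plus, w_minus, z, p
```

Working on ranks of the paired differences, rather than on the raw amplitudes, is what makes the test insensitive to the positive skew of ERG amplitudes.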
Relative reliability was analyzed using the intraclass correlation coefficient (ICC), which measures the proportion of the total variance that is attributable to variability between individuals. The two-way random-effects model (ICC(2,1)) was chosen to account for both systematic and random error and to allow the reliability data to be generalized beyond the confines of this study.[20] According to Fleiss,[21] ICC values >0.75 represent “excellent reliability,” values between 0.4 and 0.75 represent “fair to good reliability,” and values <0.4 represent “poor reliability.”
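A minimal sketch of ICC(2,1) (two-way random effects, absolute agreement, single measurement) computed from the Shrout and Fleiss mean squares is shown below, together with a mapping onto the Fleiss descriptors quoted above. The function names are hypothetical, and treating 0.4 and 0.75 as belonging to the "fair to good" band is an assumption about how the boundaries are handled. For this study, each row would hold one subject's two session values.

```python
from statistics import fmean


def icc_2_1(data):
    """ICC(2,1) from an n-subjects x k-sessions table (sketch).

    data: list of rows, one per subject, each holding k session values.
    Uses the two-way ANOVA mean squares of Shrout and Fleiss.
    """
    n, k = len(data), len(data[0])
    grand = fmean(v for row in data for v in row)
    row_means = [fmean(row) for row in data]
    col_means = [fmean(col) for col in zip(*data)]

    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    # Session (systematic) variance enters the denominator: absolute agreement
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def fleiss_category(icc):
    """Map an ICC onto the Fleiss descriptors (boundary handling assumed)."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:
        return "fair to good"
    return "poor"
```

As a sanity check, the 6-subject, 4-judge example in Shrout and Fleiss (1979) gives ICC(2,1) of approximately 0.29 with this formula. Including the between-session mean square is what lets ICC(2,1) penalize systematic shifts between test and retest, not only random error.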
Absolute reliability was assessed using the coefficient of repeatability (CoR), which defines an interval within which 95% of test–retest differences are expected to lie.[22,23] This was calculated as ±1.96 multiplied by the standard deviation of the test–retest differences, and its 95% CI was constructed as described previously.[18,19] The CoR was also expressed as a percentage of the mean test–retest value (CoR%).
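The CoR and CoR% calculations can be sketched in a few lines. This assumes that "mean test–retest value" means the grand mean of all test and retest measurements, and that the sample standard deviation (n − 1 denominator) is used; the function name is hypothetical, and the CI construction cited above is not reproduced.

```python
from statistics import fmean, stdev


def coefficient_of_repeatability(session1, session2):
    """Return (CoR, CoR%) for paired test-retest measurements (sketch).

    CoR = 1.96 x sample SD of the test-retest differences, so the 95%
    interval for a single difference is +/- CoR.
    CoR% = 100 x CoR / grand mean of all test and retest values
    (assumed reading of "mean test-retest value").
    """
    diffs = [b - a for a, b in zip(session1, session2)]
    cor = 1.96 * stdev(diffs)
    grand_mean = fmean([*session1, *session2])
    return cor, 100.0 * cor / abs(grand_mean)
```

Because CoR% divides by the mean amplitude, two measures with the same absolute CoR will differ in CoR% if one has a much smaller mean; baseline-referenced PhNR amplitudes, which are small, are affected in exactly this way.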
All statistical analyses were performed with SPSS (PASW Statistics for Windows, Version 18.0, released 2009; SPSS, Inc., Chicago, IL, USA).
The present study demonstrated that the reliability of the PhNR varies with the method of amplitude measurement. Although some studies have reported the coefficient of variation as a measure of within-subject variation,[6,14,25] few to date have reported PhNR test–retest reliability.
Viswanathan et al.[7] found that, on repeated recording in visually normal subjects, baseline-to-PhNR-trough measurements were within ±13% of the mean amplitude; however, the study population was small (n = 6). Mortlock et al.[18] reported a much larger variation of ±88.4% of the mean amplitude. We found the test–retest variation of the baseline-to-PhNR-trough amplitude to be higher still (within ±148.3%). Measuring the trough at a fixed time point, which may be useful where the PhNR trough is not well defined,[2,7] gave a similarly high CoR% of ±166.1%. The relatively large CoR% values can be accounted for by the small mean values of these measurements relative to their absolute CoR, which results in a larger percentage.[18] However, taken together with the ICC values, amplitudes measured with reference to the baseline (BT and BF) are the least reliable (ICC ≤0.40), and a relatively large change may be required in repeated testing before one can be confident that a difference is significant.
We found the most reliable PhNR measure to be peak-to-trough (ICC, 0.64), for which 95% of test–retest differences are expected to lie within ±59.1%. This is higher than the ±42% reported by Mortlock et al.[18] (n = 16). The process underlying the PhNR is of small amplitude and most likely commences before the b-wave is complete. Measuring the PhNR from the b-wave peak is analogous to measuring the b-wave from the trough of the a-wave. The reliability of the PT measure would be expected to depend in part on the reliability of the b-wave peak itself; there is, however, no simple way to evaluate the significance of this. It should be noted that this study addressed the reliability of PhNR measurement and did not investigate which method of measurement is most sensitive or specific for detecting longitudinal change in an individual test subject. Further studies are required to assess the sensitivity and specificity of PT compared with other methods in RGC dysfunction.
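The amplitude measures compared in this study (BT, BF, PT, and the BT/b-wave ratio) can be made concrete with a short sketch that extracts them from a single averaged trace. It is a hypothetical illustration, not the study's analysis code: the function name `phnr_measures`, the search windows, and the BF fixed time point (`fixed_time_ms`) are assumptions, since those settings are not given in this section, and the b-wave is assumed to be measured from the a-wave trough. Amplitudes are returned as positive magnitudes.

```python
from statistics import fmean


def phnr_measures(times_ms, trace_uv, a_window=(5, 30), b_window=(20, 60),
                  phnr_end_ms=120, fixed_time_ms=72):
    """Hypothetical sketch of BT, BF, PT and BT/b-wave from one averaged trace.

    times_ms: sample times relative to flash onset (negative = prestimulus).
    All window limits and fixed_time_ms are illustrative assumptions,
    not the settings used in the study.
    """
    def in_window(lo, hi):
        return [i for i, t in enumerate(times_ms) if lo <= t <= hi]

    baseline = fmean(v for t, v in zip(times_ms, trace_uv) if t < 0)
    a_idx = min(in_window(*a_window), key=lambda i: trace_uv[i])  # a-wave trough
    b_idx = max(in_window(*b_window), key=lambda i: trace_uv[i])  # b-wave peak
    # PhNR trough: most negative point after the b-wave peak
    phnr_idx = min(in_window(times_ms[b_idx], phnr_end_ms),
                   key=lambda i: trace_uv[i])
    # BF: sample nearest to the fixed time point
    fixed_idx = min(range(len(times_ms)),
                    key=lambda i: abs(times_ms[i] - fixed_time_ms))

    b_wave = trace_uv[b_idx] - trace_uv[a_idx]  # a-wave trough to b-wave peak
    bt = baseline - trace_uv[phnr_idx]          # baseline to PhNR trough
    bf = baseline - trace_uv[fixed_idx]         # baseline to fixed time point
    pt = trace_uv[b_idx] - trace_uv[phnr_idx]   # b-wave peak to PhNR trough
    return {"b_wave": b_wave, "BT": bt, "BF": bf, "PT": pt,
            "BT/b-wave": bt / b_wave}
```

Because PT shares the b-wave peak sample with the b-wave measure, any variability in locating or sizing that peak feeds directly into PT, which is the dependence on b-wave reliability discussed above.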
It has been suggested that the PhNR/b-wave amplitude ratio would show less variability and might prove a more useful measure than absolute PhNR amplitude.[7,15,19] Although Mortlock et al.[18] reported that the ratio improved reliability, they calculated the ratio of the b-wave to the peak-to-trough amplitude, which again consists mainly of the b-wave amplitude. In our study, the ratio of baseline PhNR to b-wave amplitude had poor reliability (ICC, 0.37; CoR%, 181.5%), which likely reflects the variability of the baseline PhNR measurements. This finding is consistent with an earlier study that found the reproducibility of the ratio was no better than that of the absolute amplitude.[14]
A limitation of our study is that only 10 sweeps were averaged for each recording, and we acknowledge that reliability may be improved by increasing the number of sweeps. Using more sweeps, however, would assume stationarity of the process and the absence of, for example, adaptation of the response; that question was not investigated in this study. Our results do, however, highlight the importance of establishing laboratory norms for test–retest measures in visually normal subjects and in those with RGC dysfunction before evaluating changes in repeated measures.
In summary, while the PhNR has clinical potential for the early detection and monitoring of RGC disease, refinements to the acquisition technique and amplitude processing are required to improve test–retest reliability and increase confidence in inferences about changes in RGC function.