**Purpose**:
Test–retest variability in perimetry consists of short-term and long-term components, both of which impede assessment of progression. By minimizing and quantifying the algorithm-dependent short-term variability, we can quantify the algorithm-independent long-term variability that reflects true fluctuations in sensitivity between visits. We do this at locations with sensitivity both < 28 dB (when the stimulus is smaller than Ricco's area and complete spatial summation can be assumed) and > 28 dB (when partial summation occurs).

**Methods**:
Frequency-of-seeing curves were measured at four locations of 35 participants with glaucoma. The standard deviation of cumulative Gaussian fits to those curves was modeled for a given sensitivity and used to simulate the expected short-term variability of a 30-presentation algorithm. A separate group of 137 participants was tested twice with that algorithm, 6 months apart. Long-term variance at different sensitivities was calculated as the LOESS fit of observed test–retest variance minus the LOESS fit of simulated short-term variance.

**Results**:
Below 28 dB, short-term variability increased approximately linearly with increasing loss. Long-term variability also increased with damage below this point, attaining a maximum standard deviation of 2.4 dB at sensitivity 21 dB, before decreasing due to the floor effect of the algorithm. Above 30 dB, the observed test–retest variance was slightly smaller than the simulated short-term variance.

**Conclusions**:
Long-term and short-term variability both increase with damage for perimetric stimuli smaller than Ricco's area. Above 28 dB, long-term variability constitutes a negligible proportion of test–retest variability.

**Translational Relevance**:
Fluctuations in true sensitivity increase in glaucoma, even after accounting for increased short-term variability. This long-term variability cannot be reduced by altering testing algorithms alone.

^{1}Many eyes progress slowly, whereas a few progress rapidly enough that the patient is at risk of visual impairment or blindness within their expected lifespan.^{2} However, the crucial task of measuring this rate is hampered by the substantial test–retest variability of standard automated perimetry.^{3,4} Thus, accurate rate measurements require either very frequent testing^{5} (which is inconvenient for patients and resisted by payers) or long follow-up durations^{6} (during which more disease progression toward blindness may have occurred). In order to reduce this variability and hence aid assessment of progression, it is essential to understand the sources of the variability. This understanding can guide efforts to optimize testing and also uncover upper bounds on the theoretically achievable repeatability for a perfect observer.

^{7}There are two main types of variability that affect visual field measurements: short term and long term.^{8,9} Short-term variability is generally understood to represent the test–retest variability within a single session, which perimetric techniques and algorithms can seek to minimize. We hypothesize that there is also substantial long-term variability, which would remain even with a perfectly repeatable test, and that its magnitude may be affected by disease severity.^{10,11} The detection threshold in clinical perimetry is defined as the contrast at which the subject responds to 50% of stimulus presentations, and contrast sensitivity is the reciprocal of this contrast.^{12} However, stimulus detection is dependent on the number of discrete neural spikes produced by retinal ganglion cells within a short window of time. The exact timing of spikes, and so the number within that window, varies even when identical visual stimuli are presented.

^{13–17}Hence, the frequency-of-seeing curve is not a step function but a gradual sigmoidal increase in response probability with contrast, which can be fit by, for example, a cumulative Gaussian distribution. We can then derive not only an accurate measurement of contrast sensitivity but also the standard deviation (SD) of this fitted cumulative Gaussian distribution, a measure of response variability.^{18} This allows us to predict the short-term variability for any chosen testing algorithm, for a given “true” underlying sensitivity, by repeated simulated runs of the algorithm. Notably, this SD (and hence the short-term variability) increases markedly at locations with lower sensitivity.
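The cumulative Gaussian fit described above can be sketched in code. The following is an illustrative Python version (the paper's analyses were performed in R), using a simple grid-search maximum-likelihood fit and arbitrary example values for the false-positive rate and asymptotic maximum; it is a sketch of the idea, not the paper's implementation.

```python
import math
import random

def phi(z):
    """Standard cumulative Gaussian: Phi(-inf)=0, Phi(0)=0.5, Phi(inf)=1."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_seen(contrast_db, mid, sd, fp, maxr):
    """Frequency-of-seeing curve: FP + (Max - FP) * Phi((Mid - Contrast)/SD).
    Contrast is expressed in dB attenuation, so response probability falls
    as the stimulus gets dimmer (higher dB)."""
    return fp + (maxr - fp) * phi((mid - contrast_db) / sd)

def fit_fos(contrasts, n_seen, n_total, fp, maxr):
    """Coarse grid-search maximum-likelihood estimate of (Mid, SD),
    a stand-in for a constrained MLE."""
    best, best_ll = None, -float("inf")
    for mid in [m / 10.0 for m in range(-100, 401)]:   # Mid constrained >= -10 dB
        for sd in [s / 10.0 for s in range(1, 120)]:   # SD constrained > 0 dB
            ll = 0.0
            for c, k, n in zip(contrasts, n_seen, n_total):
                p = min(max(p_seen(c, mid, sd, fp, maxr), 1e-9), 1 - 1e-9)
                ll += k * math.log(p) + (n - k) * math.log(1 - p)
            if ll > best_ll:
                best_ll, best = ll, (mid, sd)
    return best

# Synthetic data: 7 contrasts at 3-dB steps, 35 presentations each,
# generated from a known curve (Mid = 20 dB, SD = 3 dB).
random.seed(1)
contrasts = [11, 14, 17, 20, 23, 26, 29]
n_total = [35] * 7
n_seen = [sum(random.random() < p_seen(c, 20.0, 3.0, 0.03, 0.95)
              for _ in range(35)) for c in contrasts]
mid_hat, sd_hat = fit_fos(contrasts, n_seen, n_total, 0.03, 0.95)
```

With 35 presentations per contrast level, the fit recovers the generating parameters closely; the recovered SD is the quantity used throughout as the measure of response variability.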

^{9,10,19}Some previous studies have defined long-term variability simply as the test–retest variability with a gap of weeks or months between the tests.^{8} By our definitions, the total test–retest variance from patient data equals the sum of the short-term and long-term variances, which are independent of one another. There are various sources of long-term variability, such as time of year,^{20} time of day,^{21} technician experience,^{21} and level of fatigue at the time of testing.^{22,23} It has been shown that variability between fields taken annually is greater than variability between fields taken weekly,^{24} suggesting the presence of longer term fluctuations in true function rather than just the effect of test reliability on a given day. That study also suggested an increase in long-term variability with glaucomatous damage, raising the possibility of underlying variations in true disease status (not just testing variability). At locations with normal sensitivity, the size III stimulus used in standard perimetry is larger than Ricco's critical area of complete spatial summation.^{25} This means that partial spatial summation occurs; if stimulus area is doubled, the detection threshold (the reciprocal of contrast sensitivity) will decrease, but by less than half. In glaucoma, as sensitivity decreases, Ricco's area enlarges.^{26} When sensitivity is below around 28 dB, the size III stimulus has been found to be smaller than Ricco's area at most locations, and complete spatial summation occurs. As long as the stimulus area stays within Ricco's area, doubling the stimulus area will halve the detection threshold. It has been suggested that using stimuli smaller than Ricco's area will increase the signal-to-noise ratio of perimetry.^{27} It has also been suggested that, when stimuli are larger than Ricco's area, variability increases much less than when stimuli are smaller than Ricco's area.^{28} Therefore, it becomes important to quantify and to better understand the causes of variability both above and below this level of damage, with the realization that the relation between sensitivity and variability may not be homogeneous.

In Experiment 1, using frequency-of-seeing data from our previous studies,^{29,30} we determined the average standard deviation of a frequency-of-seeing curve for a given contrast sensitivity in patients with glaucoma. This builds on the work of Henson et al.^{10} to include more severely damaged locations; extrapolating their model to <10 dB, beyond their measurements, produces unrealistically high predictions of variability.^{31} We also extended the results to include testing with a larger size V (1.72° diameter) stimulus,^{30} which has been reported to reduce variability.^{32} In Experiment 2, we determined the test–retest variability of a custom-written, high-accuracy testing algorithm in a separate group of patients with suspected or confirmed glaucoma. This algorithm minimizes short-term variability by using 30 stimulus presentations per location, compared with around five per location in the Swedish Interactive Thresholding Algorithm (SITA) Standard algorithm,^{33} and by using size V stimuli.^{32} In Experiment 3, the short-term variability of our high-accuracy algorithm was predicted from simulations, using results from Experiment 1. By both minimizing the short-term variability and predicting the remaining short-term variability, we aimed to accurately characterize the long-term variability at different severities of glaucomatous damage. This can lead to improved understanding of the causes of that variability, in addition to informing the development of improved diagnostic testing.

Full details of the testing have been published previously.^{29,30} In brief, 35 participants with moderate to severe primary open-angle glaucoma, as determined by their clinician, were recruited from the Devers Eye Institute glaucoma clinic. For eligibility, participants were required to have two or more non-adjacent visual field locations with sensitivities between 6 and 18 dB on both of their two most recent clinic visits (Humphrey Field Analyzer with 24-2 test pattern, size III stimulus, and SITA Standard algorithm; Carl Zeiss Meditec, Dublin, CA). Four test locations were chosen for testing, including at least two with significantly reduced sensitivity that remained ≥ 6 dB (i.e., not perimetrically blind), with the four locations spaced across all four quadrants of the visual field to promote stable fixation during testing. Frequency-of-seeing curves were measured at each location for both size III and size V stimuli using the method of constant stimuli^{34} on an Octopus perimeter (Haag-Streit, Köniz, Switzerland) via the Open Perimetry Initiative interface.^{35} For the size III stimulus, seven contrasts were selected for testing at 3-dB intervals centered at the perimetric sensitivity (the mean at that location over the last two clinical visual field tests). For the size V stimulus, the contrasts tested were 4 dB higher, because increasing the stimulus area is expected to increase sensitivity.^{36} At the two most damaged locations of the four selected for a given eye, the highest contrast stimulus to be tested was always set to 3.7 dB, the greatest contrast presentable by the Octopus perimeter. For each stimulus size, 35 presentations were made per contrast level per location, split into five runs to reduce fatigue, with runs alternating between size III and size V stimuli. All protocols were approved and monitored by the Legacy Health Institutional Review Board and adhered to the Health Insurance Portability and Accountability Act of 1996 and the tenets of the Declaration of Helsinki. All participants provided written informed consent after all of the risks and benefits of participation had been explained to them.

The frequency-of-seeing curve at each location was fit by the equation: Response probability = *FP* + (*Max* – *FP*) × Φ[(*Contrast* – *Mid*)/*SD*]. Here, *FP* represents the false-positive rate, as measured from 50 blank presentations interspersed within the runs; Φ represents a cumulative Gaussian distribution function, such that Φ(–∞) = 0, Φ(0) = 0.5, and Φ(∞) = 1; and *Contrast* represents the stimulus contrast for that presentation. The remaining three parameters are fit by constrained maximum likelihood estimation. *Mid* represents the midpoint of the curve and is constrained to be ≥ –10 dB (to ensure algorithmic convergence); *SD* represents the standard deviation of the curve, constrained to be ≥ 0 dB; and *Max* represents the maximum response rate to an arbitrarily high contrast stimulus. Conventionally, this would be assumed to equal 100% minus the false-negative rate, but we have previously shown that this asymptotic maximum can be well below 100% at damaged locations.^{29,30} From this equation, the perimetric contrast sensitivity was calculated using the conventional definition in clinical perimetry—namely, the contrast giving 50% response probability (this would exactly equal *Mid* if and only if *Max* = 100% – *FP*). All analyses were performed using the statistical programming language R 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria).
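Given fitted parameters, the 50% point can be computed in closed form by inverting the equation above. A small Python sketch (illustrative parameter values; the paper's analyses used R):

```python
from statistics import NormalDist

def sensitivity_from_fit(mid, sd, fp, maxr):
    """Contrast (dB) at which FP + (Max - FP) * Phi((Mid - c)/SD) = 0.5.
    Solving for c gives c = Mid - SD * Phi^{-1}((0.5 - FP)/(Max - FP))."""
    q = (0.5 - fp) / (maxr - fp)
    return mid - sd * NormalDist().inv_cdf(q)

# When Max = 100% - FP, the 50% point equals Mid exactly...
s_full = sensitivity_from_fit(20.0, 3.0, fp=0.03, maxr=0.97)
# ...but a reduced asymptotic maximum shifts the 50% point toward a
# higher-contrast (lower dB) stimulus, i.e., below Mid.
s_reduced = sensitivity_from_fit(20.0, 3.0, fp=0.03, maxr=0.80)
```

This makes concrete why the conventional sensitivity equals *Mid* if and only if *Max* = 100% − *FP*: only then does the required value of Φ equal exactly 0.5.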

Henson et al.^{10} reported that variability (defined, as here, as the standard deviation of a cumulative Gaussian fit to frequency-of-seeing data, but assuming that the upper and lower asymptotes of the fit were 100% and 0%, respectively) for a size III stimulus increased exponentially with glaucomatous damage. They found a best fit model of the form log_{e}(SD) = −0.081 × sensitivity + 3.27, using locations with sensitivities between approximately 10 and 37 dB. Since that paper was published,^{10} it has become apparent that characteristics of perimetric sensitivity estimates may alter when sensitivity declines to approximately 28 dB, because Ricco's area expands in glaucoma.^{26} At ∼28 dB, it can become larger than the size III stimulus,^{37} so that the response characteristics are influenced by complete spatial summation rather than (until that point) incomplete summation.^{27,38,39} It is not yet clear whether this affects variability. Thus, we repeated their exponential model fitting of the form log_{e}(SD) = *A* × sensitivity + *B*, excluding locations for which the sensitivity was above 28 dB. We also excluded locations with sensitivity estimates below 3.7 dB (i.e., based on the curve being extrapolated beyond the highest contrast presented). For comparison, we also fit a linear model of the form SD = *A*_{Lin} × sensitivity + *B*_{Lin}. In each case, the regression lines were determined using Deming regression to account for measurement errors in both sensitivity and SD,^{40,41} using the ratio between the squared measurement errors derived from the simulations described in the previous paragraph. In both cases, variability was compared between size III and size V, for matched sensitivity, within the same range of 3.7 to 28 dB, using a generalized estimating equation (GEE) model to account for intra-eye correlations.^{42}
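Deming regression has a closed-form solution once the ratio of measurement-error variances is fixed. A minimal Python sketch (the paper's analyses used R; `delta` here is the ratio of the squared measurement errors of the two variables):

```python
import math

def deming(x, y, delta):
    """Deming regression of y on x, where delta = (error variance of y) /
    (error variance of x). Unlike ordinary least squares, this accounts
    for measurement error in both variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return slope, my - slope * mx

# A noise-free line y = 2x + 1 is recovered exactly, for any delta.
slope, intercept = deming([1, 2, 3, 4], [3, 5, 7, 9], delta=1.0)
```

With noisy data the Deming slope is steeper than the ordinary least-squares slope, which is attenuated by error in the x variable; that is why it is preferred when sensitivity and SD are both measured with error.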

Testing used our custom-written algorithm,^{43,44} which tests just four visual field locations, with 30 presentations per location. The advantages of this over using clinical perimetry data when investigating long-term variability are that the short-term variability is as low as can reasonably be achieved within a short test duration (around 5 minutes per eye, similar to clinical perimetry, to avoid excessive fatigue effects) and is predictable using the results from Experiment 1. A size V stimulus was used for this experiment to further decrease the short-term variability.^{32} A separate group of 137 participants was recruited.^{45,46} For inclusion, they were required to have a diagnosis of glaucoma or suspected glaucoma in at least one eye, as determined by their clinician. Eyes with non-glaucomatous visual field loss were excluded. Each participant was tested twice, with a six-month interval between tests (or as close as their visit could be scheduled). One eye was tested per participant. In order to obtain a greater spread of sensitivity estimates, the more damaged eye was chosen for testing, except if two or more of the four chosen locations had sensitivity < 0 dB on the most recent visual field test, in which case the better eye was tested. Eyes that underwent any ocular surgery during that interval were excluded.

^{39}One test location was chosen in each quadrant, to increase spatial uncertainty.^{47} Testing was performed on an Octopus perimeter as in Experiment 1, with a background intensity of 10 cd/m^{2} and a stimulus duration of 200 ms, to match clinical perimetry as closely as possible. At each location, the Bayesian ZEST algorithm started by assuming a flat prior for sensitivity within the range of 10 to 45 dB. The lower end of this range was chosen because we have previously shown that sensitivities below 15 to 19 dB are unreliable, with variability obscuring any actual signal.^{29} The prior distribution for sensitivities extended beyond the range of plausible true sensitivities in both directions to aid algorithmic convergence. After each presentation, the posterior PDF was calculated by multiplying the prior by either a cumulative Gaussian (if the participant responded to the stimulus) or its complement (if not), with SD according to the equation of Henson et al.^{10} The next stimulus contrast to be presented was chosen as the mean of the current prior distribution.^{43} For each location, the sensitivity estimates after 10, 20, and 30 presentations were recorded.
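The ZEST procedure described above can be sketched as a simulation. This illustrative Python version (the study's implementation was not in Python) uses assumed false-positive and asymptote values, and models the simulated observer's true frequency-of-seeing curve as a cumulative Gaussian:

```python
import math
import random

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def henson_sd(s):
    """Henson et al. model: log_e(SD) = -0.081 * sensitivity + 3.27."""
    return math.exp(-0.081 * s + 3.27)

def zest_run(true_sens, true_sd, n_pres=30, fp=0.03, maxr=0.97, seed=None):
    rng = random.Random(seed)
    grid = [10.0 + 0.5 * i for i in range(71)]      # flat prior over 10-45 dB
    pdf = [1.0] * len(grid)
    for _ in range(n_pres):
        total = sum(pdf)
        stim = sum(g * p for g, p in zip(grid, pdf)) / total   # prior mean
        # Simulated observer's response from its true frequency-of-seeing curve:
        p_resp = fp + (maxr - fp) * phi((true_sens - stim) / true_sd)
        seen = rng.random() < p_resp
        # Bayesian update: cumulative Gaussian likelihood if seen, else complement.
        for i, g in enumerate(grid):
            like = phi((g - stim) / henson_sd(g))   # P(seen | sensitivity g)
            pdf[i] *= like if seen else (1.0 - like)
    total = sum(pdf)
    return sum(g * p for g, p in zip(grid, pdf)) / total       # posterior mean

estimate = zest_run(true_sens=25.0, true_sd=2.0, seed=1)
```

Repeating such runs for a fixed true sensitivity, as in Experiment 3, yields the distribution of estimates and hence the predicted short-term variability of the algorithm.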

For each location, the squared error of the two sensitivity estimates, [(*Sens*_{1} – *Sens*_{2})/2]^{2}, was plotted against the estimated sensitivity, (*Sens*_{1} + *Sens*_{2})/2. Locally estimated scatterplot smoothing (LOESS) curve fitting^{48} was used to predict the expected squared error for a given sensitivity, which equals the test–retest variance that would be predicted for a new participant with that sensitivity; the predicted SD (the square root of this variance) was added to the plot. Note that these LOESS fits were performed using variances, because variances of independent components of variability add linearly; however, results are presented as SDs (the square roots of the variances) for easier visualization and interpretation.

For each true sensitivity (*Sens*), a predicted frequency-of-seeing curve was generated using one of three models to derive its SD:

- Exponential model—log_{e}(SD) = *A* × sensitivity + *B*, using the best fit exponential model for sensitivities between 3.7 and 28 dB for a size V stimulus from Experiment 1.
- Henson model—log_{e}(SD) = –0.081 × sensitivity + 3.27, using the results of Henson et al.^{10}
- Linear model—SD = *A*_{Lin} × sensitivity + *B*_{Lin}, using the best fit linear model for sensitivities between 3.7 and 28 dB from Experiment 1. This model additionally assumed that the SD reached a floor when sensitivity > 28 dB (i.e., when the stimulus may be larger than Ricco's area, causing partial spatial summation^{37}); above this point SD was kept constant.
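The three models can be written directly as functions by plugging in the size V coefficients reported in the Results and the published Henson coefficients; this Python sketch (the study used R) implements the linear model's 28-dB floor as a simple clamp:

```python
import math

# Coefficients: size V fits from Experiment 1 (see Results) and the
# Henson et al. model; sensitivities and SDs in dB.

def sd_exponential(sens):
    """Exponential model: log_e(SD) = 2.578 - 0.060 * sensitivity (size V)."""
    return math.exp(2.578 - 0.060 * sens)

def sd_henson(sens):
    """Henson model: log_e(SD) = 3.27 - 0.081 * sensitivity."""
    return math.exp(3.27 - 0.081 * sens)

def sd_linear(sens):
    """Linear model: SD = 11.316 - 0.329 * sensitivity below 28 dB (size V),
    held constant above 28 dB, where the stimulus may exceed Ricco's area."""
    return 11.316 - 0.329 * min(sens, 28.0)
```

Note that at low sensitivities the Henson model predicts much larger SDs than the exponential refit (e.g., at 10 dB), which is the overestimation discussed later.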

If the two simulated test runs at a location produce sensitivity estimates *Sens*_{1} and *Sens*_{2}, then this represents two “measurements” with an estimated sensitivity of (*Sens*_{1} + *Sens*_{2})/2 and an estimated squared error of [(*Sens*_{1} – *Sens*_{2})/2]^{2}. The absolute difference between the two sensitivity estimates after 10, 20, and 30 presentations can then be compared against the observed data from Experiment 2 for the same estimated sensitivity.

A LOESS curve^{48} based on the squared errors was then fit to these simulated data. This LOESS fit provides the predicted mean squared error (i.e., the predicted test–retest variance from short-term variability alone) for the chosen model at any given sensitivity. The variance of the long-term variability was then calculated as the observed test–retest variance (from Experiment 2) minus this predicted short-term variance.
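The final decomposition is a one-liner: because independent variance components add, the long-term variance is the observed test–retest variance minus the simulated short-term variance. A Python sketch with made-up variances for illustration (clamping at zero covers the near-normal range, where the simulated short-term variance slightly exceeded the observed variance):

```python
import math

def long_term_sd(observed_var, short_term_var):
    """SD of long-term variability from the decomposition
    observed variance = short-term + long-term (independent components),
    clamped at zero where the simulated short-term variance is larger."""
    return math.sqrt(max(observed_var - short_term_var, 0.0))

# e.g., observed test-retest SD of 3.1 dB and simulated short-term SD of
# 2.0 dB at the same sensitivity leave roughly 2.4 dB of long-term SD.
lt = long_term_sd(3.1 ** 2, 2.0 ** 2)
```

Working in variances and converting to SD only at the end mirrors the LOESS fitting above, which was likewise performed on variances.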

Figure 1 plots the fitted standard deviation (*SD*) against sensitivity (the contrast at which the participants would be predicted to respond to 50% of stimulus presentations based on the fitted frequency-of-seeing curve) for all tested locations, for size III stimuli (Fig. 1A) and size V stimuli (Fig. 1B). The mean sensitivity of the tested locations was 18.6 dB for size III stimuli and 25.4 dB for size V stimuli. The mean SDs were 3.6 dB for size III stimuli and 2.5 dB for size V stimuli. The expected measurement errors for size V stimuli were ±1.05 dB for sensitivity, ±1.24 dB for SD, and ±0.24 dB for log(SD). The solid red curves and the blue lines show the exponential and linear models, respectively, for locations with sensitivity ≥ 3.7 dB (so that sensitivity is not based on extrapolation beyond the range of contrasts tested) and ≤ 28 dB (so that the stimulus is smaller than Ricco's area^{37}), using Deming regression based on the ratio of expected measurement errors. These lines were given by:

- Size III: log_{e}(SD) = 2.576 – 0.068 × sensitivity; SD = 9.658 – 0.287 × sensitivity
- Size V: log_{e}(SD) = 2.578 – 0.060 × sensitivity; SD = 11.316 – 0.329 × sensitivity

For the same locations, the size V stimulus gave higher sensitivity (*P* < 0.001 from GEE regression) and lower variability (SD = 1.03 dB lower; *P* = 0.002). However, for the same sensitivity, variability was slightly higher for the size V stimulus, whether using the linear model (SD = 0.75 dB higher; *P* = 0.011) or the exponential model, where log(SD) was 0.178 higher (*P* = 0.027). This difference did not depend on sensitivity (*P* = 0.810 for the linear model; *P* = 0.247 for the exponential model). The similarities between the fits for the two stimulus sizes in Figure 1, in particular the fact that the slopes do not significantly differ, indicate that estimates of long-term variability derived by subtracting simulated short-term variability from observed test–retest variability should be the same for both sizes, as long as the simulation uses the appropriate model for that stimulus size. Thus, testing for Experiment 2 was conducted using the size V stimulus in order to further reduce short-term variability.

The model of Henson et al.,^{10} indicated by the orange curve, was based on locations with sensitivity ≥ 10 dB, and this higher floor caused it to overestimate variability at lower sensitivities (red dashed line) by even more. Notably, when locations with sensitivities > 28 dB were included in the fit (i.e., including locations at which complete spatial summation does not occur), the exponential model actually performed better than the linear model (root mean square error = 1.43 dB vs. 1.62 dB when using the size V stimulus). Thus, the recent and ongoing work examining the role of Ricco's area in perimetric sensitivities^{26,27,37–39} may lead to re-evaluation of the optimal model of the sensitivity–variability relation.

^{33}was −1.79 dB (median, −0.92 dB; range, −19.85 to +2.87 dB). For the four locations tested, the average SITA Standard sensitivity on the date of the second visit was 27.7 dB (range, <0 to +34 dB), and 88 locations (17%) had sensitivity ≤ 25 dB. Using the custom-written, low-variability algorithm, the estimated sensitivities after 30 presentations per location were on average 0.70 dB higher (*P* = 0.046, GEE regression).

Figure 2 shows the error of each pair of sensitivity estimates, (*Sens*_{1} – *Sens*_{2})/2, plotted against estimated sensitivity, (*Sens*_{1} + *Sens*_{2})/2, after 10, 20, and 30 presentations. As expected, this test–retest variability increased with damage until sensitivity began to approach the measurement floor. The black line shows the LOESS fit to the data (noting that this LOESS fit is based on the squared errors, as explained above, so it is the square root of the predicted variance that is plotted as the black line). The apparent increase in variability at near-normal sensitivities when using 10 presentations per location (Fig. 2A) is due to the ZEST algorithm starting with a flat prior within the range of 10 to 45 dB, such that the first presentation was always 27.5 dB, and it took several presentations for the estimate to converge to values near 30 dB. The apparent increase in variability for sensitivities > 32 dB is due to very few locations having true sensitivities this high; hence, if the estimated sensitivity is > 32 dB, then it is likely due to one of the two values, *Sens*_{1} or *Sens*_{2}, being particularly noisy.

The *y*-axis in Figure 4 shows the SD of the long-term variability (i.e., the square root of the calculated variance) rather than the variance itself. The estimates shown in Figure 4 are based on 30 presentations per location, using the LOESS fits shown in Figures 2C and 3C. The estimates based on 20 presentations per location were almost identical. For the linear model, the highest long-term variability estimate based on 30 presentations was 2.39 dB, at a sensitivity of 20.7 dB; based on 20 presentations, the maximum was 2.36 dB, at a sensitivity of 20.3 dB. Estimates based on 10 presentations were also very similar, but noisier. The estimates of long-term variability were also almost identical between the three models used for the simulations, further supporting the robustness of the results. As seen in Figure 3, the estimated observed test–retest variability (as fit by the LOESS curve) was actually smaller than the simulated short-term variability for sensitivities > 30 dB, an indication that long-term variability constitutes a negligible proportion of the test–retest variability at such locations.

Known sources of long-term variability, such as time of year,^{20} diurnal cycles, and technician experience,^{21} would seem to be independent of disease status; yet, we have demonstrated here that this long-term variability increases markedly in regions of glaucomatous loss. For estimated sensitivities of 30 to 32 dB, the observed test–retest SD was actually slightly smaller than the SD of the simulated short-term variability, as can be seen in Figure 3. The simulated variability may have been overestimated at these sensitivities, in particular due to the choice to assume constant variability in the linear model above 28 dB (see Fig. 1). However, we can still infer that, for locations with near-normal sensitivity, long-term variability can be considered negligible, and test–retest variability is driven almost entirely by the short-term variability caused by the probabilistic nature of the frequency-of-seeing curve. At locations with glaucomatous damage, long-term variability increased. It reached a peak at sensitivities around 20 dB, below which the floor effect of the testing algorithm likely dominated and obscured any further increase.

^{27,37–39}Similarly, a small movement in fixation will alter the number of retinal ganglion cells that are stimulated, and this will have a greater effect on sensitivity when complete spatial summation occurs, compared to healthy locations where only partial spatial summation occurs. Compounding this effect is that the number of cells stimulated may vary more in damaged regions than in healthy regions for the same magnitude of fixation shift, due to the inhomogeneous nature of glaucomatous loss. It is possible that factors such as fatigue also have a greater influence on sensitivity in a system that is already stressed by ongoing disease processes.^{22,23} Additionally, pathophysiologic effects such as reduced axonal transport^{49} and altered autoregulation of blood flow^{50,51} could cause individual retinal ganglion cells to function only intermittently, causing variations in the true sensitivity at those locations. Transient scotomas have been demonstrated in healthy subjects after IOP elevation,^{52} and glaucomatous eyes may be more susceptible to such changes.^{53}

^{7}It has been reported that the root mean square test–retest error using the SITA Standard algorithm at a location with sensitivity 20 dB is approximately 4 dB,^{54} meaning that a 0.8-dB reduction in variability would be required. Yet, the long-term variability constitutes 2.39 dB of this total, as suggested by our results above (Fig. 4), and that portion cannot be reduced by altering the testing algorithm. The remaining 1.61 dB is short-term variability that does depend on the testing algorithm. Thus, a requirement of a 0.8-dB reduction in overall variability would necessitate a 50% reduction of short-term variability from algorithm-dependent sources.

^{24}Possible approaches include shortening the test to reduce fatigue; indeed, this may be one reason why the observed increase in test–retest variability from using the SITA Fast testing algorithm instead of SITA Standard is not as large as might have been predicted.^{55} Another approach could be to alter the test to make it more engaging and less stressful for the test subject, again with the aim of reducing the effect of fatigue. Reducing the stimulus size would keep the stimulus within Ricco's area at more locations^{37} and might increase the detectability of defects, especially centrally.^{38} However, as seen in Experiment 1, reducing stimulus size increases the standard deviation of the frequency-of-seeing curve, and hence increases the range of contrasts over which the test subject is unsure whether or not they saw the stimulus, making the test more mentally taxing. There is an inherent conflict between the desire to use test stimuli smaller than Ricco's area to increase the short-term signal-to-noise ratio^{56} versus the likelihood that this may make the test more fatiguing for the patient and hence increase long-term variability.

Our results also suggest that the variability model of Henson et al.^{10} used in some simulation studies^{57–59} may be suboptimal. It has previously been suggested that the Henson model overestimates variability at low sensitivities and that, when including such locations, an exponential model with different coefficients giving lower estimates could be more accurate;^{60} this is consistent with the results of fitting the exponential model to our data from Experiment 1. It has also been suggested that the Henson model could be used but with a maximum standard deviation of 6 dB imposed, to better represent empirical estimates of variability at low sensitivities.^{31} However, those studies and the original study by Henson et al.^{10} all assumed that the relation between sensitivity and variability should be consistent across the range of sensitivities. Yet, recent studies have shown that properties of sensitivity estimates from perimetry differ between locations where the stimulus area is within versus outside Ricco's area.^{26,27,37–39} We propose that the same may be true of the relation between sensitivity and variability. Data from this study and from previous studies^{31,61} are actually more consistent with a linear model. Such a model would assume that, when sensitivity is below around 28 dB, variability increases linearly as sensitivity decreases. It gives a very similar fit to the exponential model from 10 to 28 dB, but without the excessively high estimates at lower sensitivities. The apparently better fit of the exponential model is solely due to the influence of locations with sensitivities above this cutoff. The exact upper cutoff that is optimal, and the form of the model that should be used above that sensitivity, cannot be determined from our data. The linear model used in the simulations of Experiment 3 assumed constant variability at locations above 28 dB, but that assumption was made for simplicity in the absence of better information. The exact sensitivity at which complete spatial summation ends will inevitably vary between individuals, and 28 dB is an approximation based on previous studies^{37,39} rather than a definitively optimized value. Above whatever cutoff is chosen, it is certainly plausible that variability could still be related to sensitivity among locations undergoing partial spatial summation, just with a shallower slope.

^{56}It should be noted that, when results were based on 30 stimulus presentations per location, the choice of model did not appreciably impact the estimated short-term variability of the testing algorithm or the consequent estimate of the long-term variability, as seen in Figures 3C and 4. For the purposes of this study, an inaccurate model delays but does not prevent convergence of the sensitivity estimate. Thus, although this issue is important for future studies, and indeed our results from Experiment 1 can contribute to them, it does not affect the validity of our conclusions concerning long-term variability.

^{27}Similarly, although it has been shown that Ricco's area increases in glaucoma,^{26} the sensitivity to a stimulus of area exactly equaling Ricco's area should remain the same. In our linear model from Experiment 1, the transition from constant short-term variability above the cutoff to variability increasing linearly with loss below the cutoff occurs at the same cutoff of 28 dB for both size III and size V stimuli, even though increasing stimulus size means that more locations have sensitivity greater than this value.^{30} Interestingly, a very similar relation between variability and sensitivity for size III and size V stimuli, with hints of a possible change at around 28 dB, has been found for patients with Leber hereditary optic neuropathy,^{62} another condition in which scotomas result from retinal ganglion cell loss.

Long-term variability may also depend on the interval between tests,^{24} but we are not able to test that idea with the current dataset. The study by Urata et al.^{24} found that the variability among five weekly tests was lower than the variability among five annual tests. Over longer time periods, the rate of change will often not be constant due to, for example, treatment changes, so the SD of residuals from a linear trend may overestimate the underlying long-term variability. It therefore remains unclear whether long-term variability would increase continually with the intertest interval or plateau once tests are more than some interval apart.^{63,64}
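The point that residuals from a single linear trend can overstate long-term variability when the rate of change is not constant can be illustrated with a small simulation. All numbers here are hypothetical, not taken from the study: a true visit-to-visit fluctuation SD of 1 dB, and a rate of decline that steepens halfway through the series, as might follow a treatment change.

```python
import numpy as np

rng = np.random.default_rng(1)
true_fluct_sd = 1.0      # hypothetical true visit-to-visit SD, in dB
t = np.arange(10.0)      # ten annual visits

# True sensitivity declines at -0.5 dB/y, then -2.0 dB/y from visit 5,
# so no single straight line describes the whole series.
trend = np.where(t < 5, 30.0 - 0.5 * t, 27.5 - 2.0 * (t - 5))

def residual_sd(series, t):
    """SD of residuals from one linear trend fitted to the whole series."""
    slope, intercept = np.polyfit(t, series, 1)
    return (series - (slope * t + intercept)).std(ddof=2)

# Average over many simulated series: the residual SD systematically
# exceeds the true fluctuation SD, because the bend in the trend is
# absorbed into the residuals.
sds = [residual_sd(trend + rng.normal(0.0, true_fluct_sd, t.size), t)
       for _ in range(500)]
mean_resid_sd = float(np.mean(sds))
```

The mean residual SD comes out well above 1 dB even though the true fluctuation SD is exactly 1 dB, which is why residuals from a linear trend over long periods can overestimate the underlying long-term variability.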

**Disclosure**: **S.K. Gardiner**, None; **W.H. Swanson**, None; **S.L. Mansberger**, None

1. *Am J Ophthalmol*. 2008; 145(2): 191–202.
2. *Invest Ophthalmol Vis Sci*. 2014; 55(1): 102–109.
3. *Am J Ophthalmol*. 1989; 108(2): 130–135.
4. *Am J Ophthalmol*. 1990; 109(1): 109–111.
5. *Ophthalmology*. 2013; 120(1): 68–76.
6. *Invest Ophthalmol Vis Sci*. 2012; 53(1): 224–227.
7. *Invest Ophthalmol Vis Sci*. 2011; 52(6): 3237–3245.
8. *Am J Ophthalmol*. 2000; 129(3): 309–313.
9. *Invest Ophthalmol Vis Sci*. 2000; 41(11): 3429–3436.
10. *Invest Ophthalmol Vis Sci*. 2000; 41(2): 417–421.
11. *Invest Ophthalmol Vis Sci*. 2001; 42(6): 1404–1410.
12. *Automated Static Perimetry*. 2nd ed. St. Louis, MO: Mosby; 1999: 147–159.
13. *Science*. 1997; 275(5307): 1805–1808.
14. *Science*. 1999; 283(5409): 1927–1930.
15. *J Neurophysiol*. 1997; 77(5): 2836–2841.
16. *Vision Res*. 2008; 48(18): 1859–1869.
17. *Percept Psychophys*. 2001; 63(8): 1421–1455.
18. *Percept Psychophys*. 2001; 63(8): 1348–1355.
19. *Arch Ophthalmol*. 1984; 102(5): 704–706.
20. *Ophthalmology*. 2013; 120(4): 724–730.
21. *Invest Ophthalmol Vis Sci*. 2012; 53(11): 7010–7017.
22. *Appl Opt*. 1988; 27(6): 1030–1037.
23. *Invest Ophthalmol Vis Sci*. 1994; 35(1): 268–280.
24. *Am J Ophthalmol*. 2020; 210: 19–25.
25. *Invest Ophthalmol Vis Sci*. 2015; 56(6): 3565–3576.
26. *Invest Ophthalmol Vis Sci*. 2010; 51(12): 6540–6548.
27. *Sci Rep*. 2018; 8: 2172.
28. *Optom Vis Sci*. 2006; 83(7): 499–511.
29. *Ophthalmology*. 2014; 121(7): 1359–1369.
30. *Transl Vis Sci Technol*. 2015; 4(2): 10.
31. *Transl Vis Sci Technol*. 2021; 10(1): 18.
32. *Invest Ophthalmol Vis Sci*. 1997; 38(2): 426–435.
33. *Acta Ophthalmol Scand*. 1997; 75(4): 368–375.
34. *Psychol Res*. 1992; 54(4): 233–239.
35. *J Vis*. 2012; 12(11): 22.
36. *Ophthalmology*. 1990; 97(3): 371–374.
37. *Ophthalmic Physiol Opt*. 2017; 37(2): 160–176.
38. *PLoS One*. 2016; 11(7): e0158263.
39. *Invest Ophthalmol Vis Sci*. 2018; 59(8): 3667–3674.
40. *Statistical Adjustment of Data*. New York: Wiley; 1943.
41. *Invest Ophthalmol Vis Sci*. 2013; 54(6): 4189–4196.
42. *Biometrika*. 1986; 73: 13–22.
43. *Vision Res*. 1994; 34(7): 885–912.
44. *Invest Ophthalmol Vis Sci*. 2003; 44(11): 4787.
45. *Invest Ophthalmol Vis Sci*. 2012; 53(7): 3598–3604.
46. *Invest Ophthalmol Vis Sci*. 2017; 58(6): BIO180–BIO190.
47. *J Opt Soc Am A*. 1985; 2(9): 1508–1532.
48. *J Am Stat Assoc*. 1979; 74(368): 829–836.
49. *Curr Eye Res*. 2016; 41(3): 273–283.
50. *Prog Retin Eye Res*. 2008; 27(3): 284–330.
51. *Invest Ophthalmol Vis Sci*. 2014; 55(6): 3509–3516.
52. *Arch Ophthalmol*. 1962; 68: 478–485.
53. *Invest Ophthalmol Vis Sci*. 1967; 6(2): 103–108.
54. *Invest Ophthalmol Vis Sci*. 2002; 43(8): 2654–2659.
55. *JAMA Ophthalmol*. 2015; 133(1): 74–80.
56. *Invest Ophthalmol Vis Sci*. 2017; 58(8): 2852.
57. *Invest Ophthalmol Vis Sci*. 2002; 43(5): 1400–1407.
58. *Invest Ophthalmol Vis Sci*. 2014; 55(5): 3265–3274.
59. *Transl Vis Sci Technol*. 2013; 2(4): 3.
60. *Invest Ophthalmol Vis Sci*. 2012; 53(10): 5985–5990.
61. *Invest Ophthalmol Vis Sci*. 1993; 34(13): 3534–3540.
62. *Transl Vis Sci Technol*. 2021; 10(12): 31.
63. *Arch Ophthalmol*. 1996; 114(1): 19–22.
64. *Optom Vis Sci*. 2008; 85(11): 1043–1048.