At each location, a small number of stimulus presentations must be made to allow a decision of "normal" or "damaged" to be reached. Further, the test must be suitable for participants who are new to perimetry, and who may only receive brief instructions (for example, in rapid community screening programs). Suprathreshold targets are used for screening purposes, and their ease of identification in normal regions of the visual field assists in patient instruction and compliance with the test. However, a key decision to be made is the suprathreshold level (in this case, the luminance of the Size III target).
The number of chances that an observer is given to respond to a stimulus affects the sensitivity–specificity trade-off of the test at that location. For example, consider a single location for a particular observer. If the stimulus level is set at a suprathreshold level that the observer sees 95% of the time, and the stimulus is presented only once, we would expect a specificity of 95% from that location (assuming no false responses). If the stimulus is presented a second time when the first is not seen, then the chance of our observer missing both presentations is 0.25%, and specificity rises to 99.75%. If the observer has visual field damage at this location (such that their true sensitivity is poorer than the 95% normal level), they will only respond "seen" as a false-positive. If there is a 15% chance of a false-positive response, then showing the stimulus once gives a sensitivity of 85%, while showing it twice gives the observer more chances to falsely press the button, so sensitivity drops to 72.25%. Thus, presenting the stimulus more than once per location increases specificity but decreases sensitivity (assuming false-positive responses occur); this arithmetic is sketched below. As a variant, we could present the stimulus three times and require the observer to see it twice to be classed "normal." Again, this trades sensitivity against specificity, here gaining sensitivity relative to the 1-of-2 and 1-of-3 rules at the cost of some specificity. Artes et al.7 give a principled approach to examining these trade-offs and selecting a multisampling scheme. As we were aiming for high specificity in a screening procedure, we initially computed the expected specificity for the following four multisampling suprathreshold schemes from their selections at a single location: (1) 1-of-1, where 1 presentation must be seen for a "normal" classification; (2) 1-of-2, where 1 of 2 presentations must be seen for a "normal" classification; (3) 1-of-3, where 1 of 3 presentations must be seen for a "normal" classification; and (4) 2-of-3, where 2 of 3 presentations must be seen for a "normal" classification.
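A minimal sketch of this single-location arithmetic, assuming independent presentations and the illustrative rates used above (95% seen when the location is normal, 15% false-positive "seen" rate when it is damaged):

```python
# Single-location specificity/sensitivity arithmetic for three of the
# multisampling rules, assuming independent presentations.
p_seen = 0.95   # chance a normal observer sees the suprathreshold stimulus
fp = 0.15       # chance a damaged location yields "seen" via a false-positive

# 1-of-1: one presentation must be seen to be classed "normal"
spec_1of1 = p_seen                     # 0.95
sens_1of1 = 1 - fp                     # 0.85

# 1-of-2: a second presentation is shown only if the first is missed
spec_1of2 = 1 - (1 - p_seen) ** 2      # 0.9975
sens_1of2 = (1 - fp) ** 2              # 0.7225

# 2-of-3: two of (up to) three presentations must be seen to be "normal"
spec_2of3 = p_seen ** 2 * (3 - 2 * p_seen)   # ~0.9928
sens_2of3 = 1 - fp ** 2 * (3 - 2 * fp)       # ~0.9393

print(spec_1of1, sens_1of1, spec_1of2, sens_1of2, spec_2of3, sens_2of3)
```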
Note that some screening procedures have altered the stimulus level after the first presentation,8 but we did not explore these variants. Rather, we kept the stimulus level at any given location constant, set to either the 0.5%, 1%, 2%, or 5% sensitivity level (brightest) of age-matched normal observers for that location. These values were taken from the normative database of the Octopus 600 as supplied by Haag-Streit AG. By keeping the stimulus level constant for all presentations at a location, we can compute the expected specificity and expected number of presentations at a single location, and be confident that it will apply to all locations in the visual field. If we used an adaptive stimulus level, spatial logic between neighbors, or did not adjust stimulus levels for age and eccentricity, then we could not make such an extrapolation.
The combination of the four sampling methods and four stimulus intensities gives 16 different procedures to compare. For the moment, let us assume that we know the probability, p, that a normal observer will say "yes" to a stimulus of s dB. We can compute from this the expected specificity and number of presentations as in Table 1.
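As a concrete illustration of these per-location quantities, the following sketch computes specificity and the expected number of presentations for each scheme from a given p. It assumes independent presentations and that testing stops at a location as soon as the classification is decided; the closed-form expressions here are the standard ones for such rules, not reproduced from Table 1.

```python
# Per-location specificity and expected number of presentations for each
# multisampling scheme, as a function of p, the probability a normal observer
# responds "yes" to the stimulus. Assumes independent presentations and that
# testing stops once the "normal"/"damaged" decision is determined.
def one_of_1(p):
    return p, 1.0

def one_of_2(p):
    spec = 1 - (1 - p) ** 2                 # seen on first or second try
    n = p + 2 * (1 - p)                     # second shown only after a miss
    return spec, n

def one_of_3(p):
    spec = 1 - (1 - p) ** 3
    n = p + 2 * (1 - p) * p + 3 * (1 - p) ** 2
    return spec, n

def two_of_3(p):
    spec = p ** 2 * (3 - 2 * p)             # at least 2 of 3 seen
    n = 2 * (p ** 2 + (1 - p) ** 2) + 3 * 2 * p * (1 - p)  # third needed only after a split
    return spec, n

for name, scheme in [("1-of-1", one_of_1), ("1-of-2", one_of_2),
                     ("1-of-3", one_of_3), ("2-of-3", two_of_3)]:
    spec, n = scheme(0.95)
    print(f"{name}: specificity {spec:.4f}, expected presentations {n:.2f}")
```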
For perimetric stimuli, the probability p of responding "seen" to a stimulus at a level s is usually modelled as a cumulative Gaussian frequency-of-seeing curve9–12 as follows:

p(s) = f+ + (1 − f+ − f−) [1 − Gauss(s; μ, σ)],

where f+ and f− are the probabilities of a false-positive and false-negative response respectively, Gauss(s; μ, σ) is a cumulative Gaussian distribution with mean μ and standard deviation σ evaluated at s, μ is set to t, the known threshold for the location, and the value of σ is taken from Table 1 of Henson et al.9 altered for the Octopus dB scale. The Henson et al.9 model was built assuming that 0 dB is 10,000 apostilbs, whereas the Octopus dB scale assumes a 4000 apostilb light for 0 dB. In our simulations we do not allow σ to exceed 6 dB.
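A minimal implementation sketch of this frequency-of-seeing model follows; the threshold, σ, and response-error rates used in the example call are hypothetical values for illustration, not values from the simulations (where σ follows Henson et al.9, converted to the Octopus dB scale).

```python
# Frequency-of-seeing model: probability of responding "seen" to a stimulus
# of s dB at a location with known threshold t dB.
from math import erf, sqrt

def gauss_cdf(x, mu, sigma):
    """Cumulative Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_seen(s, t, sigma, fp=0.0, fn=0.0):
    """P("seen") for stimulus level s (dB), threshold t (dB), spread sigma (dB),
    false-positive rate fp and false-negative rate fn."""
    sigma = min(sigma, 6.0)  # sigma is capped at 6 dB in the simulations
    return fp + (1.0 - fp - fn) * (1.0 - gauss_cdf(s, t, sigma))

# Hypothetical example: a 24 dB stimulus at a location with a 28 dB threshold.
print(prob_seen(s=24, t=28, sigma=2.5, fp=0.0, fn=0.05))
```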
So now, to compute the expected specificity for a scheme over a population of normal observers, we can take a normative database to give us Q(t), the probability of a particular threshold t occurring in the population at this location, and sum the product of that probability with the corresponding value from Table 1. For example, for the 1-of-2 scheme we would compute:

Expected specificity = Σt Q(t) [1 − (1 − p)²],

where p is the probability, from the frequency-of-seeing curve above, of seeing the stimulus s at a location with threshold t, and s, f+, and f− would be constants chosen for the simulation. In our case, s can be one of the 0.5%, 1%, 2%, or 5% levels taken from the normative database; f+ will be zero, as any false-positive responses would only increase the specificity calculations; and f− will vary from 0% to 50% in steps of 5%. Using this approach we computed the expected specificities and numbers of presentations for all 16 schemes under a variety of false-negative conditions, allowing us to choose the scheme with the best specificity–time trade-off for an individual location. The sensitivity of the entire test is explored in Experiment 3.
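The following sketch shows this expected-specificity calculation for the 1-of-2 scheme, reusing prob_seen from the sketch above; the Q(t) values, stimulus level, σ, and f− below are hypothetical illustrations, not values from the Octopus 600 normative database.

```python
# Expected specificity of the 1-of-2 scheme at one location, averaged over a
# (hypothetical) normative threshold distribution Q(t).
Q = {24: 0.05, 26: 0.15, 28: 0.30, 30: 0.35, 32: 0.15}  # hypothetical Q(t), sums to 1
s = 25.0        # stimulus level, e.g. a low percentile of the normative distribution
sigma = 2.5     # hypothetical spread; the simulations take this from Henson et al.
fp, fn = 0.0, 0.10

def spec_1_of_2(p):
    return 1.0 - (1.0 - p) ** 2   # location classed "normal" if either of 2 is seen

expected_spec = sum(q * spec_1_of_2(prob_seen(s, t, sigma, fp, fn))
                    for t, q in Q.items())
print(expected_spec)
```

The expected number of presentations can be averaged over Q(t) in the same way.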