**Purpose:**
The cumulative sum (CUSUM) is proposed and tested in a group of glaucoma patients and healthy subjects as a method for monitoring disease progression and for identifying clinically significant step changes in visual structure or function.

**Methods:**
The CUSUM procedure is the recommended method for the timely detection of small step changes in manufacturing process control. It is discussed here and compared with traditional approaches for detecting change in the status of the visual system over time. The CUSUM approach is used to monitor visual field (VF) mean deviations and optical coherence tomography (OCT) measurements of retinal nerve fiber layer (RNFL) thickness over time in 53 healthy subjects and 103 patients with glaucoma.

**Results:**
The CUSUM method detects VF progression for 35 of the 103 glaucoma patients (34.0%), and OCT RNFL reductions for 20 of the 103 glaucoma patients (19.4%).

**Conclusions:**
The CUSUM method is effective in detecting small level changes. This method can be used to monitor the progression of disease and it benefits the clinician who must decide, on the basis of a time series of variable data, whether a change has occurred.

**Translational Relevance:**
A cumulative sum chart helps the clinician decide whether a step change has taken place, and it does so as quickly as possible. This approach is particularly effective for detecting small step changes, which are very likely to go unnoticed with currently used change-detection approaches.

[…] *n*, *Y*_{n}, is consistent with a certain in-control value *μ*_{0}. A warning flag gets raised if the difference between the measurement and the in-control value exceeds a critical threshold that is determined from data on healthy (in-control) patients. It is common to compare the measurement with a reference distribution that comes from healthy subjects, express the measurement as a percentile of that distribution, and raise a flag signaling a reduction if the measurement *Y*_{n} represents a percentile of order 100*α* or smaller (where *α* is a value such as 0.05, 0.025, or 0.01). For normal distributions with known in-control mean *μ*_{0} and SD *σ*, this approach compares the observation *Y*_{n} with *μ*_{0} + *z*_{α}*σ*; *z*_{α} is the percentile of the standard normal distribution, such as *z*_{0.05} = −1.65, *z*_{0.025} = −1.96, and *z*_{0.01} = −2.33.
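The percentile rule described above can be sketched in a few lines of Python. This is only an illustration under the stated normality assumption; the function names `lower_cutoff` and `flags_reduction` and the example values are hypothetical and do not come from the study.

```python
from statistics import NormalDist


def lower_cutoff(mu0: float, sigma: float, alpha: float) -> float:
    """Return mu0 + z_alpha * sigma, the 100*alpha percentile of N(mu0, sigma^2)."""
    z_alpha = NormalDist().inv_cdf(alpha)  # negative whenever alpha < 0.5
    return mu0 + z_alpha * sigma


def flags_reduction(y: float, mu0: float, sigma: float, alpha: float) -> bool:
    """Raise a flag when the observation falls below the lower cutoff."""
    return y < lower_cutoff(mu0, sigma, alpha)


# Illustrative use with a standardized scale (mu0 = 0, sigma = 1).
for alpha in (0.05, 0.025, 0.01):
    print(alpha, round(lower_cutoff(0.0, 1.0, alpha), 3))
```

With `mu0 = 0` and `sigma = 1` the cutoffs are the standard normal percentiles quoted in the text, up to rounding.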

[…] (Shewhart,^{ 1 } Deming,^{ 2 } Ledolter and Burrill^{ 3 }). A flag gets raised whenever an observation deviates from the in-control value by more than a multiple of the observations' SD. For a multiple of three, three sigma control limits correspond to percentiles of order 0.135% and 99.865%, assuming a normal distribution. A Shewhart chart that monitors the process for a reduction is equivalent to raising a flag whenever the measurement *Y*_{n} represents a very small (0.135%) in-control percentile.

[…] *z*_{α}, or the percentage point *α*, affects the properties of the procedure. It is common to characterize a Shewhart chart by its implied average run length (ARL). The run length is the number of observations it takes to conclude that a reduction has occurred, and the ARL is its expected value. One wants the ARL to be large if there has been no change, and small if the process has changed to a new lower level. The in-control ARL of the Shewhart chart is *ARL*(*μ*_{0}) = 1/*α*; see Ledolter and Burrill.^{ 3 } For example, the Shewhart chart with *α* = 0.025 and *z*_{0.025} = −1.96 implies *ARL*(*μ*_{0}) = 40; the chart with *α* = 0.01 and *z*_{0.01} = −2.33 implies *ARL*(*μ*_{0}) = 100.

[…] *n* observations. For given *n* and a 5% rate of false positives, one can solve the equation 0.05 = 1 − (1 − *α*)^{n} for *α* = 1 − 0.95^{1/n} and obtain the corresponding cutoff *z*_{α}. For the data that we use in the Results section of this paper (53 healthy and 103 glaucoma patients in a University of Iowa/Veterans Administration study) the average number of monitoring periods is *n* = 7, sampled about every 6 months. With *n* = 7, *α* = 1 − 0.95^{1/7} ≈ 0.0073, or 0.01 when rounded to two decimals, and *z*_{0.01} = −2.33. With cutoff −2.33, groups of seven consecutive in-control observations are falsely flagged roughly 5% of the time (1 − 0.99^{7} ≈ 6.8% at the rounded value *α* = 0.01).
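The relation between the per-observation percentage point and the false positive rate over a window of *n* observations can be checked numerically. The sketch below assumes independent, normally distributed in-control observations; the function names are illustrative only.

```python
from statistics import NormalDist


def per_observation_alpha(window_fp: float, n: int) -> float:
    """Solve window_fp = 1 - (1 - alpha)**n for alpha."""
    return 1.0 - (1.0 - window_fp) ** (1.0 / n)


def window_false_positive(alpha: float, n: int) -> float:
    """Probability of at least one false flag among n independent observations."""
    return 1.0 - (1.0 - alpha) ** n


alpha7 = per_observation_alpha(0.05, 7)   # exact alpha for a 5% window rate
z7 = NormalDist().inv_cdf(alpha7)         # matching standard normal cutoff
fp_rounded = window_false_positive(0.01, 7)  # window rate when alpha is rounded to 0.01

print(round(alpha7, 4), round(z7, 2), round(fp_rounded, 3))
```

The unrounded *α* gives a slightly stricter cutoff than −2.33, which is why the rounded cutoff yields a window false positive rate somewhat above 5%.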

[…] *z*_{α} responds to a reduction in the level to *μ*_{1} = *μ*_{0} − *rσ* (here change is defined as a multiple, *r*, of the process SD). The probability of obtaining an out-of-control signal is *P*[*Y* < *μ*_{0} + *z*_{α}*σ*] = *P*[*Z* < *z*_{α} + *r*], where the probability for the standard normal random variable *Z* can be looked up in statistical tables. The out-of-control ARL is *ARL*(*μ*_{1}) = 1/*P*[*Z* < *z*_{α} + *r*], and the probability that the Shewhart chart signals a change within the next *n* observations is 1 − (1 − *P*[*Z* < *z*_{α} + *r*])^{n}.
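As a rough numerical companion to the out-of-control formulas, the following sketch evaluates the signal probability *P*[*Z* < *z*_{α} + *r*], the resulting ARL, and the probability of at least one signal within *n* observations. The last quantity is computed as 1 − (1 − *p*)^{n}, which assumes independent observations that are all at the shifted level; the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def signal_prob(z_alpha: float, r: float) -> float:
    """P[Z < z_alpha + r] after the mean has dropped by r standard deviations."""
    return _STD_NORMAL.cdf(z_alpha + r)


def shewhart_arl(z_alpha: float, r: float) -> float:
    """Average run length: the reciprocal of the per-observation signal probability."""
    return 1.0 / signal_prob(z_alpha, r)


def detect_within(z_alpha: float, r: float, n: int) -> float:
    """Probability of at least one signal among n independent observations."""
    return 1.0 - (1.0 - signal_prob(z_alpha, r)) ** n


z01 = _STD_NORMAL.inv_cdf(0.01)
print(round(shewhart_arl(z01, 0.0), 1))        # in-control ARL, r = 0
print(round(shewhart_arl(-2.33, 1.0), 1))      # ARL after a 1 SD reduction
print(round(detect_within(-2.33, 1.0, 7), 3))  # detection within 7 observations
```

Setting `r = 0` recovers the in-control relation *ARL*(*μ*_{0}) = 1/*α*.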

[…] *z*_{0.01} = −2.33 for each of three consecutive observations makes a signal unlikely: it leads to a negligible probability of false positives among *n* = 7 consecutive observations, but also to weak power for detecting an actual change. The cutoff therefore needs to be smaller in absolute value in order to achieve the targeted 5% false positive rate. Simulations reported in the Results section show that for such modified charts *α* = 0.25 and *z*_{0.25} = −0.68 lead to a false positive rate close to the 5% target (6.2%) among seven consecutive observations.
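The false positive rate of a rule that requires three consecutive observations below the cutoff can also be computed exactly instead of by simulation. The sketch below is not the authors' simulation: it is a small dynamic program over the current streak length, assuming independent observations that each fall below the cutoff with probability `p`. The function name is illustrative.

```python
def prob_run_within(p: float, run_length: int, n: int) -> float:
    """Probability that `run_length` consecutive successes occur within n trials.

    state[j] is the probability of currently being in a streak of j
    successes without having signalled yet.
    """
    state = [1.0] + [0.0] * (run_length - 1)
    signalled = 0.0
    for _ in range(n):
        new_state = [0.0] * run_length
        for j, mass in enumerate(state):
            new_state[0] += mass * (1.0 - p)   # a failure resets the streak
            if j + 1 == run_length:
                signalled += mass * p          # streak completed: signal
            else:
                new_state[j + 1] += mass * p   # streak grows by one
        state = new_state
    return signalled


print(round(prob_run_within(0.25, 3, 7), 4))
```

For `p = 0.25`, a run of three, and seven observations, the exact value is about 0.062, in line with the simulated 6.2% reported in the Results section.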

[…]^{ 3 }). Hence our terminology of referring to such charts as “modified” Shewhart charts.

[…]^{ 4 } and is the recommended method for the timely detection of small step changes. Optimality results for the CUSUM procedure were established by Moustakides^{ 5 }: Among all procedures with the same in-control ARL, the CUSUM minimizes the expected time until a change gets signaled once the process has shifted to the out-of-control state. One would like the ARL to be large if the process is in control (that is, the subject stays free of disease or there is no progression of disease), but small if a shift to the out-of-control state (i.e., onset or progression of disease) has occurred. We describe the CUSUM chart in this section, and illustrate it with examples in the Results section.

- The SD of the repeat measurement variability, *σ*.
- The in-control value *μ*_{0} and the magnitude of the step change one wants to detect. The in-control value *μ*_{0} comes from either the population average of healthy (in-control) patients or from subject-specific baseline information (such as the mean deviation and the mean RNFL thickness that is obtained from the first few observations on each subject). A shift of one SD of the repeat measurement variability is taken to represent the magnitude of a clinically relevant step change, but smaller shifts can be studied if they are thought to be more relevant. The adopted step change from the in-control value determines the out-of-control value *μ*_{1}.
- The ARL under the in-control situation. For example, ARL = 100 implies that a false positive signal (signaling a change when no change is present) occurs on average after 100 consecutive observations. For patients visiting the clinic every 6 to 12 months, a procedure constructed with ARL = 100 allows for few false positive signals during a patient's follow-up period of reasonable length. We use ARL = 100 in our examples. Recall that the Shewhart chart with *α* = 0.01 and *z*_{0.01} = −2.33 attains ARL = 100.

[…] *μ*_{0} to an out-of-control value *μ*_{1} < *μ*_{0}. With consecutive observations *Y*_{1}, *Y*_{2}, … , *Y*_{n}, we compute signals *S*_{1}, *S*_{2}, … , *S*_{n} according to the CUSUM recursion, with starting value *S*_{0} = 0. The constant *k* = (*μ*_{1} − *μ*_{0})/2 < 0 is one-half of the difference between the out-of-control and in-control values, amounting to one-half of the decrease we want to detect. We conclude that a change has occurred when the signal *S*_{t} is smaller than a certain critical value *h* < 0. Computer software is available to determine the critical value such that the CUSUM procedure achieves the desired in-control ARL. Average run lengths for specified alternatives can be calculated, assessing how long it takes on average to detect a change of a certain magnitude. Brook and Evans^{ 6 } use a Markov chain approach to derive the ARLs for a given critical value *h*; a detailed discussion of how to do this is given in Hawkins and Olwell.^{ 7 } This book and the webpage of Douglas Hawkins^{ 8 } at the University of Minnesota provide useful and easy-to-use computer software.
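A minimal sketch of the monitoring scheme follows. It assumes the standard one-sided (lower) CUSUM recursion *S*_{t} = min[0, *S*_{t−1} + (*Y*_{t} − *μ*_{0}) − *k*], which is consistent with *S*_{0} = 0, *k* < 0, and *h* < 0 as described above but is not reproduced from the paper's own equation. The function name and the example series are hypothetical; the series is not patient data.

```python
from typing import List, Optional, Sequence, Tuple


def lower_cusum(
    y: Sequence[float], mu0: float, k: float, h: float
) -> Tuple[List[float], Optional[int]]:
    """Run a lower CUSUM and return (signals S_1..S_n, first period with S_t < h).

    k and h are negative, following the sign convention of the text.
    The second element is None when no signal is raised.
    """
    s = 0.0                      # starting value S_0 = 0
    signals: List[float] = []
    first_signal: Optional[int] = None
    for t, obs in enumerate(y, start=1):
        # Accumulate deviations below the reference value; never go above 0.
        s = min(0.0, s + (obs - mu0) - k)
        signals.append(s)
        if first_signal is None and s < h:
            first_signal = t
    return signals, first_signal


# Hypothetical standardized deviations from baseline (sigma = 1, mu0 = 0).
deviations = [0.2, -0.4, -1.1, -0.9, -1.6, -1.2, -1.4]
signals, first = lower_cusum(deviations, mu0=0.0, k=-0.5, h=-2.85)
print([round(s, 2) for s in signals], first)
```

Because the statistic is reset to zero whenever it would become positive, early observations above the baseline do not mask a later sustained reduction.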

[…] *σ* and construct a CUSUM that monitors measurements for a reduction of 1 SD from in-control value *μ*_{0} to out-of-control value *μ*_{1} = *μ*_{0} − *σ*. For illustration, we use *σ* = 1, *μ*_{0} = 0, and *μ*_{1} = *μ*_{0} − *σ* = −1. CUSUM signals are calculated from Equation 3, with *k* = (−1 − 0)/2 = −0.5. For in-control ARL of 100, the critical value is *h* = −2.850 and the ARL until detecting a shift to the out-of-control value (*μ*_{1} = −1) is 6.1. On average, the CUSUM detects a step change reduction of 1 SD six periods after the change has taken place (which amounts to 3 years in a typical case of a glaucoma subject being followed every 6 months). The critical value *h* = −2.850 and the out-of-control ARL 6.1 are obtained with statistical software; for example, with the program geth.exe from the webpage of Douglas Hawkins^{ 8 } at the University of Minnesota, http://users.stat.umn.edu/~dhawkins/ (the program is located under Software and Cumulative Sums). What if we wanted to detect a smaller change of half of an SD? Then *k* = (−0.5 − 0)/2 = −0.25 and *h* = −4.418, and the ARL at the out-of-control value (*μ*_{1} = −0.5) is 14.8. The smaller shift is more difficult to detect. On average, we detect a step change of half of an SD within 15 periods after the change has taken place.
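The quoted run lengths can be cross-checked without dedicated software by using Siegmund's closed-form approximation for the ARL of a one-sided CUSUM. This is a different technique from the Markov chain approach cited in the text and is offered only as a plausibility check. The sketch works with magnitudes, so `k` and `h` are entered as positive numbers and `shift` is the size of the reduction in SD units; the function name is hypothetical.

```python
import math


def siegmund_arl(shift: float, k: float, h: float) -> float:
    """Siegmund's approximation to the ARL of a one-sided CUSUM.

    shift: size of the mean shift in SD units (0 for the in-control case)
    k:     reference value, as a positive magnitude
    h:     decision interval, as a positive magnitude
    """
    delta = shift - k          # drift of the CUSUM increments toward the limit
    b = h + 1.166              # Siegmund's correction for discrete observations
    if abs(delta) < 1e-12:
        return b * b           # limiting value as delta approaches zero
    return (math.exp(-2.0 * delta * b) + 2.0 * delta * b - 1.0) / (2.0 * delta**2)


for k, h, shift in ((0.5, 2.85, 1.0), (0.25, 4.418, 0.5)):
    in_control = siegmund_arl(0.0, k, h)
    out_of_control = siegmund_arl(shift, k, h)
    print(k, h, round(in_control, 1), round(out_of_control, 1))
```

Under this approximation both designs have an in-control ARL close to 100, and the out-of-control ARLs come out near 6.1 and 14.8, matching the values in the text.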

[…] *μ*_{0} = 0, as deviations from the baseline should have mean zero if the process is in control. We consider *σ* = 1 and *μ*_{1} = *μ*_{0} − *σ* = −1, even though any other value of *σ* could have been used without affecting the ARL at the out-of-control value *μ*_{1} = *μ*_{0} − *σ* or the time when the CUSUM crosses the critical value. The only quantities that change with *σ* ≠ 1 are the reference value *k* (it changes from −1/2 to −*σ*/2) and the critical value (it changes from *h* to *hσ*).
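The scale invariance described here is easy to confirm numerically. The sketch below again assumes the standard lower CUSUM recursion and uses a made-up series; multiplying the data, the reference value, and the critical value by the same *σ* rescales the CUSUM path but leaves the signal time unchanged.

```python
def cusum_path(y, mu0, k, h):
    """Return the lower CUSUM path and the first period at which it falls below h."""
    s, path, first = 0.0, [], None
    for t, obs in enumerate(y, start=1):
        s = min(0.0, s + (obs - mu0) - k)
        path.append(s)
        if first is None and s < h:
            first = t
    return path, first


unit_series = [0.2, -0.4, -1.1, -0.9, -1.6, -1.2, -1.4]  # hypothetical, sigma = 1
sigma = 3.0                                               # e.g. micrometers
scaled_series = [sigma * v for v in unit_series]

path_unit, t_unit = cusum_path(unit_series, 0.0, -0.5, -2.85)
path_scaled, t_scaled = cusum_path(scaled_series, 0.0, -0.5 * sigma, -2.85 * sigma)

print(t_unit, t_scaled)
```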

[…] *z*_{0.01} = −2.33, giving a 6.9% false positive rate of detecting a change when none exists (“No Change,” row 1). For the modified Shewhart chart that concludes a reduction if three consecutive signals are below the threshold, the appropriate standardized threshold is *z*_{0.25} = −0.68, giving a 6.2% false positive rate of detecting a change when none exists. A comparison of the detection probabilities when the process mean has changed shows that the modified Shewhart chart detects a reduction from baseline more often than the Shewhart chart (58.7% vs. 49.7% detection of a 1-sigma reduction; row 3 of Table 1), while controlling the false positive rate at about the same level (6%).

**Table 1**.

[…] *z*_{0.01} = −2.33 and ARL = 100, and the modified Shewhart chart that requires three consecutive signals with threshold *z*_{0.25} = −0.68. The three procedures have similar in-control properties (same in-control ARL = 100 for CUSUM and Shewhart charts, and similar proportions of false positives for a window of seven observations). Table 2 confirms that the CUSUM outperforms the other two procedures. The CUSUM has smaller out-of-control run lengths and higher detection proportions than either of the other two procedures.
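A small Monte Carlo experiment gives a feel for this comparison. The sketch below is not the authors' simulation code: it assumes independent standard normal measurement errors, a window of seven observations, a shift that is already in effect at the first observation of the window, and the standard lower CUSUM recursion. The seed, the number of replications, and all names are arbitrary choices.

```python
import random

Z_SHEWHART = -2.33   # single-observation cutoff
Z_MODIFIED = -0.68   # cutoff for three consecutive observations
K, H = -0.5, -2.85   # CUSUM reference value and critical value (sigma = 1)


def shewhart_detects(y, z=Z_SHEWHART):
    return any(v < z for v in y)


def modified_detects(y, z=Z_MODIFIED, run=3):
    streak = 0
    for v in y:
        streak = streak + 1 if v < z else 0
        if streak >= run:
            return True
    return False


def cusum_detects(y, k=K, h=H):
    s = 0.0
    for v in y:
        s = min(0.0, s + v - k)   # mu0 = 0 on the standardized scale
        if s < h:
            return True
    return False


def detection_rates(shift, n=7, reps=20000, seed=1):
    """Share of simulated windows flagged by (Shewhart, modified Shewhart, CUSUM)."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(reps):
        y = [rng.gauss(-shift, 1.0) for _ in range(n)]
        hits[0] += shewhart_detects(y)
        hits[1] += modified_detects(y)
        hits[2] += cusum_detects(y)
    return tuple(c / reps for c in hits)


print("no change:   ", detection_rates(0.0))
print("1 SD decline:", detection_rates(1.0))
```

Under these assumptions the two Shewhart variants should land near the false positive and detection percentages quoted in the text; the CUSUM column depends on the assumed recursion and is best read as indicative only.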

**Table 2**.

**Figure 1**.


**Table 3**.

[…] *σ*. A 5-week repeatability study (5 determinations, once a week) with 34 glaucoma patients and 22 healthy subjects at the University of Iowa found that the SDs among average thickness measurements of the RNFL (Stratus OCT3; Zeiss Meditec, Dublin, CA) are similar for glaucoma patients (*σ* = 3.01 μm) and healthy subjects (*σ* = 2.99 μm). The SD (in decibels) of the mean deviation of the VF measurements (Humphrey SITA 24-2) for glaucoma patients, *σ* = 0.98 dB, was found to be larger than the SD for healthy subjects, *σ* = 0.55 dB. These are population averages. While we know that *σ* varies substantially from one subject to another, we do not have enough data to estimate a subject-specific *σ* at the time a person enters the clinic. The initial two measurements on a subject taken in brief succession give us a rough estimate of his/her baseline, but any estimate of an SD from just two observations is subject to sizeable sampling variability.

[…] *σ* = 1 dB for monitoring the progression of the two glaucoma patients. We wish to learn whether the subsequent measurements indicate a reduction from the subject-specific baselines. The calculated CUSUM statistics, using Equation 3 with *k* = −*σ*/2 = −0.5, are given in Table 3. For in-control average run length 100, the critical cutoff is *h* = −2.85*σ* = −2.85. For patient 17, a reduction is signaled at time-period 4. For patient 18, the evidence for a reduction is insufficient, even though the CUSUM signal at time-period 7, *S*_{7} = −2.675, is very close to the threshold *h* = −2.85. Continued reductions that extend the pattern established over the last periods would force the CUSUM below the threshold, providing evidence for a reduction.

[…] *μ*_{0} = 0. In the absence of subject-specific SDs, we rely on the population averages that we obtained from the Iowa data. That is, *σ* = 3 μm for OCT RNFL for both healthy and glaucoma patients, *σ* = 1 dB for VF mean deviations of glaucoma patients, and *σ* = 0.55 dB for VF mean deviations of healthy subjects.

**Table 4**.

[…]^{ 9 }; Wall, Woodward, Doyle, and Artes^{ 10 }) and recent advances in reducing measurement variability of retinal structures over time (i.e., use of the baseline scan as a reference scan upon which subsequent scans are aligned to scan the exact same retinal location, averaging of scan lines during image acquisition, and better segmentation algorithms of the retinal layers; Pemp, Kardon, Kircher, Pernicka, Schmidt-Erfurth, and Reitner^{ 11 }) will all improve the sensitivity and specificity of the CUSUM approach. In fact, the CUSUM test can be used to model progression in order to estimate how much improvements in the measurement variability and sensitivity of a given test will improve the ability to detect disease progression. It is also anticipated that in a more realistic clinical setting (and not a prospective well-controlled patient study group as was used here), the number of patients who would be detected as showing progression of a disease such as glaucoma would increase even further.

^{ 12 }).

**J. Ledolter,** None; **R. Kardon,** None

*Economic Control of Quality of Manufactured Product*. New York, NY: Van Nostrand; 1931.

*Out of the Crisis*. Cambridge, MA: MIT Press; 1982.

*Statistical Quality Control: Strategies and Tools for Continual Improvement*. New York, NY: John Wiley; 1999.

*Introduction to Statistical Quality Control*. New York, NY: John Wiley; 2008.

*Ann Stat*. 1986; 14: 1379–1387.

*Biometrika*. 1972; 59: 539–549.

*Cumulative Sum Charts and Charting for Quality Improvement*. New York, NY: Springer; 1997.

*Invest Ophthalmol Vis Sci*. 2013; 54: 1345–1351.

*Invest Ophthalmol Vis Sci*. 2009; 50: 974–979.

*Graefes Arch Clin Exp Ophthalmol*. 2013; 251: 1841–1848.

*Invest Ophthalmol Vis Sci*. 2009; 50: 4254–4266.