**Purpose**:
To develop a simulation model for glaucomatous longitudinal visual field (VF) tests with controlled progression rates.

**Methods**:
Longitudinal VF tests of 1008 eyes from 755 patients with glaucoma were used to learn the statistical characteristics of VF progression. The learned statistics and known anatomic correlations between VF test points were used to automatically generate progression patterns for baseline fields of patients with glaucoma. VF sequences were constructed by adding spatially correlated noise templates to the generated progression patterns. The two one-sided test (TOST) procedure was used to analyze the equivalence between simulated data and data from patients with glaucoma. VF progression detection rates in the simulated VF data were compared to those in patients with glaucoma using mean deviation (MD), cluster, and pointwise trend analysis.

**Results**:
VF indices (MD, pattern standard deviation), MD linear regression slopes, and progression detection rates for the simulated and patients’ data were practically equivalent (TOST *P* < 0.01). In patients with glaucoma, the detection rates in 7 years using MD, cluster, and pointwise trend analysis were 24.4%, 26.2%, and 38.4%, respectively. In the simulated data, the mean detection rates (95% confidence interval) for MD, cluster, and pointwise trend analysis were 24.7% (24.1%–25.2%), 24.9% (24.2%–25.5%), and 35.7% (34.9%–36.5%), respectively.

**Conclusions**:
A novel simulation model generates glaucomatous VF sequences that are practically equivalent to longitudinal VFs from patients with glaucoma.

**Translational Relevance**:
Simulated VF sequences with controlled progression rates can support the evaluation and optimization of methods to detect VF progression and can provide guidance for the interpretation of longitudinal VFs.

Visual field (VF) testing, or perimetry, is the standard clinical assessment of functional vision integrity in glaucoma.^{1} In VF testing, differential light sensitivity (DLS) at discrete locations in the central and peripheral regions of the retina is estimated based on the patient's responses to visual stimuli. Because of the subjective and probabilistic nature of determining psychophysical thresholds, VF measurements can vary significantly from test to test.^{2–4}

Timely detection of VF deterioration is essential for the management of patients with glaucoma, as it impacts decisions regarding treatment and follow-up strategies.^{5} However, due to the high variability of VF test results, accurate detection of VF progression remains challenging.^{5,6} Various methods have been developed to detect VF progression.^{5–11} However, the lack of consensus on the gold standard to detect VF progression in glaucoma makes it challenging to evaluate the performance of these methods.^{6,12} To this end, developing simulation models that generate longitudinal VF data with known progression rates will support the evaluation and optimization of methods to detect VF progression.^{12}

A previous study^{4} derived empirical models of VF variability based on a large-scale longitudinal VF data set. In that model, VF variability for a specific DLS is described by the distribution of residuals from pointwise linear regression. This nonparametric method has the benefit of not depending on prior assumptions regarding the statistics of the residuals. However, independent pointwise noise sampled from the empirical distributions was used to generate the simulated VF tests.^{13} This makes the implicit assumption that noise at different VF test points is uncorrelated, resulting in the underestimation of the variability of VF global indices (e.g., mean deviation [MD]).^{13,14} For this reason, Wu and Medeiros^{14} developed noise templates that account for spatially correlated VF noise. Their improved model had better agreement with the statistical characteristics of noise in longitudinal VF data of patients with glaucoma.

These simulation models^{13,14} did not use explicit progression models. Instead, they fitted models to longitudinal VF sequences from patients to generate estimates of VF loss at different time points. With such methods, it is difficult to differentiate the effects of noise and of actual VF progression on the fitted model. In a separate effort, Gardiner and Crabb^{15} and Gardiner et al.^{16} developed a model with controlled progression rates to simulate longitudinal VF data, which enables quantitative assessment of methods to detect VF progression. However, this method has some limitations. First, the simulation is based on a limited number of manually defined baseline tests, which may not be sufficient to represent the wide range of VF defects in patients with glaucoma. Second, it assumes that VF deterioration occurs in only one anatomically correlated region of the field. This assumption is restrictive and not consistent with clinical observations.^{17–19}

In this study, we built on previous simulation models^{12–16} to develop a data-driven model that simulates glaucomatous longitudinal VF data with controlled progression rates. Our model uses statistical characteristics learned from longitudinal VF data of patients with glaucoma to automatically generate patterns of VF progression in several regions of the retina. To improve the description of spatially correlated noise in longitudinal VF tests, we used a unified pool of noise templates generated from VF tests of patients with glaucoma. The resulting model can automatically generate longitudinal VF sequences with different progression patterns from any baseline VF. Thus, it enables the generation of large longitudinal VF data sets with known progression rates to support the quantitative evaluation and optimization of VF progression detection methods.

Three longitudinal VF data sets were used in this study: the Rotterdam data set,^{9,20} the Canadian Glaucoma Study (CGS) VF data set,^{21} and a longitudinal VF data set from the glaucoma clinic of Toronto Western Hospital (TWH). All VF tests in the three data sets were performed with the Humphrey Field Analyzer (Carl Zeiss Meditec, Dublin, CA, USA) with the 24-2 or 30-2 test patterns, using either the Full Threshold (Rotterdam and CGS) or SITA Standard (TWH) algorithm. VFs tested with the 30-2 pattern were converted to the 24-2 pattern by keeping the common test locations. Patients with a diagnosis of primary open-angle glaucoma, normal-tension glaucoma, or chronic angle-closure glaucoma who had been followed for at least 5 years (including ≥10 reliable VF tests for each eye) were included in the study. Based on previous work,^{11,14,22,23} the criteria for a reliable VF test were fixation losses and a false-negative rate <33% and a false-positive rate <20%. This study was approved by the research ethics board of the University Health Network, Toronto, Ontario, Canada, and was conducted in adherence with the Declaration of Helsinki.
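The reliability criteria above can be sketched as a simple filter; the dictionary keys below are hypothetical, not from a specific device export format:

```python
# Reliability filter per the criteria above: fixation losses and
# false-negative rate below 33%, false-positive rate below 20%.
def is_reliable(vf_test):
    """vf_test: mapping with per-test reliability indices as fractions."""
    return (vf_test["fixation_loss_rate"] < 0.33
            and vf_test["false_negative_rate"] < 0.33
            and vf_test["false_positive_rate"] < 0.20)
```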

Similar to previous work,^{16} VF progression was simulated to occur in clusters of VF points with high anatomic correlation. The anatomic correlation was measured by Garway–Heath angles^{24} (i.e., the angles at which the optic nerve fiber bundles underneath VF points enter the optic disc). In the simulation, a VF test point was first specified as the “progression center.” The corresponding progression cluster is then uniquely determined as the progression center together with the VF points whose Garway–Heath angles lie within ±10° of that of the progression center.
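The cluster construction can be sketched as follows, assuming a lookup table of Garway–Heath angles per 24-2 location; the angle values in the usage example are illustrative placeholders, not the published map:

```python
def progression_cluster(center, angles, tol=10.0):
    """Return the VF point indices whose Garway-Heath angle lies within
    +/- tol degrees of the progression center's angle.

    angles: dict mapping VF point index -> nerve fiber bundle entry
    angle at the optic disc, in degrees (0-360).
    """
    c = angles[center]
    cluster = []
    for point, a in angles.items():
        d = abs(a - c) % 360.0
        d = min(d, 360.0 - d)  # shortest angular distance on the circle
        if d <= tol:
            cluster.append(point)
    return sorted(cluster)
```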

The number of simulated progression centers for each eye was determined by the empirical distribution of the number of progression clusters, *n*, in patients with glaucoma. As eyes with baseline scotomas are more likely to show VF progression than those without VF defects, the empirical cumulative distributions of progression clusters for eyes with and without baseline scotomas were computed separately. Similar to other studies,^{25,26} a baseline scotoma was defined as three or more contiguous points in the baseline VF test with a pattern deviation probability of <5%. Since the cumulative distribution function is monotonic over the range of 0 to 1, a random number between 0 and 1 was used to automatically select the number of simulated progression centers for each baseline VF test. Therefore, the number of simulated progressions shares the same statistical characteristics as those observed in patients with glaucoma. If the selected number of progression centers equals 0, the simulated eye remains stable. Figure 2 shows the empirical cumulative distribution of progression clusters in VF sequences of patients with glaucoma with (Fig. 2a) and without (Fig. 2b) baseline scotoma.
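Selecting the number of progression centers by inverting the empirical cumulative distribution amounts to inverse transform sampling; a minimal sketch, with made-up cluster counts in the test:

```python
import numpy as np

def sample_num_centers(observed_counts, u):
    """Inverse transform sampling from the empirical CDF of the number
    of progression clusters.

    observed_counts: numbers of progression clusters per patient eye.
    u: uniform random number in [0, 1), e.g. np.random.default_rng().random().
    """
    values, freq = np.unique(observed_counts, return_counts=True)
    cdf = np.cumsum(freq) / freq.sum()
    # Smallest observed value whose cumulative probability reaches u.
    return int(values[np.searchsorted(cdf, u, side="left")])
```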

As in previous simulation studies,^{15,27} the progression rate is a hyperparameter in our proposed model and can be set to any desired value. To facilitate comparisons between simulated VF data and data from patients with glaucoma, instead of assigning a uniform progression rate to all progression clusters, the progression rate for each cluster was randomly selected from the probability distribution of the mean progression rates of clusters at the same location in patients with glaucoma.

Noise templates^{14} were used to collectively describe the short- and long-term VF variability.^{28} First, pointwise linear regression models were fitted to VF sequences from stable eyes in our data sets. The fitted values were treated as the “true sensitivity,” and the residuals were taken as noise or variability in the field. Residual values associated with the same true sensitivity were then pooled across locations and patients to construct empirical distributions of VF variability for different sensitivity values. Next, to construct noise templates, residuals at each location (in dB) were mapped to probability scores (0–1) based on their percentile ranks in the empirical distribution of VF variability for the corresponding true sensitivity. The probability scores of all locations in a VF collectively constitute the noise template for that field, representing the normalized magnitude and spatial pattern of noise. By allowing for spatial correlations between noise at different test points, noise templates can describe global visit effects^{29} (e.g., fatigue) that affect the DLS measurements at several VF points during the test. Unlike the study by Wu and Medeiros,^{14} where patient-specific noise templates were used (i.e., different noise template pools for each patient), we pooled noise templates across all patients and used them interchangeably. Pooling the noise templates from different patients may offer several advantages for the simulation, which are explained in the Discussion section. As glaucomatous VF measurements in the superior and inferior hemifields are commonly assumed to be independent,^{30} we vertically flipped each obtained noise template around the horizontal meridian to augment the number of templates in the pool. Furthermore, we chose to use stable VF sequences to derive the residuals. As the primary source of variation in stable VF tests is measurement noise, using stable eyes minimizes the impact of nonlinear VF progression on the variability estimates. In this study, a stable VF sequence was defined as a sequence with an MD linear regression slope in the range of ±0.5 dB/y.
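The residual-to-score mapping described above can be sketched as follows; the pooled residual values in the test are placeholders, not empirical distributions from the data sets:

```python
import numpy as np

def percentile_score(residual, pooled_residuals):
    """Percentile rank (0-1) of a residual within the empirical
    variability distribution for its true sensitivity."""
    pool = np.sort(np.asarray(pooled_residuals, dtype=float))
    return np.searchsorted(pool, residual, side="right") / pool.size

def noise_template(residuals, true_sens, pools):
    """One probability score per VF location.

    pools: maps a rounded true sensitivity (dB) to the pooled residual
    sample for that sensitivity.
    """
    return [percentile_score(r, pools[round(s)])
            for r, s in zip(residuals, true_sens)]
```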

Following previous studies,^{10,11} statistically significant MD deterioration was defined as a negative MD linear regression slope with *P* < 0.05. With the same criteria, sensitivity analysis was used to compare the detection rates in three sets of simulated longitudinal VF data. These three simulated VF data sets were generated using (A) the proposed model (i.e., multiple progression centers with noise templates), (B) multiple progression centers with independent pointwise noise, and (C) a single progression cluster with noise templates. Furthermore, we compared the detection rates in simulated and patients’ VF data using two other trend-based methods: cluster and pointwise trend analysis. In these comparisons, we used the detection criteria proposed by Gardiner et al.^{11} (i.e., for cluster trend analysis, ≥3 clusters with a negative mean cluster slope with *P* < 0.117; for pointwise trend analysis, ≥9 points with negative slopes with *P* < 0.138). These criteria were derived with a test–retest data set so that each detection method could achieve 95% specificity in detecting VF progression.^{11}
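The pointwise criterion quoted above can be sketched as follows; whether the *P* values are one- or two-sided is an assumption here (`scipy.stats.linregress` returns two-sided *P* values):

```python
import numpy as np
from scipy.stats import linregress

def pointwise_progression(times, fields, n_points=9, p_crit=0.138):
    """Flag progression when >= n_points locations have a negative
    pointwise slope with P below p_crit.

    times: visit times in years; fields: (n_visits, n_locations) DLS in dB.
    """
    flagged = 0
    for loc in range(fields.shape[1]):
        fit = linregress(times, fields[:, loc])
        if fit.slope < 0 and fit.pvalue < p_crit:
            flagged += 1
    return flagged >= n_points
```

The cluster criterion would follow the same pattern, with regressions on mean cluster sensitivities and the ≥3 clusters / *P* < 0.117 thresholds.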

The two one-sided test (TOST) procedure^{31} was used to analyze the equivalence between simulated VF data and data from patients with glaucoma. In the TOST, two composite null hypotheses (*H*_{01} and *H*_{02}) are tested: *H*_{01}: µ_{1} − µ_{2} ≥ Δ_{U} and *H*_{02}: µ_{1} − µ_{2} ≤ Δ_{L}, where µ_{1} and µ_{2} are the mean values of parameters that describe the simulated and patients’ data, respectively, and Δ_{U} and Δ_{L} are the upper and lower bounds reflecting the clinical tolerance for equivalence. If both null hypotheses can be rejected (*P* < 0.05), µ_{1} and µ_{2} are considered practically equivalent. In this study, the equivalence bounds were set to ±0.5 dB for MD and pattern standard deviation (PSD), ±0.1 dB/y for MD slope, and ±4% for detection rate. The equivalence bounds for MD and PSD (±0.5 dB) are based on the clinically acceptable levels for prediction errors.^{22} For MD slope, the equivalence bounds were set to the expected progression rate due to aging (−0.1 dB/y). For detection rate, the bounds were determined based on the change in detection rate when using the upper and lower bounds for MD slope as the detection criteria (with MD trend analysis). Moreover, we adopted one-sided Mann–Whitney *U* tests instead of one-sided *t*-tests to deal with the nonnormality of our data. Levene's test for equality of variance was used to analyze differences between the variances of MD linear regression residuals in simulated VF data and data from patients with glaucoma.
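A minimal sketch of the TOST procedure with Mann–Whitney *U* tests; shifting one sample by the equivalence bound to obtain each one-sided test is an implementation assumption, not a detail given in the text:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def tost_mannwhitney(x, y, lower, upper, alpha=0.05):
    """TOST equivalence test for the location difference between two
    samples, using one-sided Mann-Whitney U tests against the
    equivalence bounds (lower < 0 < upper)."""
    x = np.asarray(x, dtype=float)
    # H01: mu_x - mu_y >= upper; reject if x shifted down by `upper`
    # is stochastically smaller than y.
    p_upper = mannwhitneyu(x - upper, y, alternative="less").pvalue
    # H02: mu_x - mu_y <= lower; reject if x shifted down by `lower`
    # is stochastically larger than y.
    p_lower = mannwhitneyu(x - lower, y, alternative="greater").pvalue
    return max(p_upper, p_lower) < alpha, p_upper, p_lower
```

Equivalence is declared only when both one-sided tests reject, i.e., when the larger of the two *P* values is below α.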

*P* < 0.001). Furthermore, the same comparisons were carried out in different severity groups: the mild (initial MD > −6 dB), moderate (initial MD ∈ [−12, −6] dB), and severe (initial MD < −12 dB) groups. The simulation results demonstrated good agreement with those from patients with glaucoma in the different groups (TOST *P* < 0.01). A detailed comparison can be found in Table 2.

Consistent with previous findings,^{14} there was no significant difference between the standard deviations of MD linear regression residuals for VF data of patients with glaucoma and simulated data when noise templates were used to model VF variability (Figs. 3a, 3b). By contrast, the standard deviation of MD residuals when using the independent noise model for each VF test point was significantly lower (Levene's test *P* < 0.0001, Fig. 3c).

*P* < 0.0001). Specifically, the mean (95% confidence interval [CI]) detection rates after 3, 5, and 7 years from the baseline were 7.7% (7.3%–8.0%), 17.1% (16.4%–17.8%), and 24.7% (24.1%–25.2%), respectively. In comparison, the mean (95% CI) detection rates at the same time points for the simulated data with independent pointwise noise (set B) were 20.5% (19.7%–21.3%), 26.9% (26.1%–27.8%), and 40.7% (39.7%–41.7%). When the simulated VF progression was assumed to occur in only one cluster of points (set C), the mean (95% CI) detection rates were 2.2% (2.0%–2.5%), 6.3% (6.0%–6.7%), and 11.7% (11.3%–12.2%), respectively.

*P* < 0.01). Specifically, when using cluster trend analysis for data from patients with glaucoma, the detection rates were 9.3%, 18.4%, and 26.2% after 3, 5, and 7 years from the baseline, respectively. In comparison, the mean (95% CI) detection rates using cluster trend analysis with simulated VF data generated by our model were 10.2% (9.5%–11.0%), 18.7% (18.0%–19.5%), and 24.9% (24.2%–25.5%) at the same time points. For pointwise trend analysis, the detection rates in patients with glaucoma were 17.3%, 28.1%, and 38.4% after 3, 5, and 7 years, while the detection rates with the simulated VF data were 18.9% (18.3%–19.6%), 29.4% (28.6%–30.2%), and 35.7% (34.9%–36.5%) at the same time points, respectively.

For each progression rate, the average mean squared errors (MSEs) between the patient's VF tests at follow-up visits and the 100 simulated VF sequences were calculated.^{32} To construct the 95% CI for the MSE, the above process was repeated 10 times. Figure 6 shows the cumulative sum of average MSEs between the simulated VFs for the four different progression rates and the patient's VFs over 7 years. After the first year, the MSEs between the simulated and patient's VF data for all four progression rates are not significantly different. After 2 years, the cumulative MSEs between simulated VF sequences with progression rates ≤−0.5 dB/y and the patient's data are significantly lower (*t*-test *P* < 0.01) than the cumulative MSEs with simulated stable VFs (0 dB/y in MD slope). In other words, the data in Figure 6 suggest that after 2 years, the patient's VFs are deteriorating at a rate equal to or worse than −0.5 dB/y. After 3 years, the cumulative MSEs between the patient's data and simulated VFs with moderate and fast progression rates (−0.5 and −1.0 dB/y in MD slope) are significantly lower (*t*-test *P* < 0.01) than the MSEs with simulated VFs that are either stable (0 dB/y in MD slope) or progressing at a very fast rate (−1.5 dB/y in MD slope). After 4 years, Figure 6 shows that the cumulative MSEs between the simulated, moderately deteriorating VFs (i.e., −0.5 dB/y in MD slope) and the patient's data are significantly lower (*t*-test *P* < 0.01) than the cumulative MSEs with all three other simulated sequences. Note that after 4 years, the estimated VF progression rate (−0.5 dB/y) is the same as the clinical determination (moderate progression) obtained by measuring the MD slope over the 7 years. Moreover, even before the data in Figure 6 converge to a single solution in year 4, one can gain insight into the range of progression rates that are consistent with the measured VFs at each time point.
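The rate-estimation comparison above reduces to a cumulative-MSE computation between a measured sequence and a bank of simulated sequences; a minimal sketch, with array shapes as assumptions:

```python
import numpy as np

def cumulative_mse(patient_fields, simulated_fields):
    """Running sum over follow-up visits of the MSE between a patient's
    measured fields and a bank of simulated sequences.

    patient_fields: (n_visits, n_locations) DLS in dB.
    simulated_fields: (n_sequences, n_visits, n_locations).
    """
    sq_err = (simulated_fields - patient_fields[None, :, :]) ** 2
    mse_per_visit = sq_err.mean(axis=(0, 2))  # over sequences, locations
    return np.cumsum(mse_per_visit)
```

The candidate progression rate whose simulated bank yields the lowest cumulative MSE is the best estimate of the patient's rate at that time point.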

*P* < 0.01).

In contrast to the patient-specific noise templates used previously,^{14} our model draws from a unified pool of templates from all patients. The rationale for this change is that (a) our method does not require a patient's entire VF history for simulations and thus may be able to simulate the projected trajectory of any patient from a single baseline test; (b) it may better capture the statistics of the global visit effects by pooling across all eyes; and (c) patient-specific templates require shuffling within the same eccentricity for data augmentation, which may not be entirely consistent with anatomic correlations between VF points.^{24} Nevertheless, we compared the results of using unified versus patient-specific noise templates. Both methods demonstrated similar capabilities in characterizing the MD of patients with glaucoma. Specifically, the MD variabilities (measured by the standard deviation of MD linear regression residuals^{14}) were 1.4 dB, 1.3 dB, and 1.3 dB for simulated data using patient-specific templates, simulated data using our unified pool, and data from patients with glaucoma, respectively.

*P* < 0.05), even when the differences are not clinically significant. To overcome this problem, we used the equivalence test (i.e., the TOST procedure) to show that the differences between data generated by our model and data from patients with glaucoma lie within clinically acceptable bounds for equivalence. The results of the TOST procedure demonstrated that all VF indices for the simulated and patient data are practically equivalent (*P* < 0.01).

Our model also assumes that VF progression is linear, whereas progression in glaucoma may be nonlinear.^{33–36} However, our model can be easily extended to incorporate nonlinear models of VF progression. Moreover, to achieve a large sample size, we mixed VF data sets tested using the SITA Standard and Full Threshold algorithms, which may have different noise characteristics due to postprocessing in the SITA Standard. Finally, our model assumes that the disease progresses without interruption and does not explicitly account for clinical interventions, which can slow down the progression.

**Y. Li**, None; **M. Eizenman**, None; **R.B. Shi**, None; **Y.M. Buys**, None; **G.E. Trope**, None; **W. Wong**, None

1. *JAMA*. 2014; 311(18): 1901–1911.
2. *Am J Ophthalmol*. 1989; 108(2): 130–135.
3. *Invest Ophthalmol Vis Sci*. 2000; 41(2): 417–421.
4. *Invest Ophthalmol Vis Sci*. 2012; 53(10): 5985–5990.
5. *Prog Brain Res*. 2015; 221: 135–158.
6. *Invest Ophthalmol Vis Sci*. 2003; 44(9): 3873–3879.
7. *Br J Ophthalmol*. 2008; 92(4): 569–573.
8. *Ophthalmology*. 2012; 119(3): 458–467.
9. *Invest Ophthalmol Vis Sci*. 2013; 54(10): 6694–6700.
10. *PLoS One*. 2014; 9(1): e85654.
11. *Invest Ophthalmol Vis Sci*. 2017; 58(6): BIO180–BIO190.
12. *Invest Ophthalmol Vis Sci*. 2000; 41(8): 2192–2200.
13. *PLoS One*. 2013; 8(12): e83595.
14. *Transl Vis Sci Technol*. 2018; 7(3): 22.
15. *Invest Ophthalmol Vis Sci*. 2002; 43(5): 1400–1407.
16. *Vis Res*. 2004; 44(8): 839–848.
17. *Invest Ophthalmol Vis Sci*. 2012; 53(4): 2390–2394.
18. *Invest Ophthalmol Vis Sci*. 2007; 48(4): 1642–1650.
19. *J Am Stat Assoc*. 2019; 114(527): 1063–1074.
20. *Invest Ophthalmol Vis Sci*. 2014; 55(4): 2350–2357.
21. *Can J Ophthalmol*. 2006; 41(5): 566–575.
22. *Ophthalmology*. 2017; 124(11): 1612–1620.
23. *Ophthalmology*. 1997; 104(7): 1126–1130.
24. *Ophthalmology*. 2000; 107(10): 1809–1815.
25. *Clinical Decisions in Glaucoma*. St. Louis, MO: Mosby; 1993.
26. *Am J Ophthalmol*. 2006; 141(3): 463–471.
27. *Br J Ophthalmol*. 2007; 91(10): 1257–1258.
28. *Transl Vis Sci Technol*. 2022; 11(8): 3.
29. *Invest Ophthalmol Vis Sci*. 2015; 56(8): 4283–4289.
30. *Arch Ophthalmol*. 1992; 110(6): 812–819.
31. *J Pharmacokinet Biopharm*. 1987; 15(6): 657–680.
32. *Invest Ophthalmol Vis Sci*. 2014; 55(7): 4135–4143.
33. *Invest Ophthalmol Vis Sci*. 2000; 41(7): 1774–1782.
34. *Invest Ophthalmol Vis Sci*. 2013; 54(8): 5505–5513.
35. *Invest Ophthalmol Vis Sci*. 2014; 55(12): 7881–7887.
36. *JAMA Ophthalmol*. 2016; 134(5): 496–502.