Abstract
Purpose:
To develop a structural metascore (SMS) that combines measurements from different devices and expresses them on a single scale to facilitate their long-term analysis.
Methods:
Three structural measurements (Heidelberg Retina Tomograph II [HRT] rim area, HD-Cirrus optical coherence tomography [OCT] average retinal nerve fiber layer [RNFL] thickness, Spectralis OCT RNFL global thickness) were normalized on a scale of 0 to 100 and converted to a reference value. The resultant metascores were plotted against time. SMS performance was evaluated to predict future values (internal validation), and correlations between the average grades assigned by three clinicians were compared with the SMS slopes (external validation).
Results:
Linear regression with the variance approach and adjustment to a Spectralis equivalent was the best-performing method; its output was designated the metascore. Plots were created for 3416 eyes of 1824 patients. The average baseline age (± standard deviation) was 69.8 (±13.9) years, mean follow-up was 11.6 (±4.7) years, and mean number of structural scans per eye was 10.0 (±4.7). The mean numbers of scans per device were 3.8 (±2.5), 5.0 (±2.9), and 1.3 (±3.0) for HRT, Cirrus, and Spectralis, respectively. The median metascore slope was −0.3 (interquartile range 1.1). Correlations between the average grades assigned by the three clinicians and the metascore slopes were −0.51, −0.49, and −0.69 for the first (structural measurement printouts alone), second (metascore plots alone), and third (printouts + metascore plots) series of gradings, respectively. The average absolute predictive ability was 7.63/100 (where 100 is the entire normalized scale).
Conclusions:
We report a method that converts Cirrus global RNFL and HRT global rim area normalized measurements to Spectralis global RNFL equivalent values to facilitate long-term structural follow-up.
Translational Relevance:
Because glaucoma changes usually occur slowly, patients are often examined with different instruments during their follow-up. A method that “unifies” structural measurements provided by different devices could assist patients’ longitudinal structural follow-up.
The data used to develop the structural metascore were exported from the three structural devices (Heidelberg Retina Tomograph [HRT], Cirrus OCT, and Spectralis OCT) used in the Glaucoma Division of the Stein Eye Institute, University of California, Los Angeles (UCLA). The data included structural scans acquired from 1993 to 2020. This study adhered to the tenets of the Declaration of Helsinki, was approved by the UCLA Human Research Protection Program, and conformed to Health Insurance Portability and Accountability Act policies.
Inclusion criteria were clinical diagnosis of chronic glaucoma (primary open-angle glaucoma, chronic angle-closure glaucoma, uveitic, pseudoexfoliative, pigmentary, steroid-induced, traumatic), and age ≥18 years. Exclusion criteria were any other causes for optic nerve or retinal abnormalities potentially affecting structural or functional status, such as proliferative diabetic retinopathy, central retinal vein occlusion, retinal detachment, and exudative age-related macular degeneration. Visual fields were performed with Humphrey Field Analyzer's Swedish Interactive Thresholding Algorithm Standard 24-2 and 30-2 strategies and a size III white stimulus (Carl Zeiss Meditec, Inc., Dublin, CA, USA). Structural devices included in this study were Heidelberg Retina Tomograph II (HRT; Heidelberg Engineering, Heidelberg, Germany), Cirrus HD-OCT (Carl Zeiss Meditec, Inc.), and Spectralis OCT (Heidelberg Engineering). Only good-quality scans were included, defined as HRT standard deviation <50 µm, Cirrus HD-OCT signal strength ≥6, and Spectralis OCT quality ≥18. Normal subjects were recruited from the research database in the Glaucoma Division, Stein Eye Institute. The enrolled normal subjects were required to have open angles, corrected visual acuity of 20/25 or better, and normal eye examination results including normal visual fields and normal ophthalmoscopic appearance of the optic nerve head.
The process of generating a metascore for each eye is described briefly as follows (Fig. 1) and is detailed below:
1. Data normalization: Different structural measurements (HRT rim area, HD-Cirrus OCT average RNFL thickness, and Spectralis OCT RNFL global thickness) were normalized to the same scale of 0 (worse) to 100 (better). We tested two different techniques for normalization: the variance and the dynamic range approach.
2. Conversion formulae: Measurements provided by HRT, Cirrus, and Spectralis were converted to a reference device equivalent (either Cirrus or Spectralis). The resulting measurements are called metascores. Goodness of fit for the metascores calculated with three different statistical methods (univariable linear regression, calibration equation, and Bland-Altman plots) was calculated with the root mean squared error for the different approaches (Table 1).
3. Metascore plots: Metascores (vertical axis) were plotted for each included eye against time in years (horizontal axis) to provide a graphical tool. A prediction interval (range in which a future individual observation will fall, based on the model estimates) was calculated for all the metascore slopes.
4. Metascore evaluation: Metascore performance was evaluated in two ways: its accuracy in predicting itself over time (predictive ability, or internal validation), and how it compares to clinical assessment (clinical, or external validation). The methodology and number of images used are summarized in Figure 4.
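The four steps above can be sketched in code. The following is a minimal illustration only, not the study's implementation: the floor and ceiling values, the linear conversion coefficients, the clamping to the 0 to 100 range, and the hold-out rule used for the prediction error are all hypothetical assumptions introduced for the example, and the function names are invented here.

```python
# Hypothetical floor/ceiling values per device (illustrative only;
# NOT the values reported in the study's Table 1).
FLOOR_CEILING = {
    "HRT": (0.0, 2.0),            # rim area, mm^2 (assumed)
    "Cirrus": (50.0, 110.0),      # average RNFL thickness, um (assumed)
    "Spectralis": (30.0, 120.0),  # global RNFL thickness, um (assumed)
}

# Hypothetical (slope, intercept) pairs mapping each device's normalized
# value to a Spectralis-equivalent value; the real coefficients would
# come from the fitted regression.
TO_SPECTRALIS = {
    "HRT": (0.8, 5.0),
    "Cirrus": (0.9, 2.0),
    "Spectralis": (1.0, 0.0),
}


def normalize(device, value):
    """Step 1: rescale a raw measurement to 0 (worse) .. 100 (better)."""
    floor, ceiling = FLOOR_CEILING[device]
    score = 100.0 * (value - floor) / (ceiling - floor)
    # Clamping is an assumption of this sketch.
    return min(100.0, max(0.0, score))


def metascore(device, value):
    """Step 2: convert a normalized value to the reference-device scale."""
    slope, intercept = TO_SPECTRALIS[device]
    return slope * normalize(device, value) + intercept


def fit_line(xs, ys):
    """Step 3: ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absolute_prediction_error(times, scores):
    """Step 4 (internal validation, assumed scheme): fit all visits but
    the last, predict the last, and return the absolute error."""
    slope, intercept = fit_line(times[:-1], scores[:-1])
    predicted = slope * times[-1] + intercept
    return abs(predicted - scores[-1])


# One hypothetical eye: (years since baseline, device, raw measurement).
visits = [
    (0.0, "HRT", 1.2),
    (2.0, "HRT", 1.1),
    (5.0, "Cirrus", 80.0),
    (8.0, "Spectralis", 70.0),
]
times = [t for t, _, _ in visits]
scores = [metascore(device, value) for _, device, value in visits]
slope, intercept = fit_line(times, scores)
print(f"metascore slope: {slope:.2f} units/year")
print(f"held-out error: {absolute_prediction_error(times, scores):.2f} / 100")
```

A negative slope in this sketch corresponds to structural worsening over time, because the normalized scale runs from 0 (worse) to 100 (better).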
Table 1. Floor and Ceiling Values for Each of the Three Devices Used in This Study (Variance Approach)
The diagnosis of glaucoma and detection of glaucoma progression have traditionally been based on ONH damage assessed subjectively by ophthalmoscopy or photography and on corresponding damage to the visual field assessed by automated perimetry. Clinical ONH and RNFL assessment is known to be limited by poor to fair reproducibility and by the wide variation of normal anatomy between individuals.14 Since the advent of automated imaging devices, structural findings of the ONH have become increasingly reproducible and objective, but there are shortcomings that need to be addressed. The glaucoma follow-up period typically outlasts the lifespan of rapidly evolving imaging devices, so multiple structural measurements from different devices are often available. Additionally, “normality” according to these devices is based on normative databases created by device manufacturers, which often do not include a wide range of ethnicities and anatomical variations. Hence, their utility is limited to patients with clinical and demographic characteristics similar to those in the normative databases. Also, devices differ in scanning protocols, analytical software, output scan reports, and more, all of which hinder accurate comparison of results across scanning devices and confound detection of long-term structural changes.
The HRT uses a 670 nm diode laser to create a layered three-dimensional image. Relative topographic heights are then calculated from a reference ring (contour line) manually placed on the optic disc, after which the instrument estimates ONH stereometric parameters. HRT RNFL thickness measurements have been shown to have poor diagnostic accuracy in previous studies,15 and we therefore chose HRT rim area as the structural outcome for calculating our proposed metascore.
OCT is a high-resolution imaging device that uses a low-coherence broadband light source from a superluminescent diode to acquire in vivo images of the retina. It applies the principle of interferometry to interpret reflectance data from a series of multiple side-by-side A-scans combined to form a cross-sectional image. The Cirrus HD-OCT Optic Disc Cube algorithm consists of a 1024 × 200 × 200 volume scan. Parapapillary RNFL thickness is measured along a 3.46 mm diameter measurement circle automatically placed around the optic disc (256 sampled A-scans). Spectralis OCT uses a dual-beam SD-OCT (acquisition rate of 40,000 A-scans per second) and a CSLO with a wavelength of 870 nm to obtain images of ocular microstructures. It incorporates a real-time eye tracking system that couples the CSLO and SD-OCT scanners to adjust for eye movements and to ensure that the same location of the retina is scanned over time.
Spectralis OCT has been widely shown to have high reproducibility8,11 and good diagnostic accuracy in detecting glaucoma and RNFL changes.16,17 We decided to adjust measurements of the other devices to fit its normalized scale. This methodology, however, can be applied to all other devices on the market, and measurements can be theoretically adjusted to any preferred device.
With respect to the multiple machines currently available for the acquisition of automated structural ONH measurements, several studies have explored agreement,18–20 reproducibility,8,12,21 and diagnostic accuracy,22 but, to our knowledge, no method has yet been introduced to unify structural measurements provided by different scanning devices on a single scale.
Tan et al.23 compared retinal nerve fiber layer measurements between Cirrus and Spectralis and concluded that agreement of RNFL measurement between the devices was generally good; they also found that repeatability of RNFL thickness measurements in normal participants was excellent for both OCTs. Buchser et al.11 compared RNFL thickness measurement bias and imprecision across three SD-OCT devices (RTVue-100, Cirrus HD-OCT, and 3D OCT-1000), concluding that RNFL thickness measurements showed higher imprecision (or higher measurement variability) for the RTVue-100 than for the Cirrus HD-OCT and 3D OCT-1000 devices.
Leite et al.22 assessed diagnostic accuracy and agreement18 of RNFL thickness measurements among RTVue, Cirrus, and Spectralis OCTs and stated that, although the spectral-domain OCTs had different resolution and acquisition rates, their ability to detect glaucoma based on areas under the curve (AUCs) and sensitivities at fixed specificities of 80% and 95% was similar. With respect to agreement, they concluded that RNFL thickness measurements obtained by different SD-OCT instruments were not entirely compatible (probably attributable to differences in RNFL detection algorithms) and should therefore not be used interchangeably. Fanihagh et al.9 explored correlations and strength of association of RNFL thickness in glaucoma patients among OCT, scanning laser polarimetry, and CSLO; they reported a high correlation in RNFL thickness between OCT and scanning laser polarimetry, while HRT's (CSLO) topographic measurements (RNFL) displayed poor correlations with the other two imaging devices. Lally et al.25 combined structural measurements from multiple imaging devices as inputs for machine learning classifiers to see whether this would improve discrimination between healthy and glaucomatous eyes, concluding that combining data from multiple devices did not significantly improve discriminating ability (Lally DR, et al. IOVS. 2009;50:5817).
Our metascore approach aids detection of structural change. Given the large number of structural-measuring devices available on the market, the speed at which they are being introduced into clinical practice, and the fact that glaucoma is mostly a slowly progressing disease that requires lifelong clinical examinations, patients are often examined with several different instruments during their lifetime. A method that puts structural measurements provided by different devices on the same scale for sequential interpretation would help clinicians interpret change over long follow-up periods that include measurements from diverse devices. We believe this tool would increase the relative weight of structural data in treatment decisions, since it can provide a robust long-term trend and rate. Of course, all decisions must be made in the context of, and integrated with, all other relevant clinical data, such as severity of the disease, patients’ wishes, and expected longevity.
In the clinical validation of our metascore, we observed that the correlation between the specialists’ gradings and the metascore slopes decreased when the metascore plots were analyzed alone, but improved when they were reviewed together with the structural devices’ printouts (Fig. 10). This suggests that the metascore might be helpful as an additional tool for structural progression analysis but may not necessarily replace the analysis of structural raw data provided by the devices’ printouts. Agreement between graders improved in one out of three combinations of graders (B&C) when the printouts were analyzed together with the metascore plots (as opposed to the printouts alone) and decreased for the other two pairs of graders (A&B and A&C). We attribute this to the subjectivity of interpreting a novel method, and the fact that there was no consensus training before the grading. Regarding the metascore slopes, we obtained an overall negative trend (mean and median −0.3), which is to be expected, considering glaucomatous progression. Nevertheless, we also obtained some “positive” slopes that can be attributed either to noise and variability (property of all ancillary tests), or to actual structural changes.24
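The external validation described above compares averaged clinician grades with metascore slopes. The sketch below shows one way such a correlation could be computed. It is an illustration under stated assumptions: the grades and slopes are invented, the grading direction (a higher grade meaning more suspected progression) is assumed, and a Pearson coefficient is used because the type of correlation is not specified in this passage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical 1-10 grades from three graders (A, B, C) for four eyes;
# a higher grade is ASSUMED to mean more suspected progression.
grades = {
    "A": [2, 5, 7, 9],
    "B": [3, 4, 8, 9],
    "C": [1, 5, 6, 10],
}
# Hypothetical metascore slopes (units/year) for the same four eyes.
slopes = [0.2, -0.3, -0.9, -1.6]

# Average the three graders' scores per eye, then correlate with slopes.
mean_grades = [sum(per_eye) / len(per_eye) for per_eye in zip(*grades.values())]
r = pearson(mean_grades, slopes)
print(f"correlation between mean grade and slope: {r:.2f}")
```

Under the assumed grading direction, a more negative coefficient indicates closer agreement between the clinicians' impression of progression and a declining metascore.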
Our study has limitations. We used data from the structural devices used at our institution, which do not include other commercially available devices. Our metascore includes only global measurements (such as RNFL thickness and rim area) and does not account for localized or regional changes or for stages of glaucoma. Implementing our methods requires a significant amount of work to extract the relevant data from the corresponding devices. Structural scans were not filtered for segmentation errors (or other scan artifacts), which might have resulted in some unreliable scans being included in the metascore slopes. Our 1-10 clinical validation scale was not externally calibrated, so the scores might have included unequal steps, which limits the interpretation of the averaged graders’ scores shown in the results. The metascore has been internally and externally evaluated with its predictive ability and clinicians’ validation, respectively. We did not include an objective external reference standard for a similar approach to a combined structural measure, because we believe none is currently available. Generating the metascore in a different population may yield different model coefficients; we plan this as additional work in large datasets. Ultimately, the utility of the technique will rest on more widespread use. Finally, the study's retrospective design and its performance at a tertiary care center may produce results that are not entirely generalizable to other populations. Future work will include optic disc photographs, with the purpose of incorporating additional structural data into the structural “metascore”, and will address even longer follow-up periods.
To conclude, the capability of imaging instruments to provide additional information to the traditional examination improves the detection of glaucoma and its progression. Our aim is to combine structural measurements provided by different rapidly-evolving, commercially available measuring devices in order to achieve a reliable tool with which to gauge glaucomatous structural progression in patients with long follow-up that spans the use of several, evolving imaging methods. Specifically, we report a method that converts HRT rim area and Cirrus RNFL measurements to Spectralis global RNFL equivalent, normalized values, so that they can be evaluated on a single scale to facilitate analysis and interpretation of long-term structural data in glaucomatous eyes.
Disclosure: A. De Gainza, None; E. Morales, None; A. Rabiolo, None; F. Yu, None; A.A. Afifi, None; K. Nouri-Mahdavi, None; J. Caprioli, None