Abstract
Purpose:
We developed and evaluated a training procedure for marking the endpoints of the ellipsoid zone (EZ), also known as the inner segment/outer segment (IS/OS) border, on frequency domain optical coherence tomography (fdOCT) scans from patients with retinitis pigmentosa (RP).
Methods:
A manual for marking EZ endpoints was developed and used to train 2 inexperienced graders. After training, an experienced grader and the 2 trained graders marked the endpoints on fdOCT horizontal line scans through the macula from 45 patients with RP. They marked the endpoints on these same scans again 1 month later.
Results:
Intragrader agreement was excellent. The intraclass correlation coefficient (ICC) was 0.99, the average difference of endpoint locations (19.6 μm) was close to 0 μm, and the 95% limits were between −284 and 323 μm, approximately ±1.1°. Intergrader agreement also was excellent. The ICC values were 0.98 (time 1) and 0.97 (time 2), the average difference among graders was close to zero, and the 95% limits of these differences were less than 350 μm, approximately 1.2°, for both test times.
Conclusions:
While automated algorithms are becoming increasingly accurate, EZ endpoints still have to be verified manually and corrected when necessary. With training, the inter- and intragrader agreement of manually marked endpoints is excellent.
Translational Relevance:
For clinical studies, the EZ endpoints can be marked by hand if a training procedure, including a manual, is used. The endpoint confidence intervals, well under ±2.0°, are considerably smaller than the 6° spacing for the typically used static visual field.
Retinitis pigmentosa (RP) is a group of heterogeneous inherited retinal disorders characterized by the degeneration of rod and cone photoreceptor cells and, in the most extreme cases, the retinal pigment epithelium (RPE). Patients first experience night blindness followed by midperipheral vision loss. Progressive constriction of the visual field eventually leads to central vision loss and, in some cases, complete blindness. While the time-scale may vary, progressive constriction of the useable field of vision is seen in all genetic forms of RP.1–6
Traditionally, full-field electroretinogram (ERG) cone flicker, kinetic perimetry, static perimetry, and more recently multifocal (mf) ERG have been used to monitor disease progression. However, natural progression is slow relative to the variability inherent in these measures, making changes difficult to detect using conventional metrics. With the emergence of numerous new treatment strategies for RP, it is more important than ever to find ways to monitor progression that are robust but sensitive to change, relatively easy to administer, and widely available.
Optical coherence tomography (OCT) offers one possibility. The introduction of frequency domain (fd) OCT has made it possible to visualize individual retinal layers affected by RP more clearly and to follow disease progression.7–9 In particular, research has focused on the ellipsoid zone (EZ), also called the inner segment/outer segment (IS/OS) border. This is a hyperreflective band clearly visible on OCT scans. While some attribute this signal to the cilium connecting the photoreceptor inner and outer segments,10–11 others argue that it is due to light scattered by the mitochondria in the ellipsoids of the distal inner segment.12–14 Regardless, disruption of this reflective EZ is a clinical marker of disease pathology.
The EZ disappears in the periphery early in the RP disease process.15–17 Rangaswamy et al.,18 however, showed that for the EZ to disappear, the local visual field (VF) loss had to exceed approximately 8 dB. On the other hand, the edge of the EZ corresponds to the edge of the useable VF15; that is, it corresponds to the precipitous drop in sensitivity seen on VFs of patients with some preservation of central visual sensitivity.
Birch et al.,19 using an experienced grader to measure the distance between endpoints (EPs; EZ width) on two scans obtained on the same day, found that 95% of all test–retest differences for EZ width were less than 0.43°, far smaller than the 6° spacing between test points in the standard VF test. Further, Birch et al.20 looked at the VF sensitivities inside and outside the EZ EPs and found that the region surrounding this edge is more sensitive in detecting progression compared to global sensitivity measures. In general, the evidence suggests that the EZ EP is a more sensitive measure than existing VF and ERG methods.19–21
Following the edge of the EZ has distinct advantages over other measures. First, compared to conventional measures of VF and ERG, it is easier to administer and analyze. Second, it also has advantages over other measures of OCT scans. Ramachandran et al.21 showed that following the ends of the EZ on horizontal and vertical line scans taken through the fovea was as sensitive, if not more sensitive, in detecting annual changes compared to other metrics derived from a full macular cube scan including outer nuclear layer (ONL), outer segment (OS), and RPE volume. Thus, only one or two OCT line scans must be added to routine clinical protocols.
It remains open how best to standardize the marking of the edge of the EZ. For the EZ edge points (EPs) to be a viable outcome measure in RP clinical trials, inter- and intragrader agreement must be good. To this end, we developed a training procedure based upon a written manual and a training protocol. Here, we test this procedure by training two inexperienced “graders” to mark the EPs on a set of horizontal line scans. After training, intra- and intergrader agreement was assessed on a new set of horizontal scans from 45 patients with RP. Finally, we compared the results of the manually marked EPs to EP markings on the same set of scans 1 month later.
Based upon previous experience segmenting fdOCT scans from patients with outer retinal disease, a training manual was written detailing how to mark the EP locations of the EZ (see Supplementary Fig. S1). The manual provided instructions on how to identify and segment the OLM, EZ band, and pRPE and illustrated examples of commonly encountered ambiguities.
In brief, the manual instructs the user to: (1) Step 1 – Decide the boundary lines for 3 bands: OLM, EZ, and the pRPE. These lines do not have to be actually drawn. (2) Step 2 – Mark the location where the EZ merges with the pRPE. This is the EP. (3) Step 3 – If the ending of the EZ is ambiguous, segment the OLM, EZ, and the pRPE based on mental outlines (Step 1). (4) Step 4 – If EZ EPs are still ambiguous, refer to the “Marking the EZ Edge” flowchart provided on the last page of the manual.
With the help of the written manual, Grader A conducted a training session for the two inexperienced graders (B and C) using the scans from the 15 patients in training set #1. To help familiarize the new graders with the anatomy of the outer retina and to detect obvious systematic errors, during training session #1, Grader A first segmented the three borders of interest (OLM, EZ, pRPE) on a healthy control and an RP patient, while Graders B and C watched. Graders B and C then segmented the three borders on the remaining 13 RP patients' scans, which covered a range of disease severity. The three graders discussed and compared segmentations at the end of this training session. The next day, Grader A conducted training session #2, which discussed ambiguities in marking the EP. This time, the three graders were not required to do the full segmentation. Instead, they independently marked just the EP on the 15 patients in training set #2. The results were quantitatively analyzed and differences among the 3 graders were discussed.
To assess the degree of inter- and intragrader agreement, the signed and absolute differences of the following two measures were obtained: (1) ΔEPintra – the difference in marked EP between time 1 (EPtime1) and time 2 (EPtime2), and (2) ΔEPinter – the difference in marked EP between a grader (EPgrader) and the average EP marked by the other two graders (EPothers).
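As an illustration of these two measures, the following minimal Python sketch computes signed and absolute values of ΔEPintra and ΔEPinter. The function names, the data layout, and the sign convention (time 1 minus time 2; grader minus others) are assumptions made for this example and are not specified in the text; the numbers are made up and are not study data.

```python
from statistics import mean


def intra_differences(ep_time1, ep_time2):
    """Signed and absolute intragrader differences (um), one per endpoint.

    Assumed sign convention: EP at time 1 minus EP at time 2.
    """
    signed = [t1 - t2 for t1, t2 in zip(ep_time1, ep_time2)]
    return signed, [abs(d) for d in signed]


def inter_differences(ep_by_grader, grader):
    """Signed and absolute intergrader differences (um) for one grader.

    Each endpoint marked by `grader` is compared with the mean of the
    same endpoint as marked by all other graders.
    """
    others = [g for g in ep_by_grader if g != grader]
    signed = []
    for i, ep in enumerate(ep_by_grader[grader]):
        ep_others = mean(ep_by_grader[g][i] for g in others)
        signed.append(ep - ep_others)
    return signed, [abs(d) for d in signed]


# Hypothetical endpoint positions (um from the fovea) for three endpoints.
time1 = {
    "A": [1500.0, 2100.0, 900.0],
    "B": [1520.0, 2050.0, 940.0],
    "C": [1490.0, 2120.0, 910.0],
}
time2_grader_a = [1510.0, 2080.0, 905.0]

signed_intra, abs_intra = intra_differences(time1["A"], time2_grader_a)
signed_inter, abs_inter = inter_differences(time1, "B")
```

Keeping the signed and absolute versions separate mirrors the text: the signed differences reveal any systematic bias, while the absolute differences describe the typical size of a disagreement.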
In addition, intraclass correlation coefficients (ICC) and the 95% limits of agreement using Bland-Altman plots were calculated.
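A rough sketch of these two statistics follows. The Bland-Altman limits are taken here as the mean difference ± 1.96 standard deviations. The text does not state which form of the ICC was used, so the one-way random-effects form, ICC(1,1), is an assumption chosen only for illustration; the function names and example values are likewise hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(diffs):
    """Return (bias, lower, upper): mean difference and bias +/- 1.96 SD."""
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - half_width, bias + half_width


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings[i] holds all k ratings of target i (for example, the same
    endpoint marked at time 1 and time 2, or by k graders).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-target and within-target mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical signed differences (um) and paired markings (um).
bias, lower, upper = bland_altman_limits([-10.0, 20.0, -5.0, 15.0])
icc = icc_oneway([[1500.0, 1510.0], [2100.0, 2080.0], [900.0, 905.0]])
```

A two-way ICC form may be more appropriate when the same graders rate every scan; the one-way form is used here only because it needs the fewest assumptions about the design.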
The purpose of this study was to use inexperienced graders to see whether, with proper training, we could achieve good reliability. We evaluated a training procedure, based upon a manual and a training protocol, for identifying the EZ EPs. The results indicated that, with our 2-tier training system and with the guidance of a written manual, it is possible to train inexperienced graders to reliably identify the edge of the EZ in a relatively short period of time. Inter- and intragrader measurements were similar 1 month after training. The average difference in EP markings made by one grader compared to the other two graders was less than 2 μm at time 1 and less than 7 μm 1 month later, and the confidence intervals were less than 335 μm (approximately 1.2°). Likewise, the average intragrader difference in markings was typically less than 20 μm (<0.1°) and the 95% confidence intervals were approximately 304 μm (approximately 1.1°). This is not to say that further training or more experience with OCT scans will not produce superior results. As noted above, the intragrader agreement of the experienced grader was better than that of the inexperienced graders, suggesting that precision may increase with experience.
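The paired micron and degree values quoted above can be related with a simple linear conversion. The sketch below assumes a factor of 288 μm per degree of visual angle, a commonly used approximation for the human retina; the text does not state the factor actually used, so both the constant and the function name are assumptions.

```python
UM_PER_DEGREE = 288.0  # assumed conversion factor; not stated in the text


def um_to_degrees(distance_um, um_per_degree=UM_PER_DEGREE):
    """Convert a retinal distance in microns to degrees of visual angle."""
    return distance_um / um_per_degree


# Distances quoted in the text, converted with the assumed factor.
for um in (20, 304, 335):
    print(f"{um} um is about {um_to_degrees(um):.2f} deg")
```

Under this assumed factor, the values round to the figures given in the text (approximately 1.1° and 1.2°, and less than 0.1° for 20 μm).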
In this context, it is appropriate to ask how experience may affect the ability to detect progression. Birch et al.19 reported a reduction in EZ width in patients with X-linked (xl) RP of 248 μm per year. Under optimal conditions with a single experienced grader, this progression rate was higher than the 95% confidence interval for EZ width test–retest differences. Thus, most patients showed significant change over 1 year. With inexperienced graders in this study, the 95% confidence interval for marking the EZ EP was comparable with the rate of progression for one EP,19 thus reducing somewhat the sensitivity to significant progression. This estimate of variability includes, however, Grader B, who by far had the weakest reliability. The confidence interval for test–retest differences of Graders A and C, on the other hand, falls well within the average rate of yearly progression.
Several aspects of our training procedure contributed to its relative success. First, in developing our manual, we reviewed OCT scans not used in the study to anticipate sources of variability. Second, we noted patterns of EZ band dropout and made use of typical RPE characteristics, such as localized reflection spots or thickening of the RPE, to decrease inter- and intragrader variability. Third, we also encouraged segmenting the full RPE and EZ band in particularly difficult cases.
Our approach has the advantage of requiring clinicians or graders, on most scans, to simply mark two points – a nasal and temporal EP – rather than to segment the entire line. In the future, automated algorithms may be able to mark these two points. However, it is likely that they will have to be checked carefully and corrected manually, especially in patients with severe damage, where our experience indicates the algorithms have the most difficulty. To investigate the effects of disease severity with our procedure, we compared (ΔEPintra)absolute and (ΔEPinter)absolute in scans where the EZ edge fell inside the parafovea (central 2500 μm) to those where the EZ edge fell outside this region. Neither the (ΔEPintra)absolute nor the (ΔEPinter)absolute was significantly different between these two groups (2 sample t-test; (ΔEPintra)absolute, P = 0.061; (ΔEPinter)absolute, time 1: P = 0.160 and time 2: P = 0.582). Our results indicate that in most RP patients, regardless of severity, our manually marked EP technique produces good agreement within and among graders.
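The severity comparison described above can be sketched as follows. The text specifies only a "2 sample t-test"; the unequal-variance (Welch) form shown here is an assumption, as is reading "central 2500 μm" as ±1250 μm around the fovea. The function names and example values are hypothetical, and the sketch returns only the t statistic and degrees of freedom, leaving the P value to a statistics package.

```python
from math import sqrt
from statistics import mean, variance

# Assumption: "central 2500 um" is read as +/-1250 um around the fovea.
PARAFOVEA_HALF_WIDTH_UM = 1250.0


def split_by_region(eccentricities_um, abs_diffs_um):
    """Group absolute differences by whether the EP lies inside the parafovea."""
    inside, outside = [], []
    for ecc, diff in zip(eccentricities_um, abs_diffs_um):
        if abs(ecc) <= PARAFOVEA_HALF_WIDTH_UM:
            inside.append(diff)
        else:
            outside.append(diff)
    return inside, outside


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical EP eccentricities (um) and absolute differences (um).
inside, outside = split_by_region(
    [500.0, -1300.0, 1200.0, 2000.0], [10.0, 20.0, 30.0, 40.0]
)
t_stat, dof = welch_t(inside, outside)
```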