Abstract
Purpose:
The purpose of this study was to identify a taxonomy of epistemic uncertainties that affect results for geographic atrophy (GA) assessment and progression.
Methods:
An important source of variability is called “epistemic uncertainty,” which is due to incomplete system knowledge (i.e. limitations in measurement devices, artifacts, and human subjective evaluation, including annotation errors). In this study, different epistemic uncertainties affecting the analysis of GA were identified and organized into a taxonomy. The uncertainties were discussed and analyzed, and an example was provided in the case of model structure uncertainty by characterizing progression of GA by mathematical modelling and machine learning. It was hypothesized that GA growth follows a logistic (sigmoidal) function. Using case studies, the GA growth data were used to test the sigmoidal hypothesis.
Results:
Epistemic uncertainties were identified, including measurement error (imperfect outcomes from measuring tools), subjective judgment (grading affected by grader's vision and experience), model input uncertainties (data corruption or entry errors), and model structure uncertainties (elucidating the right progression pattern). Using GA growth data from case studies, it was demonstrated that GA growth can be represented by a sigmoidal function, where growth eventually approaches an upper limit.
Conclusion:
Epistemic uncertainties contribute to errors in study results and are reducible if identified and addressed. By prior identification of epistemic uncertainties, it is possible to (a) quantify uncertainty not accounted for by natural statistical variability, and (b) reduce the presence of these uncertainties in future studies.
Translational Relevance:
Lowering epistemic uncertainty will reduce experimental error, improve consistency and reproducibility, and increase confidence in diagnostics.
Geographic atrophy (GA) is a debilitating eye disease affecting 5 million individuals globally, with expected growth to reach approximately 9 to 10 million individuals by the year 2040.[1] GA appears as lesions which are the result of dead retinal pigment epithelium (RPE) and photoreceptor cells with closure of the underlying choriocapillaris.[2,3] The presence of these lesions in the retina can cause irreversible vision loss, and the size and location of the lesions in the macula are linked with the degree of vision loss.[4,5] The rate of progression of GA is highly variable and there is continuing research on possible factors that contribute to GA and its progression.[5]
There is currently no objective, quantitative, and universally agreed model for progression.[1,5–7] A lack of consensus may be due to the unaccounted variability in many study findings, which is attributable in part to uncertainties associated with the accuracy and precision of various assessment methods.[8] Table 1 summarizes the common epistemic uncertainties that occur in the analysis of GA in research and clinical practice.
Table 1. Epistemic Uncertainties in the Analysis of GA
Aside from the impact on clinical diagnosis and management, uncertainty analysis is important because progression models and machine learning can be affected by data quality and human annotation errors during the course of training and parameter estimation. Some GA analytic models are hybrid approaches combining features of biophysical approaches and machine learning. These include logistic models and mixed-effects models.
Identification of epistemic uncertainties could (a) statistically quantify variability not accounted for by a regression model, and (b) provide information for reducing these uncertainties (e.g. by experimental modification, data normalization, and image preprocessing).
In the taxonomy of uncertainty, there are two broad categories of classification: aleatory uncertainty and epistemic uncertainty (Fig. 1). Aleatory uncertainty is regarded as irreducible uncertainty and is the natural statistical variation in data and experimental studies.[9] Epistemic uncertainty is due to lack of knowledge and refers to reducible errors, such as subjective uncertainty, measurement error, and model structure uncertainty. Epistemic uncertainty can also arise due to the limitations of electronic instrumentation and corrupted data.[10,11] By identifying significant epistemic uncertainties, statistical techniques can be used to reduce their impacts on the assessment of GA.
In a previous publication by the authors, various GA progression models were evaluated in a study of model structure uncertainty.[12] Other types of epistemic uncertainty were not investigated. Subsequently, an online search revealed that epistemic uncertainty in GA assessment in age-related macular degeneration (AMD) appears to be a neglected area of research. No other publications were found on epistemic uncertainty in GA assessment using fundus autofluorescence images apart from the prior work by the authors. In the current study, we performed a taxonomic analysis to identify and categorize other sources of epistemic uncertainty. In addition, one hypothesis from the previous paper was also tested (i.e. that although the linear approximation is generally apparent and sufficient in most clinical applications, the entire process of GA progression from start to completion may actually follow a sigmoidal model).[12] The hypothesis was investigated as a subanalysis of the data in the previous paper, for subjects with a sufficient number of clinical presentations.
Measurement errors are associated with limitations and imperfections in the instrumentation, including sensor resolution, reproducibility, electronic noise, artifacts, and distortion. A warm-up time may be needed for laboratory instrumentation after a cold start, and there may be batch-to-batch differences in equipment and differences between manufacturers due to optics or electronics. All of these errors are potentially reducible.
Input errors for predictive models can be due to data entry errors (e.g. incorrect entry of dates for patient visits), transferring software data into spreadsheets (e.g. exporting RegionFinder results into another database), data corruption (e.g. issues in reading, writing, and storing data), and duplication (e.g. multiple entries pertaining to the same data point). Data quality can be checked and uncertainty is reducible with rigorous quality assurance and data cleansing procedures that systematically check for duplications, negative numbers, or impossible dates.
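The quality checks described above (duplications, negative numbers, impossible dates) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: the record layout (`patient_id`, `visit_date`, `area_mm2`), the function name `check_records`, and the example rows are all hypothetical, and "impossible date" is narrowed here to a visit dated after a reference day.

```python
from datetime import date


def check_records(records, today):
    """Return (row index, problem) pairs for suspicious rows.

    Each record is a dict with hypothetical keys:
    'patient_id' (str), 'visit_date' (datetime.date), 'area_mm2' (float).
    """
    problems = []
    seen = set()  # (patient, visit date) pairs already encountered
    for i, rec in enumerate(records):
        key = (rec["patient_id"], rec["visit_date"])
        if key in seen:
            problems.append((i, "duplicate entry"))
        seen.add(key)
        if rec["area_mm2"] < 0:
            problems.append((i, "negative lesion area"))
        if rec["visit_date"] > today:
            problems.append((i, "visit date in the future"))
    return problems


# Made-up rows, for illustration only (not study data).
records = [
    {"patient_id": "P01", "visit_date": date(2019, 3, 1), "area_mm2": 4.2},
    {"patient_id": "P01", "visit_date": date(2019, 3, 1), "area_mm2": 4.2},
    {"patient_id": "P02", "visit_date": date(2020, 6, 15), "area_mm2": -1.0},
    {"patient_id": "P03", "visit_date": date(2031, 1, 1), "area_mm2": 2.5},
]
issues = check_records(records, today=date(2024, 1, 1))
for row, problem in issues:
    print(row, problem)
```

Flagging rows rather than deleting them keeps the decision with the analyst, who can confirm against the source record whether an entry is a true error.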
Epistemic uncertainties in GA assessment can propagate as error sources in the process of data acquisition, diagnosis, and model development. This results in greater variability and wider confidence intervals and therefore less confidence in testing the original experimental hypothesis. Primary sources of epistemic uncertainties are data quality, digital image processing, and data annotation errors. Other sources of epistemic uncertainty include intergrader and intragrader variability (which may be reducible by increased automation), and “model structure uncertainty” when forecasting progression of GA (reducible by selecting the correct progression model). The impact of uncertainties in data quality and annotation accuracy will affect diagnostics by human graders as well as mathematical models for progression and machine learning.
Identification and systematic treatment of specific epistemic uncertainties will assist in reducing experimental variability.[26] For example, with FAF images, improvements are possible by (1) extending speckle-noise removal by the RegionFinder segmentation software and by investigating additional filters, such as the median filter, which may be more selective in discrimination between system noise and natural granularity, (2) applying machine learning to automate lesion segmentation (reducing human subjectivity in the annotation process), and (3) increasing sample size and the number of feature measurements. These suggested enhancements could improve delineation of GA boundaries and therefore segmentation performance (i.e. improve the resolution of lesion boundaries for improved feature extraction by the human grader or by machine learning). Further reduction in epistemic uncertainty may be possible by using machine learning approaches to find new features for discrimination in the image that may not be readily apparent to human graders. This could result in greater utilization of available data and may even lead to information discovery and insights that were not previously considered.
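To illustrate the median filter mentioned in point (1), the following is a minimal, generic sketch of a 3x3 median filter on a small grid of pixel intensities. It is not the noise-removal step implemented in RegionFinder; the function name `median_filter_3x3`, the clipped-window handling of borders, and the toy image are assumptions made for this example.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of intensities.

    At the borders the window is clipped, so only existing
    neighbours contribute to the median.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = median(window)
    return out


# Toy image: uniform background with one isolated bright speckle.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
smoothed = median_filter_3x3(noisy)
```

Because the median ignores a single outlier within each window, the isolated speckle is replaced by the background value, whereas a mean filter would smear it into neighbouring pixels.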
Further research on model structure uncertainty in GA progression could progressively minimize this source of epistemic uncertainty, and the search for a better model structure may also provide clues to the nature of GA growth. The results of this study suggest that, for sparse datasets from clinical presentations, a linear approximation appears reasonable for modeling GA progression, whereas, given sufficient data, a sigmoidal model may also provide information about GA onset and asymptotic convergence to a plateau.
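The relationship between the two models can be illustrated numerically. The sketch below assumes the standard three-parameter logistic form, area(t) = K / (1 + exp(-r (t - t0))), which may differ from the parameterization used in the study. The parameter values and visit times are hypothetical and chosen only for illustration; they are not estimates from the case studies.

```python
import math


def logistic_area(t, upper_limit, rate, midpoint):
    """Three-parameter logistic curve: K / (1 + exp(-r * (t - t0)))."""
    return upper_limit / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_line(ts, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


# Hypothetical parameters: plateau K (mm^2), rate r (1/year), midpoint t0 (years).
K, r, t0 = 40.0, 0.5, 10.0

# A sparse set of visits clustered around the midpoint of the curve.
visits = [8.0, 9.0, 10.0, 11.0, 12.0]
areas = [logistic_area(t, K, r, t0) for t in visits]

slope, intercept = fit_line(visits, areas)
max_residual = max(
    abs(a - (slope * t + intercept)) for t, a in zip(visits, areas)
)

# Far beyond the observed window the curve flattens toward the plateau K.
late_area = logistic_area(40.0, K, r, t0)

print(f"linear slope: {slope:.2f} mm^2/year, max residual: {max_residual:.2f} mm^2")
print(f"area at t = 40: {late_area:.4f} mm^2 (plateau K = {K})")
```

Over a short observation window the straight line tracks the sigmoid closely, which is consistent with a linear approximation being adequate for sparse clinical data. Only observations that extend toward the tails of the curve carry information about onset and the plateau.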
In summary, epistemic uncertainties affect experimental data quality, image processing results, and data annotations. These extraneous effects can degrade the performance of human graders, mathematical models, and machine learning. Many sources of uncertainty can be reduced, especially with FAF images, if guided by the taxonomy and analysis presented in this study.
In this study, a number of sources of epistemic uncertainty have been identified in GA assessment and its progression in fundus autofluorescence images. Unlike natural statistical variability associated with experimental error or replications, epistemic uncertainties are reducible because they relate to lack of information. In particular, epistemic uncertainties can be addressed by appropriate experimental design modifications and data quality assurance.
Epistemic uncertainties can affect grader performance and can also affect mathematical models and machine learning approaches because both depend on experimental data for parameter estimation. A limited retrospective case study on the issue of “model structure uncertainty” in GA progression was included; it extends the results and conclusions reported in a recent study.[12]
The results for the sigmoidal model are encouraging and warrant further study, because the model has the advantage of providing additional information on the possible onset of GA and on asymptotic progression to a limiting value by extrapolation beyond the time-series data (subject to a specified level of precision). In most clinical applications, there is a limited number of patient presentations, which favors a linear approximation for estimating the rate of progression.
In the future, the study of epistemic uncertainty is likely to be a subject of increasing interest to biostatisticians and clinicians because it concerns additional factors that are often neglected when study results are reported subject only to natural statistical variability and experimental errors.