Box 1. Case Example: AlzEye—Linking Ophthalmic Imaging and Systemic Disease Labels at Scale to Provide New Insights into Dementia (and Cardiovascular Disease)
When trying to achieve the scale of data needed for machine learning approaches, routinely collected data are an attractive alternative to the costly, researcher-led data sets compiled through epidemiologic studies or biobanks. One aim of this approach is to create virtual biobanks at far lower cost than would otherwise be possible (arguably a “biobank-on-a-shoestring”) that may also better reflect the population of interest than the somewhat skewed populations observed in some biobank programs.
An example of this approach is AlzEye, the United Kingdom's first and largest linkage of complex retinal imaging data (two-dimensional fundus photographs and three-dimensional OCT scans) to systemic health diagnostic codes, created to explore retinal ultrastructural associations with, and predictors of, dementia and its subtypes. AlzEye combines local and nationally held data sets within the United Kingdom's National Health Service (NHS). Specifically, AlzEye is a pseudonymized data set that links the retinal photographs and OCT scans of all patients older than 40 years attending Moorfields Eye Hospital NHSFT with Hospital Episode Statistics (HES), a national database of all admissions, emergency attendances, and outpatient appointments in England. The appropriate use and linkage of such data depend on satisfying many criteria, including ethical approval, data security, and governance. Engagement with the public has been pivotal to the approach: we surveyed 483 participants to canvass public opinion on the use of eye scans for research and on the acceptability of using large data sets to identify patterns of systemic disease. Two members of the public sit on the AlzEye working group, and information about the study is published on the funding charity's website.
This kind of study is complex, and the approval process AlzEye underwent was appropriately robust, with several distinct approvals required before the data set could be established. Although the exact process will vary between countries, the underlying principles are likely to be similar, so we outline them here. The first stage was to secure a research sponsor, which required institutional approval from the research and development, information governance, and information technology departments at both the NHS data custodian (Moorfields Eye Hospital NHSFT) and the research institute (University College London). Important conditions were set out at this stage, including linkage by a “trusted third party,” robust data privacy measures, and sufficient computing infrastructure. In AlzEye, the linkage process is as follows: (1) images from Moorfields Eye Hospital are pseudonymized by removing all identifiers and replacing them with a unique study ID, and are then transferred to University College London. (2) Simultaneously, a spreadsheet of the study IDs and the corresponding patient identifiers (date of birth, unique NHS number, sex) is securely sent to NHS Digital, the national body overseeing the HES data warehouse. (3) NHS Digital uses these identifiers to retrieve the relevant HES records, strips the identifiers, and returns the HES data, labeled only with the pseudonymized study IDs, to University College London, where they are linked with the corresponding images. Thus, HES data never enter the source of the imaging data (Moorfields Eye Hospital), and, conversely, identifiers never enter University College London (Fig. 1).
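The three linkage steps above can be illustrated with a toy, in-memory Python simulation. This is a minimal sketch under stated assumptions, not the AlzEye implementation: the `Scan` record, the function and field names, the use of one random study ID per patient, and the rule that a match requires NHS number, date of birth, and sex to agree exactly are all hypothetical. What it shows is the separation of knowledge: in this sketch the custodian keeps the identifier-to-study-ID mapping, the trusted third party sees identifiers but no images, and the research institute receives only study IDs, images, and stripped HES episodes.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Scan:
    # Hypothetical record: identifiers as held by the data custodian.
    nhs_number: str
    date_of_birth: str
    sex: str
    pixels: bytes  # stand-in for a fundus photograph or OCT volume


def custodian_pseudonymize(scans):
    """Steps (1) and (2): assign one random study ID per patient and split the
    data into identifier-free images (for the research institute) and an
    identifier list keyed by study ID (for the trusted third party)."""
    study_ids = {}  # NHS number -> study ID; stays with the custodian
    images, identifiers = {}, {}
    for scan in scans:
        if scan.nhs_number not in study_ids:
            study_ids[scan.nhs_number] = secrets.token_hex(8)
        sid = study_ids[scan.nhs_number]
        images.setdefault(sid, []).append(scan.pixels)
        identifiers[sid] = (scan.nhs_number, scan.date_of_birth, scan.sex)
    return images, identifiers


def trusted_third_party_link(identifiers, hes):
    """Step (3): match identifiers against national records, then return the
    matched episodes keyed only by study ID, with identifiers stripped."""
    linked = {}
    for sid, (nhs_number, dob, sex) in identifiers.items():
        record = hes.get(nhs_number)
        # Assumed exact-match rule: all three identifiers must agree.
        if record and record["dob"] == dob and record["sex"] == sex:
            linked[sid] = list(record["episodes"])
    return linked


def research_institute_join(images, linked_hes):
    """Join images with HES episodes on study ID; unmatched patients get []."""
    return {sid: (imgs, linked_hes.get(sid, [])) for sid, imgs in images.items()}


# Usage with made-up identifiers and a single illustrative ICD-10 code.
scans = [
    Scan("NHS0001", "1950-03-01", "F", b"oct-a"),
    Scan("NHS0001", "1950-03-01", "F", b"fundus-a"),
    Scan("NHS0002", "1948-07-12", "M", b"oct-b"),
]
hes = {"NHS0001": {"dob": "1950-03-01", "sex": "F", "episodes": ["F00"]}}

images, identifiers = custodian_pseudonymize(scans)  # sent to UCL / NHS Digital
linked = trusted_third_party_link(identifiers, hes)  # returned to UCL
dataset = research_institute_join(images, linked)    # assembled at UCL
```

Drawing study IDs from `secrets` rather than deriving them from the NHS number (for example by hashing) means that, in this sketch, the research institute has no way to reverse a study ID to an identity; how AlzEye actually generates its study IDs is not stated here.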
Before a study can begin, all research in the United Kingdom requires ethical approval through the Research Ethics Service, and some studies warrant additional approvals. AlzEye was approved by a National Health Service Research Ethics Committee in 2018. Because of the large number of patients included (more than 250,000), the historical nature of the data, and the patients' advanced age and the difficulty of contacting them, obtaining individual consent was not feasible. Therefore, to use identifiable data for the linkage, a specific type of approval was sought through an application to the Confidentiality Advisory Group, which advises the UK Health Research Authority on whether there is sufficient justification to access data without consent. In the United Kingdom, this is known as a “Section 251 approval,” after Section 251 of the NHS Act 2006, which makes provision for this kind of application. The Health Research Authority, collating the opinions of the respective committees, granted final approval in late 2018.
Once these approvals were in place, the application to NHS Digital for HES data could be processed. In addition to the external approvals, NHS Digital has its own internal approval process that details, in particular, the legal basis on which data are being accessed. Once an application passes this internal review, NHS Digital presents it on the applicant's behalf to the Independent Group Advising on the Release of Data (IGARD), a committee of specialist and lay members that assesses all applications to NHS Digital for the dissemination of confidential information. In January 2019, IGARD gave approval, remarking that the AlzEye application “could be used as an exemplar to help other researchers with their applications to the Data Access Request Service.”
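The approval sequence described in this box can be summarized as a set of gates, each with prerequisites. The Python sketch below (3.9+, for the standard-library `graphlib` module) is a hypothetical simplification: the stage names, and the assumption that the ethics committee review and the Section 251 application both follow sponsorship and both precede final Health Research Authority approval, are our reading of the text rather than a statement of the formal UK process.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the approval gates described above: each stage maps
# to the set of stages that must already be granted before it can proceed.
APPROVAL_GATES = {
    "sponsor_and_institutional_approval": set(),
    "research_ethics_committee": {"sponsor_and_institutional_approval"},
    "cag_section_251_advice": {"sponsor_and_institutional_approval"},
    "hra_final_approval": {"research_ethics_committee", "cag_section_251_advice"},
    "nhs_digital_internal_review": {"hra_final_approval"},
    "igard_review": {"nhs_digital_internal_review"},
}


def approval_order(gates=APPROVAL_GATES):
    """Return one valid order in which all stages could be completed."""
    return list(TopologicalSorter(gates).static_order())


def next_stages(granted, gates=APPROVAL_GATES):
    """List stages not yet granted whose prerequisites have all been granted."""
    granted = set(granted)
    return sorted(
        stage
        for stage, prereqs in gates.items()
        if stage not in granted and prereqs <= granted
    )


order = approval_order()
after_sponsor = next_stages({"sponsor_and_institutional_approval"})
```

`TopologicalSorter.static_order()` yields one valid completion order, and `next_stages` lists what can be pursued given what has already been granted; for example, after sponsorship it returns both the ethics committee review and the Section 251 application, which this model allows to proceed in parallel.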