Lauren N. Ayton, Joseph F. Rizzo, Ian L. Bailey, August Colenbrander, Gislin Dagnelie, Duane R. Geruschat, Philip C. Hessburg, Chris D. McCarthy, Matthew A. Petoe, Gary S. Rubin, Philip R. Troyk, for the HOVER International Taskforce; Harmonization of Outcomes and Vision Endpoints in Vision Restoration Trials: Recommendations from the International HOVER Taskforce. Trans. Vis. Sci. Tech. 2020;9(8):25. doi: https://doi.org/10.1167/tvst.9.8.25.
© ARVO (1962-2015); The Authors (2016-present)
Translational research in vision prosthetics, gene therapy, optogenetics, stem cell and other forms of transplantation, and sensory substitution is creating new therapeutic options for patients with neural forms of blindness. The technical challenges faced by each of these disciplines differ considerably, but they all face the same challenge of how to assess vision in patients with ultra-low vision (ULV), who will be the earliest subjects to receive new therapies.
Historically, there were few tests to assess vision in ULV patients. In the 1990s, the field of visual prosthetics expanded rapidly, and this activity led to a heightened need to develop better tests to quantify end points for clinical studies. Each group tended to develop novel tests, which made it difficult to compare outcomes across groups. The common lack of validation of the tests and the variable use of controls added to the challenge of interpreting the outcomes of these clinical studies.
In 2014, at the bi-annual International “Eye and the Chip” meeting of experts in the field of visual prosthetics, a group of interested leaders agreed to work cooperatively to develop the International Harmonization of Outcomes and Vision Endpoints in Vision Restoration Trials (HOVER) Taskforce. Under this banner, more than 80 specialists across seven topic areas joined an effort to formulate guidelines for performing and reporting psychophysical tests in humans who participate in clinical trials for visual restoration. This document provides the complete version of the consensus opinions from the HOVER taskforce, which, together with its rules of governance, will be posted on the website of the Henry Ford Department of Ophthalmology (www.artificialvision.org).
Research groups or companies that choose to follow these guidelines are encouraged to include a specific statement to that effect in their communications to the public. The Executive Committee of the HOVER Taskforce will maintain a list of all human psychophysical research in the relevant fields of research on the same website to provide an overview of methods and outcomes of all clinical work being performed in an attempt to restore vision to the blind. This website will also specify which scientific publications contain the statement of certification. The website will be updated every 2 years and continue to exist as a living document of worldwide efforts to restore vision to the blind.
The HOVER consensus document has been written by over 80 of the world's experts in vision restoration and low vision and provides recommendations on the measurement and reporting of patient outcomes in vision restoration trials.
Legally blind—Depends on the defining organization. WHO defines legally blind as 20/400 or worse in the better eye and/or a field of view smaller than 20 degrees.
Count fingers (CF)—Individuals can tell how many fingers the ophthalmologist is holding up.
Hand motion (HM)—Individuals can tell that the ophthalmologist is waving a hand in front of their eyes.
Light perception (LP)—Individuals can tell if the lights in a room are on or off. Roughly equivalent to a normally sighted individual's perception with their eyes closed, and generally assessed using a bright light at between 40 cm and 1 m.
No light perception (NLP)—Individuals cannot tell if the lights in a room are on or off. Generally assessed using a bright light at between 40 cm and 1 m.
Two-interval, forced choice—Two temporal intervals occur (generally signaled by an auditory cue), and subjects are asked which interval contained a particular percept, such as which interval contained a phosphene, which interval contained the larger percept, or which interval contained the brighter percept. The advantage of this method is that it avoids subject criterion effects. The disadvantage is that, because there are two intervals, chance performance is 50%, so a fairly large amount of data must be collected to find an accurate threshold.
n-Interval, forced choice—This is similar to the two-interval forced choice, but the subject is asked which of three or more intervals has the brightest (for example) stimulus. With three intervals, chance performance drops to 33%, so this method is considerably more efficient even though each trial contains more intervals. Because there is a slightly larger memory component, it may not be suitable for subjects with memory loss or cognitive difficulties.
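The chance levels quoted for the forced-choice procedures follow from the number of intervals. As a worked sketch (the symbols n, p_obs, and p_corr are introduced here for illustration and are not part of the taskforce text), a subject guessing among n equally likely intervals is correct with probability 1/n, and a standard correction for guessing rescales the observed proportion correct so that chance maps to 0 and perfect performance to 1:

```latex
p_{\text{chance}} = \frac{1}{n},
\qquad
p_{\text{corr}} = \frac{p_{\text{obs}} - 1/n}{1 - 1/n}
```

For example, an observed 75% correct corresponds to a corrected value of 0.5 with two intervals (chance 50%) and 0.625 with three intervals (chance about 33%).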
Two-alternative, forced choice—A single stimulus is presented, and subjects must report (for example) whether or not a stimulus was presented or whether there were one or two stimuli. This is an efficient method, but it is susceptible to subject bias; for example, one subject may say there is a single phosphene unless they are confident there were two distinct phosphenes. A different subject (or the same subject on a different day) might report two phosphenes whenever they see a complex shape. Thus, the same perceptual experience might result in very different patient reports. In the case of detection tasks, catch trials (in which a null stimulus is presented at random intervals) should be used in 10% to 20% of the total number of trials.
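The catch-trial recommendation can be sketched as a randomized trial schedule plus a false-alarm tally. This is a minimal illustration only: the function names, the 15% default, and the fixed random seed are assumptions made for the example, not part of the taskforce guidance.

```python
import random


def build_schedule(n_trials, catch_fraction=0.15, seed=0):
    """Return a shuffled list of 'stimulus' and 'catch' trial labels.

    catch_fraction is restricted to the 10% to 20% range quoted in the text.
    """
    if not 0.10 <= catch_fraction <= 0.20:
        raise ValueError("catch_fraction outside the 10% to 20% range")
    n_catch = round(n_trials * catch_fraction)
    schedule = ["catch"] * n_catch + ["stimulus"] * (n_trials - n_catch)
    # Shuffle so the subject cannot predict when a null stimulus occurs.
    random.Random(seed).shuffle(schedule)
    return schedule


def false_alarm_rate(schedule, responses):
    """Fraction of catch trials on which the subject reported a percept."""
    catch_responses = [r for s, r in zip(schedule, responses) if s == "catch"]
    if not catch_responses:
        return float("nan")
    return sum(bool(r) for r in catch_responses) / len(catch_responses)
```

A high false-alarm rate on the catch trials suggests a liberal response criterion, which is the kind of subject bias this procedure is susceptible to.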
Rating—A classical brightness rating procedure was described by Stevens [55]. Subjects are first presented with a visual stimulus with an agreed reference brightness (e.g., 10) and then asked to numerically rate the brightness of a second stimulus in relation to the first. For example, a subject would assign a value of 20 if the second stimulus appeared to be twice as bright as the initial percept. Stimuli should always be presented in a random order. The reference stimulus need not be provided every trial but should be provided regularly, such as at the start of the session and perhaps at a minimum of every five trials. The reference can also be included as a member of the test set, as this provides a useful way of assessing subject rating accuracy. Subjects show surprising reliability on this task across a wide variety of domains [55], including rating the brightness [44, 56] and size [44] of phosphenes.
Method of constant stimuli—The observer is presented with a fixed, predetermined set of stimuli, of which some are above and others are below threshold. The stimulus set is presented in a random order. Advantages are that this method prevents the observer from being able to predict what the next stimulus will be and minimizes the effect of fatigue on estimated thresholds. One disadvantage is that this approach is time consuming, especially when the approximate location of the threshold is not well known, so that a large number of stimulus intensities must be included.
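As a rough sketch of the method of constant stimuli, the following builds a randomized schedule from a fixed set of intensities and tabulates the proportion of detections at each level. The function names, the use of a fixed seed, and the equal number of repeats per level are assumptions made for this illustration.

```python
import random
from collections import defaultdict


def constant_stimuli_schedule(levels, repeats, seed=0):
    """Repeat each predetermined level `repeats` times, in random order."""
    trials = [level for level in levels for _ in range(repeats)]
    random.Random(seed).shuffle(trials)
    return trials


def proportion_detected(trials, responses):
    """Map each stimulus level to the fraction of trials it was detected on."""
    counts = defaultdict(lambda: [0, 0])  # level -> [detections, presentations]
    for level, seen in zip(trials, responses):
        counts[level][0] += int(bool(seen))
        counts[level][1] += 1
    return {level: hits / n for level, (hits, n) in sorted(counts.items())}
```

The resulting level-to-proportion table is the raw material for fitting a psychometric function. The total trial count is the number of levels times the number of repeats, which is why a poorly known threshold range makes the method slow.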
Staircase—In staircase procedures, stimulus intensity (e.g., current amplitude) is adaptively increased after an incorrect response and decreased after a series of consecutive correct responses. This provides an efficient way to focus trials on stimulus intensities that are near threshold. The number of consecutive correct responses required to decrease the stimulus intensity determines the point on the psychometric function (describing probability of detection as a function of current amplitude) that is targeted. For example, the 1-up/2-down variant of the transformed up/down method [61] will converge toward presenting current amplitudes that result in a detection performance of approximately 71%.
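The 71% figure for the 1-up/2-down rule can be seen from its equilibrium: the intensity steps down only after two consecutive correct responses, which occurs with probability p² at a given intensity, and the track settles where stepping down and stepping up are equally likely, so p² = 0.5 and p ≈ 0.707. A minimal sketch of the rule follows. The function name, the fixed step size, and the `respond` callback are assumptions made for illustration; practical implementations often also shrink the step size and average reversal points.

```python
def run_staircase(respond, start, step, n_trials, n_down=2, floor=0.0):
    """Run a 1-up/n_down staircase and return [(intensity, correct), ...].

    respond(intensity) must return True for a correct response.
    """
    intensity = start
    correct_streak = 0
    history = []
    for _ in range(n_trials):
        correct = bool(respond(intensity))
        history.append((intensity, correct))
        if correct:
            correct_streak += 1
            if correct_streak == n_down:
                # Enough consecutive correct responses: make the task harder.
                intensity = max(floor, intensity - step)
                correct_streak = 0
        else:
            # Any incorrect response: make the task easier and reset the streak.
            intensity += step
            correct_streak = 0
    return history


# Hypothetical noise-free observer who is correct whenever intensity >= 50.
history = run_staircase(lambda x: x >= 50, start=80, step=10, n_trials=12)
```

With this idealized observer the track descends from 80 in steps of 10 and then oscillates between 40 and 50, bracketing the simulated threshold.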
ML staircases—These methods use maximum likelihood algorithms to select the stimulus intensity for each trial that is expected to provide the maximal amount of information about the threshold, given the previous history of trials. Although highly efficient in theory, keypress errors early in the staircase can result in the staircase taking a very long time to converge or, if the number of trials is limited, converging to an incorrect threshold. Methods susceptible to keypress errors are best used with highly experienced and reliable observers.
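The idea behind a maximum likelihood staircase can be sketched with a grid search over candidate thresholds. This is a simplified illustration under stated assumptions: the logistic psychometric function, its fixed slope and lapse rate, and the function names are all hypothetical. It also places the next trial at the current maximum likelihood estimate instead of computing expected information gain.

```python
import math


def p_detect(intensity, threshold, slope=0.2, lapse=0.02):
    """Assumed logistic psychometric function with a small lapse rate."""
    core = 1.0 / (1.0 + math.exp(-slope * (intensity - threshold)))
    return (1.0 - lapse) * core


def ml_threshold(history, candidates):
    """Return the candidate threshold that maximizes the log-likelihood.

    history is a list of (intensity, detected) pairs from earlier trials.
    """
    eps = 1e-9  # keeps the logarithm finite for extreme probabilities

    def log_likelihood(threshold):
        total = 0.0
        for intensity, detected in history:
            p = min(max(p_detect(intensity, threshold), eps), 1.0 - eps)
            total += math.log(p if detected else 1.0 - p)
        return total

    return max(candidates, key=log_likelihood)


def next_intensity(history, candidates):
    """Simplification: test next at the current maximum likelihood estimate."""
    return ml_threshold(history, candidates)
```

Because every past response enters the likelihood, a single mistaken keypress early on shifts the estimate, and with it all subsequent stimulus placements, which illustrates why these methods suit reliable observers.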