Abstract
Purpose:
Clinical trials for remyelination in multiple sclerosis (MS) require an imaging biomarker. The multifocal visual evoked potential (mfVEP) is an accurate technique for measuring axonal conduction; however, it produces large datasets requiring lengthy analysis by human experts to detect measurable responses versus noisy traces. This study aimed to develop a machine-learning approach for the identification of true responses versus noisy traces and the detection of latency peaks in measurable signals.
Methods:
We obtained 2240 mfVEP traces from 10 MS patients using the VS-1 mfVEP machine, and they were classified by a skilled expert twice with an interval of 1 week. Of these, 2025 (90%) were classified consistently and used for the study. ResNet-50 and VGG16 models were trained and tested to produce three outputs: no signal, up-sloped signal, or down-sloped signal. Each model ran 1000 iterations with a stochastic gradient descent optimizer with a learning rate of 0.0001.
Results:
ResNet-50 and VGG16 had false-positive rates of 1.7% and 0.6%, respectively, when the testing dataset was analyzed (n = 612). The false-negative rates were 8.2% and 6.5%, respectively, against the same dataset. The latency measurements in the validation and testing cohorts in the study were similar.
Conclusions:
Our models efficiently analyze mfVEPs with <2% false positives compared with human false positives of <8%.
Translational Relevance:
The mfVEP, a safe neurophysiological technique, analyzed using artificial intelligence, can serve as an efficient biomarker for signal latency measurement in MS clinical trials.
Two image-based models were tested: ResNet-50 and VGG16.10,11 An image-based approach was used because processing the two-dimensional (image) shapes of the mfVEPs closely mirrors the assessment performed by human experts when distinguishing signal from no-signal traces. Both models have proven effective in many image classification challenges in the computer science and medical fields. These models were loaded with pre-trained weights, leveraging networks already trained for image recognition.
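As a rough illustration of this setup (not the authors' code), the sketch below shows how ResNet-50 and VGG16 with pre-trained ImageNet weights could be adapted to a three-class output; the choice of framework (PyTorch/torchvision) and the specific layer manipulations are assumptions made for illustration only.

```python
# Minimal sketch, assuming a torchvision-based workflow; layer names follow
# the standard torchvision model definitions, not the authors' code.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # no signal, up-sloped signal, down-sloped signal

# ResNet-50: adapted to accept a single-channel (black-and-white) image and
# to output three classes, as described in the text. Replacing conv1 discards
# its pre-trained weights; the rest of the backbone keeps ImageNet weights.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)

# VGG16: input left as three channels (a grayscale image can simply be
# replicated across channels); only the final classifier layer is replaced.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, NUM_CLASSES)
```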
In this study, each trace was converted to a black-and-white image with a resolution of 540 × 400 pixels. To feed the images into the models, they were resized to 244 × 244 pixels. The ResNet-50 model was adjusted to take the black-and-white image (rather than a red, green, and blue color image) and to output three classes: no signal, up-sloped signal, or down-sloped signal. The VGG16 model input was not adjusted to black-and-white images. Each model ran 1000 iterations with a stochastic gradient descent optimizer with a learning rate of 0.0001. For the latency measurements, in the case of an up-sloped signal the coordinate of the highest point marked the latency; in the case of a down-sloped signal, the coordinate of the lowest point marked the latency. These were measured using simple statistical software across the training set (n = 1413) and the testing set (n = 613); the distribution of these results is presented in Figure 2.
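For concreteness, a minimal sketch of the training configuration stated above (plain SGD, learning rate 0.0001, 1000 iterations) and of the peak-based latency read-out is given below; the loss function, data-loading details, class encoding, and helper names are assumptions for illustration and are not taken from the study.

```python
# Illustrative sketch only; optimizer settings mirror those stated in the text.
import numpy as np
import torch
import torch.nn as nn

def train(model, loader, iterations=1000, lr=1e-4, device="cpu"):
    """Run a fixed number of SGD iterations (assumed cross-entropy loss)."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    step = 0
    while step < iterations:
        for images, labels in loader:       # re-iterate the loader as needed
            if step >= iterations:
                break
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
            step += 1

def latency_from_trace(trace_amplitude, time_ms, slope_class):
    """Return the latency (ms) of the dominant peak.

    Assumed encoding: slope_class 1 = up-sloped signal (take the highest
    point), slope_class 2 = down-sloped signal (take the lowest point).
    """
    idx = np.argmax(trace_amplitude) if slope_class == 1 else np.argmin(trace_amplitude)
    return time_ms[idx]
```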
In order to maintain high accuracy of longitudinal mfVEP analyses, and of latency in particular, it is more important not to include noisy traces (which may introduce a high degree of variability) than to miss some of the true “signal” traces. Therefore, for the model to be successful, a low false-positive (FP) rate (i.e., noisy traces classified as real signal) is required. Conversely, a low false-negative (FN) rate (i.e., traces containing real signal but classified as noise), although still desirable, is far less crucial, as such traces are excluded from the analyses and therefore do not affect progression results.
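To make the two rates concrete, the short example below computes them from hypothetical counts; the numbers are invented purely to illustrate the definitions used above.

```python
def fp_fn_rates(n_noise_called_signal, n_noise_total,
                n_signal_called_noise, n_signal_total):
    """FP rate: noisy traces accepted as signal (contaminates latency analysis).
    FN rate: true signal traces discarded as noise (only reduces data volume)."""
    return (n_noise_called_signal / n_noise_total,
            n_signal_called_noise / n_signal_total)

# Hypothetical example: 3 of 300 noisy traces accepted, 20 of 310 signal traces rejected.
print(fp_fn_rates(3, 300, 20, 310))  # -> (0.01, ~0.065)
```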
In this study, we present an approach to the automatic interpretation of mfVEP signals using AI that can provide a rapid and accurate separation of noisy responses from reliable signals, thus enabling change analysis in longitudinal follow-up. This is important, as the mfVEP is one of the few tools available to monitor the state of nerve myelination. Remyelination of chronically demyelinated white matter represents a promising strategy in the treatment of MS. Such an approach offers the potential to protect damaged (demyelinated) axons from accelerated degeneration caused by inflammatory mediators and immune effector cells and to restore conduction velocity.12–14 The validation of remyelinating therapies, however, is hampered by the current lack of consensus on the use of imaging biomarkers for remyelinating trials, particularly considering the moderate effect of potential remyelinating drugs.15 Although there are a number of promising therapies, reliable imaging biomarkers for myelin repair remain to be identified. Evoked potentials, however, and mfVEPs in particular, owing to their ability to directly estimate the speed of axonal conduction, are highly sensitive and very accurate quantitative measures of de-/remyelination in both experimental and clinical settings.2,3,16–18 Although the accuracy of VEP measurement, and of latency in particular, is essential for monitoring optic nerve function (considering the small degrees of change observed in remyelination trials3), the vast amount of mfVEP data typically collected in human clinical trials and the manual techniques utilized for latency measurement make it susceptible to error.
AI has been increasingly used in the field of biomedical image analysis. In the current study, we have tested the capability of two image-based AI models to correctly identify the presence of measurable mfVEP traces and separate them from noisy (i.e., unreliable) traces in a group of treated MS patients, among whom we would expect to find both normal and reduced amplitude responses, as well as changes in latency. The patients were therefore representative of the typical clinical scenario where a range of responses may be encountered, even across the field of one recording.
In order to compare AI to human performance, we initially evaluated the ability of an experienced mfVEP analyst to separate identifiable traces with visible responses from noisy (no-signal) traces. The experienced mfVEP reader had an error rate of just below 10%. From an accuracy point of view, it is more important not to overestimate positive responses (not to identify noisy traces as true signals) than to categorize true responses as noise and lose some data. For this reason, when the AI classification was performed we aimed for a smaller false-positive rate (<5%) at the expense of an increased FN rate (<10%).
In general, both models demonstrated a high level of precision in identifying measurable traces when the testing dataset was analyzed. False-positive and false-negative rates were well within the expected range (<5% and <10%, respectively). The strong performance of the AI models with regard to achieving results comparable to those of an experienced human analyst is encouraging, particularly considering the time saved when an AI algorithm performs the classification. For example, it takes 20 to 30 minutes to thoroughly analyze mfVEP data for an eye, but the time required for AI to perform the same task is 1 minute. The detection of latency using AI networks to identify the slope of the main signal also demonstrated excellent performance, with no misclassification in the testing dataset.
To the best of our knowledge, this is the first study to propose such an approach for analyzing mfVEPs and its use in clinical practice for MS. Qiao19 described a deep learning technique based on the VGG19 model that demonstrated an accuracy of 90.6% when analyzing data from patients with suprasellar tumors. In our study, both models demonstrated high precision of over 97% for signal detection and an FP rate of less than 2%. Given that the traces were recorded by different operators at different sites, the result of classification is not site (or operator) specific; however, the same model of mfVEP machine (VS-1) was used to obtain all of the recordings. Hence, it remains to be seen how well the algorithm will perform when applied to data collected using different models of mfVEP machines.
The primary aim of the current study was to identify and remove noisy (unmeasurable) mfVEP traces from the analysis of the latency progression. Because MS patients are known to exhibit the entire range of mfVEP waveforms, data from normal controls were not required in this modeling. However, the current approach can now be applied to a population of normal subjects to determine the overall specificity of mfVEP.
Both models proved to be efficient algorithms for detecting mfVEPs and were able to replicate or outperform a human analyst. The processing times for both models were similar. However, as demonstrated in Table 2, VGG16 showed higher accuracy in detecting true signals while maintaining a low FP rate. As discussed earlier, this is an important point when considering correct latency measurements and subsequent applications in clinical decision making; thus, the authors recommend this model over ResNet-50 for use in clinical trials. As a future direction for this study, an AI model processing one-dimensional numerical data could be considered to complement the findings from our study, which used image-based, two-dimensional AI modeling of mfVEPs.
In conclusion, the application of AI to mfVEP analysis to separate true traces from those contaminated by noise and to identify latency peaks proved to be accurate, reliable, and efficient. It opens up new possibilities for using mfVEPs as biomarkers in clinical trials of remyelinating agents, as these will monitor latency changes over time.9 This tool can be used in clinical practice and provide a fast and relatively low-cost assessment of the remyelinating capacity of new therapies in MS.20
Supported by grants from the National Multiple Sclerosis Society (RG4716A6/3), the Sydney Eye Hospital Foundation, the Claffy Foundation, and Sydney Medical School Foundation (K6602/RY285).
Disclosure: S. Klistorner, None; M. Eghtedari, None; S.L. Graham, None; A. Klistorner, None