The approach described in this article provides a blueprint for tackling challenging big data analysis problems related to collisions in the daily mobility of visually impaired and blind participants. The main contributions of our approach are (i) applying robust methods for quantifying mobility-related outcomes from video recordings of the daily mobility of people with severe visual impairments, and (ii) proposing a novel data reduction algorithm that makes the analysis effort feasible.
Our approach focuses on the previously unaddressed issue of analyzing large amounts of video data to obtain mobility-related outcome measures relevant to the use of devices that assist with obstacle detection and collision avoidance when walking. Previous studies of naturalistic walking mobility in visually impaired individuals mainly analyzed motion sensor data (number of steps and/or falls) and primarily focused on a particular patient group or disease category (such as glaucoma14,17,18,25 or AMD15,26), where the collision risk was presumably lower than for the people with more severe visual impairments or blindness who were the focus of our study. Although the proposed methods were designed and tested on data from blind or severely visually impaired individuals, the same methods could be used to investigate real-world mobility in other patient populations.
The inter-rater reliability varied between the review items: classification of valid events had the highest reliability, followed by true hazard, all contacts, and body contacts. In other words, it was easier to tell whether an event was valid than to tell whether there was a body contact. Given the wide variability between the scenarios in which the events took place, it is conceivable that no matter how closely aligned the two raters are, there will be disagreements when classifying body contacts. Therefore, multiple independent reviews followed by consensus-based reconciliation can ensure that the most important outcome measure is obtained with relatively high reliability despite disagreements.
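One common way to quantify such per-item agreement is Cohen's kappa; the sketch below computes it for each review item from the two raters' labels and flags body-contact disagreements for consensus review. It is a minimal illustration, and the file and column names (and the choice of kappa itself) are our assumptions rather than details taken from the study.

```python
# Minimal sketch (hypothetical names): per-item inter-rater agreement.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Each row is one event; rater_a_* / rater_b_* hold the two independent reviews.
reviews = pd.read_csv("reviews.csv")

for item in ["valid_event", "true_hazard", "any_contact", "body_contact"]:
    kappa = cohen_kappa_score(reviews[f"rater_a_{item}"],
                              reviews[f"rater_b_{item}"])
    print(f"{item}: Cohen's kappa = {kappa:.2f}")

# Events with conflicting body-contact ratings would then go to a joint,
# consensus-based reconciliation review.
to_reconcile = reviews[reviews["rater_a_body_contact"]
                       != reviews["rater_b_body_contact"]]
```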
The data reduction technique was designed with the same goal of obtaining important mobility-related outcomes with high reliability. Because failing to quantify a body collision has negative consequences for the data analyses, the disagreement prediction algorithm was tuned to ensure that most potential disagreements were not missed, possibly at the cost of an increased false-alarm rate (predicting a disagreement for an event when there was none). False alarms increase the amount of data that needs to be reviewed, but, as our study showed, the algorithm predictions covered about 82% of the disagreements in the body-contact rating and greatly decreased the number of events that needed to be reviewed by both reviewers (by 81%).
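One way to implement this kind of tuning, assuming the algorithm outputs a per-event disagreement probability, is to pick the decision threshold from a precision-recall curve so that a chosen fraction of true disagreements is caught; the recall target and names below are illustrative, not values from the study.

```python
# Illustrative threshold tuning: keep the largest cutoff that still catches
# at least `min_recall` of the true disagreements, accepting more false alarms.
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_true, y_score, min_recall=0.85):
    # y_true: 1 if the raters actually disagreed on body contact, else 0
    # y_score: predicted disagreement probability from the trained model
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # recall[:-1] aligns with `thresholds` and is non-increasing, so the last
    # index that meets the target gives the strictest usable threshold.
    ok = np.where(recall[:-1] >= min_recall)[0]
    return thresholds[ok[-1]] if len(ok) else thresholds[0]
```

Events whose predicted probability exceeds this threshold would be routed to the second reviewer; all other events would receive only a single review.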
The two raters exhibited differing categorization patterns when reviewing the data, and these individual reviewing patterns were used to train the disagreement prediction algorithm. Based on rater B's review of an event, it was relatively easy to determine whether rater A would disagree in terms of body contact. However, the reverse (predicting rater B's disagreements from rater A's reviews) did not hold as well for the data reported here.
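A minimal sketch of this idea is shown below: rater B's per-event review items serve as features, and the label indicates whether rater A's body-contact rating differed. The feature names and the choice of a random forest classifier are our assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: predict from rater B's review of an event whether
# rater A is likely to disagree on the body-contact rating.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

reviews = pd.read_csv("common_events.csv")  # events reviewed by both raters

X = reviews[["b_valid_event", "b_true_hazard",
             "b_any_contact", "b_body_contact"]]           # rater B's review items
y = (reviews["a_body_contact"] != reviews["b_body_contact"]).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                               random_state=0)
model.fit(X_train, y_train)
disagreement_prob = model.predict_proba(X_test)[:, 1]  # input to the threshold step
```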
Once trained on a common set of data reviewed fully by two individuals, the algorithm should work as long as the same two individuals continue to do all the reviewing. If a new pair of reviewers is introduced, however, they will both have to review a sufficiently large common set of events for the machine learning algorithm to learn their reviewing patterns. In our case, the algorithm was trained on a sample of 2712 common events reviewed by both reviewers. Considering that each event takes on average 1 minute to review (although new reviewers might take longer than trained reviewers), the lead time to retrain the disagreement prediction algorithm could be about 45 hours of reviewing per reviewer (90 hours for a new pair of reviewers). After the algorithm has been trained, and depending on its performance, we can expect significant savings in reviewing effort compared with full double reviewing of all events. To put these savings in context, consider the clinical trial data set, which currently consists of more than 29,000 events (at least 483 hours of reviewing per reviewer). Initial, full double reviewing is needed only for about 10% of the total events to train the algorithm. For the remaining 90% of the data, the reduction in reviewing effort will be substantial, on average 80%, corresponding to approximately 12 fewer hours of reviewing per thousand events in the data set. The reduction will likely vary between pairs of reviewers and could be larger or smaller than that found for the two reviewers in this study. Nevertheless, we suggest that a data reduction of 80% is a realistic expectation, given that our two reviewers exhibited clearly different categorization patterns when reviewing.
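To make the effort arithmetic explicit, the short calculation below simply restates the figures quoted above (1 minute per event, 2712 training events, roughly 29,000 total events, and an 80% reduction on the remaining 90% of the data); it adds no new data.

```python
# Back-of-the-envelope restatement of the reviewing-effort figures above.
MIN_PER_EVENT = 1          # average review time per event (minutes)
TRAINING_EVENTS = 2712     # common events double-reviewed to train the algorithm
TOTAL_EVENTS = 29_000      # events currently in the clinical trial data set
REDUCTION = 0.80           # expected cut in second reviews after training

training_hours = TRAINING_EVENTS * MIN_PER_EVENT / 60            # ~45 h per reviewer
full_double_review_hours = TOTAL_EVENTS * MIN_PER_EVENT / 60     # ~483 h per reviewer
remaining_events = TOTAL_EVENTS - TRAINING_EVENTS                # ~90% of the data
hours_saved = remaining_events * REDUCTION * MIN_PER_EVENT / 60  # second reviews avoided
saved_per_1000_events = hours_saved / (TOTAL_EVENTS / 1000)      # ~12 h per 1000 events

print(f"{training_hours:.0f} h to train, {hours_saved:.0f} h saved "
      f"({saved_per_1000_events:.0f} h per 1000 events)")
```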
Possible alternatives to the video review approach presented here include crowdsourcing and artificial intelligence. Crowdsourcing can be an efficient way to save researchers' effort, particularly for relatively simple tasks such as image labeling, but it may not be feasible for complex tasks, such as detailed annotation of mobility videos, that require nontrivial user training. Given the complexities of obstacle avoidance when walking in the real world, the reviewers for our particular application need to be aware of the functionality and limitations of the device. In addition, crowdsourcing offers little control over who reviews what, so reconciliation of disagreements is not as straightforward as in our approach (joint review of items with disagreements). Another alternative, using artificial intelligence algorithms to automatically review and annotate events, holds promise for future work.
In conclusion, our novel approach resulted in a data reduction of about 80%, meaning that only about 19% of the original video data needs to be reviewed by both reviewers. For the first time, our approach makes it possible to objectively study and quantify collision incidents in the daily mobility of visually impaired and blind individuals, and it makes it feasible to conduct clinical trials that objectively evaluate the effectiveness of video camera-based mobility assistance devices in habitual mobility. Furthermore, the approach described in this article may help provide a better understanding of the processes involved in, and the difficulties encountered during, obstacle detection and avoidance when walking.