Abstract
Purpose:
Glaucoma is a leading cause of irreversible blindness worldwide, necessitating precise visual field (VF) assessments for effective diagnosis and management. The ability to accurately digitize VF reports is critical for maximizing the utility of the data gathered from clinical evaluations.
Methods:
In response to the challenges associated with data accessibility in digitizing VF reports, we developed a lightweight convolutional neural network (CNN) framework. Using a decade-long dataset comprising 15,000 reports, we preprocessed portable document format files and standardized the extracted textual data into 48 × 48 pixel images. To enhance the model's generalization capabilities, we incorporated a variety of font types into the dataset.
Results:
The proposed CNN model achieved 100% accuracy in extracting numerical values and over 98.6% accuracy in metadata recognition. Post-processing correction using keyword mapping further improved metadata reliability, effectively addressing errors caused by visually similar characters. The model demonstrated superior efficiency compared to manual data entry, significantly reducing processing time while maintaining near-perfect accuracy.
Conclusions:
The findings highlight the effectiveness of our AI-driven digitization method in accurately interpreting Humphrey VF images. This advanced framework provides a reliable solution to digitizing complex visual field reports, thereby facilitating enhanced clinical workflows.
Translational Relevance:
The implications of this study extend to streamlined clinical workflows and AI-based report interpretation. By enabling comprehensive trend analysis of visual field changes, our model represents a significant advancement in glaucoma care, showcasing the transformative potential of AI-driven technologies in enhancing precision medicine and improving patient outcomes.
Humphrey VF reports, available in 30-2, 24-2, and 10-2 SFA PDF formats, underwent meticulous preprocessing to enable accurate data extraction. Detailed analysis revealed that these reports contain 72 unique characters, comprising digits, the space character, symbols, and upper- and lowercase letters (0123456789 %/+−.,:<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ).
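As an illustration of how this 72-character vocabulary could be encoded for a single-character classifier, a minimal sketch follows; the variable names (CHARSET, CHAR_TO_INDEX) and the use of the ASCII hyphen in place of the report's minus sign are assumptions, not the study's implementation.

```python
# Illustrative 72-character vocabulary for the single-character classifier.
# Note: the minus sign in the reports may be the Unicode "−" rather than ASCII "-".
import string

CHARSET = "0123456789 %/+-.,:<>" + string.ascii_lowercase + string.ascii_uppercase
assert len(CHARSET) == 72  # 10 digits + space + 9 symbols + 52 letters

# Map each character to a class index for the CNN's softmax output, and back.
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}
```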
A key observation was that, within PDFs of the same HFA test pattern, the placement of textual and numerical data remains fixed across different reports. This consistency allowed for the development of a coordinate-based extraction framework, where predefined bounding boxes were established based on different HFA formats (HFA2 and HFA3) and test patterns (30-2, 24-2, and 10-2) to efficiently locate and extract relevant text data. This method significantly enhances processing efficiency compared to object detection models, which require additional computational resources for dynamic text localization.
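A coordinate-based extraction step of this kind could be sketched as follows; PyMuPDF is assumed for PDF rendering, and the bounding-box coordinates and field names shown are placeholders rather than the values used in the study.

```python
# Sketch of coordinate-based cropping, assuming PyMuPDF (fitz) for PDF rendering.
# The box coordinates below are placeholders, not the actual values from the study.
import fitz  # PyMuPDF

# Predefined bounding boxes (x0, y0, x1, y1 in PDF points) per device and test pattern.
BOXES = {
    ("HFA3", "24-2"): {
        "patient_name": (40, 60, 220, 75),     # placeholder coordinates
        "md_value":     (400, 520, 470, 535),  # placeholder coordinates
        # ... one entry per metadata field and numerical value
    },
}

def extract_crops(pdf_path, device="HFA3", pattern="24-2", dpi=300):
    """Render page 1 of the report and crop each predefined region."""
    page = fitz.open(pdf_path)[0]
    crops = {}
    for field, (x0, y0, x1, y1) in BOXES[(device, pattern)].items():
        pix = page.get_pixmap(clip=fitz.Rect(x0, y0, x1, y1), dpi=dpi)
        crops[field] = pix  # each crop is later split into 48 x 48 character images
    return crops
```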
Figure 1 illustrates an example of HFA3 24-2 test pattern, where bounding boxes precisely enclose metadata and numerical values, forming a structured coordinate array for text extraction.
To evaluate the model's accuracy across different HFA formats, we assessed performance on HFA2 and HFA3 models, using the 30-2, 24-2, and 10-2 test patterns. The extracted data was categorized into metadata and numerical values, with separate accuracy calculations for each, as summarized in Table 2.
Table 2. Accuracy of Text Recognition Across Different HFA Models and Test Patterns, Categorized Into Metadata and Numerical Data
Our results show that the model consistently achieved 100% accuracy in extracting critical numerical data, including raw threshold sensitivity, total deviation, and pattern deviation values, regardless of HFA model or test pattern. For metadata extraction, the model demonstrated exceptionally high accuracy, exceeding 98.6% across all test formats, with only minimal errors.
The proposed model is designed for single-character recognition. However, certain characters are inherently difficult to distinguish, even for human eyes, due to their visual similarity. Notable examples include the uppercase letter “O,” the lowercase letter “o,” and the digit “0” (zero), as well as the uppercase letter “I” and the lowercase letter “l.” These similarities frequently lead to misclassification, particularly in metadata fields.
Although metadata recognition accuracy exceeded 98.6%, a small number of errors persisted, primarily because of the misclassification of these visually similar characters. To address this issue, we implemented a post-processing correction mechanism based on keyword mapping. A predefined dictionary was constructed by identifying commonly occurring metadata keywords and enumerating all potential misrecognized variations. This dictionary was then applied to systematically correct misclassified metadata entries, ensuring accurate keyword restoration.
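A minimal sketch of such a keyword-mapping correction is shown below; the dictionary entries are illustrative examples only, not the full mapping reported in Table 3.

```python
# Illustrative keyword-mapping correction; entries are examples, not the study's full dictionary.
CORRECTIONS = {
    "FIXATI0N": "FIXATION",  # digit "0" misread for the uppercase letter "O" (illustrative)
    "FaIse": "False",        # uppercase "I" misread for the lowercase letter "l" (illustrative)
}

def correct_metadata(tokens):
    """Replace known misrecognized variants with their intended keywords."""
    return [CORRECTIONS.get(tok, tok) for tok in tokens]

# Example: correct_metadata(["FIXATI0N", "LOSSES"]) -> ["FIXATION", "LOSSES"]
```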
Table 3 summarizes the implemented keyword correction strategy. This post-processing approach effectively resolves character misclassification caused by font-based similarities, thereby ensuring highly reliable metadata extraction.
Table 3. Correction of Commonly Misrecognized Words
To assess computational efficiency, we randomly selected 1,000 PDFs from the testing dataset and processed them using our trained model for inference, followed by post-processing correction. The experiment was conducted on a desktop computer (Intel i7 CPU, 16 GB RAM) running TensorFlow 2.10, using only CPU processing without GPU acceleration. The total processing time was 148 minutes, averaging 8.89 seconds per PDF.
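A throughput measurement of this kind could be scripted as in the sketch below; digitize_report is a hypothetical wrapper around rendering, CNN inference, and keyword correction, not a function from the study's codebase.

```python
# Sketch of a CPU-only throughput measurement; digitize_report is a hypothetical
# wrapper around rendering, cropping, CNN inference, and keyword post-processing.
import random
import time

def benchmark(pdf_paths, digitize_report, sample_size=1000):
    sample = random.sample(pdf_paths, sample_size)
    start = time.perf_counter()
    for path in sample:
        digitize_report(path)  # render -> crop -> classify -> post-process
    elapsed = time.perf_counter() - start
    print(f"Total: {elapsed / 60:.1f} min, average {elapsed / len(sample):.2f} s per PDF")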
To verify accuracy, all extracted text was manually reviewed, confirming 100% correctness in text recognition. In contrast, manual annotation of PDFs, while theoretically capable of achieving 100% accuracy, is highly labor-intensive and error-prone. Manual transcription introduces a risk of typographical errors and misinterpretation, requiring approximately 20 to 30 minutes per report. Extrapolating from this, transcribing 1,000 PDFs manually would require an estimated two months of full-time work, underscoring the substantial efficiency gains achieved through automation.
This study presents a novel AI-driven approach for digitizing Humphrey SFA reports, providing a highly accurate, efficient, and scalable solution for clinical and research applications. Using a lightweight CNN model, the proposed method achieves 100% accuracy in numerical data extraction and over 98.6% accuracy in metadata recognition. Automating this process enables structured data storage, retrieval, and integration, facilitating advancements in automated visual field analysis.
Existing optical character recognition (OCR) solutions fall into two categories: third-party OCR packages and object detection-based models. Third-party OCR tools, such as TesserOCR, are designed for general text recognition and require no custom training.12 However, their lack of domain-specific adaptation results in suboptimal accuracy when applied to fixed-layout medical reports. Object detection-based models, although more flexible, require substantial computational resources, extensive fine-tuning, and high-performance hardware.9 Their multistep segmentation process also introduces error propagation and increases processing time.
Our model overcomes these limitations by leveraging predefined text coordinates within fixed-layout Humphrey SFA reports, ensuring precise and efficient recognition. Unlike object detection methods, it does not require separate text localization, reducing computational complexity and improving accuracy. Compared to third-party OCR tools, it is specifically trained on HFA2 and HFA3 reports, incorporating domain-specific data to enhance performance. These optimizations result in superior accuracy and efficiency, making the model well suited for large-scale clinical implementation.
Another advantage of our model is its robust generalization across different font styles. By incorporating 50 built-in Windows fonts alongside Humphrey PDF reports, the model was trained to recognize a broad spectrum of character variations, reducing dependency on specific font configurations. This data augmentation strategy enhances adaptability to font variations across different SFA report versions.
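Font-based augmentation of this kind could be generated as in the sketch below; Pillow is assumed for glyph rendering, and the font files listed are an illustrative subset rather than the 50 built-in Windows fonts used in the study.

```python
# Sketch of font-based augmentation, assuming Pillow for rendering; the font
# list is an illustrative subset (the study used 50 built-in Windows fonts).
from PIL import Image, ImageDraw, ImageFont

FONT_FILES = ["arial.ttf", "calibri.ttf", "times.ttf"]  # illustrative subset

def render_char(ch, font_path, size=48):
    """Render a single character as a 48 x 48 grayscale training image."""
    font = ImageFont.truetype(font_path, int(size * 0.75))
    img = Image.new("L", (size, size), color=255)  # white background
    draw = ImageDraw.Draw(img)
    # Center the glyph within the canvas.
    left, top, right, bottom = draw.textbbox((0, 0), ch, font=font)
    x = (size - (right - left)) / 2 - left
    y = (size - (bottom - top)) / 2 - top
    draw.text((x, y), ch, font=font, fill=0)
    return img

# Example: render visually similar characters across several fonts.
samples = [render_char(ch, f) for ch in "0OoIl" for f in FONT_FILES]
```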
To further improve metadata extraction, we implemented a post-processing correction mechanism to address misclassifications of visually similar characters. A keyword mapping strategy systematically corrected errors, enhancing metadata reliability and ensuring adaptability to real-world clinical applications where precision is critical. Future directions may include integrating recurrent neural networks with CNNs and using Connectionist Temporal Classification13 for text sequence prediction, transitioning the system into a fully end-to-end deep learning model without reliance on post-processing.
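One way such a future end-to-end model could look is sketched below as a CNN + bidirectional LSTM sequence model trained with CTC loss in TensorFlow/Keras; all layer sizes, input dimensions, and the output sequence length are assumptions for illustration, not a proposed specification.

```python
# Illustrative sketch of the future direction: a CNN + RNN sequence model
# trained with CTC loss (TensorFlow/Keras). All layer sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 72 + 1  # 72 characters plus the CTC blank label

def build_crnn(img_height=48, img_width=480):
    inputs = layers.Input(shape=(img_height, img_width, 1))
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Treat each horizontal position as a time step for the recurrent layer.
    x = layers.Permute((2, 1, 3))(x)  # (width, height, channels)
    x = layers.Reshape((img_width // 4, (img_height // 4) * 64))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Training would minimize tf.keras.backend.ctc_batch_cost over label sequences,
# removing the need for single-character segmentation and keyword post-processing.
```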
Our deep learning model demonstrates exceptional efficacy in digitizing large volumes of Humphrey PDF reports with near-perfect accuracy. The resulting digitized data offers compact storage solutions and facilitates seamless integration into diverse clinical and research workflows. Moreover, these digitized datasets serve as invaluable resources for advanced applications, including AI-based report interpretation.