March 2022
Volume 11, Issue 3
Open Access
The Impact of Image Quality and Trachomatous Inflammation on Using Photography for Trachoma Prevalence Surveys
Author Affiliations & Notes
  • Michelle Odonkor
    Dana Center for Preventive Ophthalmology, Wilmer Eye Institute, Johns Hopkins University, Baltimore, MD, USA
  • Fahd Naufal
    Dana Center for Preventive Ophthalmology, Wilmer Eye Institute, Johns Hopkins University, Baltimore, MD, USA
  • Harran Mkocha
    Kongwa Trachoma Project, Kongwa, Tanzania
  • Nicodemus Funga
    Kongwa Trachoma Project, Kongwa, Tanzania
  • Beatriz Muñoz
    Dana Center for Preventive Ophthalmology, Wilmer Eye Institute, Johns Hopkins University, Baltimore, MD, USA
  • Sheila K. West
    Dana Center for Preventive Ophthalmology, Wilmer Eye Institute, Johns Hopkins University, Baltimore, MD, USA
  • Correspondence: Sheila West, Wilmer Room 155, Johns Hopkins Hospital, 1800 Orleans Street, Baltimore, MD 21287, USA. e-mail: [email protected] 
Translational Vision Science & Technology March 2022, Vol.11, 11. doi:https://doi.org/10.1167/tvst.11.3.11
Abstract

Purpose: Graded images can be used for trachoma prevalence surveys, but there is concern for mismatch between image and field grades of the upper tarsal conjunctiva. We aimed to determine if poor photograph quality and/or inflammation may contribute to differential grading of trachomatous inflammation—follicular (TF) between field and photograph graders.

Methods: We developed a simplified and an expanded image quality grading tool. Agreement was assessed using the kappa statistic. We included 5417 eyes with both field and image grades for TF. Eyes where the field and adjudicated photograph TF grades did not match were identified (mismatched) and assigned an image quality (IQ) score and a potential mismatch reason. We also assigned IQ scores to a stratified random sample of 60 eyes with matching field and photograph TF grades (matched).

Results: There were 5240 eyes that had matching grades, whereas 177 eyes (3.3%) were mismatched. Overall quality was high, even in mismatched eyes. There was no difference in overall or specific IQ metrics between eyes with matching grades and eyes with mismatched grades (P = 0.59). Mismatched eyes had worse inflammation compared to matched eyes (P = 0.048). The primary reason for calling TF in the field but not in the photographs appeared to be the number of follicles observed.

Conclusions: Image quality did not explain mismatch between field grades and image grades from this prevalence survey. Inflammation made mismatch more likely.

Translational Relevance: Our quality grading scheme rapidly identifies image quality issues for training. Standardizing TF grading in the presence of inflammation will improve field and photograph grading.

Introduction
Trachoma, a chronic conjunctivitis caused by repeated episodes of infection with Chlamydia trachomatis, is a major public health concern, particularly in developing countries.1 The global effort to eliminate trachoma has had considerable success with several countries having declared elimination following World Health Organization guidelines.1 One of the two criteria for elimination is a prevalence of trachomatous inflammation—follicular (TF) of less than 5% in children ages 1 to 9 years, as determined by 2 district-level population-based surveys conducted at least 2 years apart, in the absence of mass drug administration (MDA).2 
For these surveys, countries typically rely on trained field graders who have been standardized using live cases.3 As trachoma prevalence declines, training and re-certifying field graders becomes increasingly difficult and expensive, in some cases necessitating travel to endemic countries. This problem will become more acute as more countries undertake surveys seeking to validate the elimination of trachoma. One solution is to consider the use of photography and grading images for trachoma as a replacement for field grading. Clinical trials and research using large-scale prevalence surveys have long used this approach. Image capture and masked grading avoid bias that can result from knowledge of the TF status of the patient's other eye, family members, endemicity in the village, prior MDA status, and other characteristics that may potentially affect live grading. 
However, there are a number of issues raised by the use of photography.4 One is concern for the number of ungradable images that can occur, which in some surveys was reported as quite large.5 As in other ophthalmic uses of photography, this problem can be mitigated by training, certification, and supervision of photographers. Second, there is no standardized approach to grading the quality of images, a key metric that would be beneficial for use in training photographers and providing them feedback on their performance. Third, there is concern for disagreements between field graders and image graders over the presence or absence of TF in the same upper tarsal conjunctiva. Studies of the agreement often use field grading as the “gold standard” despite evidence that field graders are also prone to over- or under-calling TF.6 
The purposes of this study were twofold: (i) to develop a simple, standardized scheme for grading the quality of images of the upper tarsal conjunctiva, and (ii) to examine the possible reasons for mismatches between field grades and image grades using an expanded grading scheme for the assessment of inflammation and an expanded quality grading scheme that outlined detailed criteria for assessing image quality. We hypothesized that differential grading of TF between field and image graders was due to two factors: poor quality of images and the presence of inflammation. 
Methods
Design of Image Quality Grading Scheme
Designing the quality grading metrics was an iterative process that consisted of identifying the various independent factors affecting quality and determining the size of their impact on the quality of the image. The specific factors included focus, lighting and shadows, glare and tear film, degree of eyelid eversion, eyelid blanching, and debris and obstruction in the field of interest (Table 1). These factors affecting image quality could all be addressed by a photographer with minimal effort, and quality issues could generally be resolved by retaking the photograph. For instance, we included factors, such as glare or tear film, which could be wiped away, but excluded eyelid pathologies like tumors, nevi, discolorations, injuries, surgical scars, inflammation, and scarring, as they could not be cleared or resolved by a photographer and would be encountered by and would be similarly challenging for field graders. 
Table 1.
 
Definition of Factors Affecting Image Quality
The grading metrics took all modifiable quality factors impacting the grading area (area of the tarsal conjunctiva, as described by Solomon et al.) into account.3 Focus generally affects the entire photograph and grading area and therefore took priority in the metrics. Focus issues were categorized as none/mild, moderate, or severe. Moderate focus issues were determined by comparison to a reference image (Fig. 1A). A photograph with substantially worse focus was then graded as having a severe focus issue, whereas a photograph with better focus was graded as having no or a mild focus issue. Following categorization of image focus, the combined grading area affected by the remaining quality factors was assessed. Adequate eversion of the eyelid assumed maximum possible visibility of the conjunctival grading area. Complete eversion was achieved if the vertically oriented upper deep tarsal vessels covered approximately the top third of the eyelid and the lower vessels covered approximately the bottom two-thirds of the eyelid. 
Figure 1.
 
Images illustrating quality issues that were used as reference standards. (A) Sample tarsal conjunctiva with a moderate focus issue (grade = 2) and insignificant glare spots. (B) Sample tarsal conjunctiva with a moderate eversion issue (grade = 2). (C) Sample tarsal conjunctiva with a moderate blanching issue (grade = 2).
We developed a simple image quality grading scheme that separated good-quality images from moderate- or poor-quality images based on focus and how much of the grading area was impacted by the other quality factors (Table 2). 
Table 2.
 
Simplified Image Quality Metrics
We also developed an expanded grading scheme that allowed us to determine the impact of each of the quality metrics on the problem of disagreement between field grades and image grades (Table 3). For each quality issue, we defined three levels of concern: significant, moderate, and no/mild issue. 
Table 3.
 
Expanded Image Quality Grading Scheme
Expansion of World Health Organization Grading of Trachomatous Inflammation—Intense
For assessment of the impact of inflammation on disagreements between image and field graders, we expanded the World Health Organization (WHO) simplified grading scheme3 to define no to mild inflammation as <33% of vessels obscured by inflammation, moderate inflammation as 33% to <50% of vessels obscured, and severe inflammation as 50% or more of vessels obscured. Our criterion for severe inflammation is the WHO definition of trachomatous inflammation—intense (TI; Table 4). 
Table 4.
 
Criteria for Inflammation Grade
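The inflammation cut-points described above can be written as a small lookup. This is a minimal sketch rather than the authors' grading tool: the function name and the percentage input are hypothetical, and in practice a grader estimates the share of obscured vessels by eye. The 1 to 3 numbering follows the way grades are referred to in the Results.

```python
def inflammation_grade(percent_vessels_obscured: float) -> int:
    """Map the share of deep tarsal vessels obscured (0-100) to a grade.

    Cut-points are taken from the text: <33% is none/mild (grade 1),
    33% to <50% is moderate (grade 2), and 50% or more is severe
    (grade 3, the TI criterion). The function itself is illustrative.
    """
    if not 0 <= percent_vessels_obscured <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_vessels_obscured < 33:
        return 1
    if percent_vessels_obscured < 50:
        return 2
    return 3


for pct in (10, 33, 49.9, 50, 80):
    print(pct, inflammation_grade(pct))
```

Note that the boundaries are half-open, so a value of exactly 33% falls in the moderate grade and exactly 50% in the severe grade.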
Image Quality and Inflammation Agreement Analysis
To test inter-grader agreement, a set of 75 images, including a range of levels of quality, was created by a grader who was not part of the agreement analysis. Two graders, who had been trained to grade quality on a different set of images, used our simplified quality grading scheme to independently assign quality grades to this test set. Agreement in assessment of image quality was measured using a weighted Cohen's kappa statistic where weights of 0.5 were assigned to adjacent cells. 
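As an illustration of the weighting described above, the following sketch computes a weighted Cohen's kappa from a square cross-tabulation of two graders, giving full credit to exact agreement, a credit of 0.5 to adjacent categories, and none otherwise. The function name and the example table are hypothetical; the counts are invented for demonstration and are not the study data.

```python
def weighted_kappa(table, adjacent_weight=0.5):
    """Weighted Cohen's kappa for a k x k table of counts.

    table[i][j] = number of images grader A put in category i and
    grader B put in category j. Agreement weights: 1 on the diagonal,
    `adjacent_weight` for categories one step apart, 0 elsewhere.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        distance = abs(i - j)
        if distance == 0:
            return 1.0
        return adjacent_weight if distance == 1 else 0.0

    # Observed and chance-expected weighted agreement.
    observed = sum(weight(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical 3-category quality table (good, moderate, poor).
example = [[40, 5, 0],
           [4, 15, 2],
           [0, 2, 7]]
print(round(weighted_kappa(example), 3))
```

Setting `adjacent_weight=0` reduces this to the unweighted kappa, which makes it easy to see how much of the reported agreement comes from near-misses.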
For determining agreement in assessment of TI, a set of 74 images that included a mix of inflammation levels was created from a library of images maintained at the Dana Center for Preventive Ophthalmology from prior population-based surveys. None of the images from the training set used for training image graders was part of the test set. Two trained graders independently assigned inflammation grades to the test set, and agreement was measured using Cohen's kappa statistic. 
Survey Used to Obtain Matched and Mismatched Grades From Image and Field Graders
The images graded in this analysis were obtained from a survey of 3118 children ages 1 to 9 years old from a formerly trachoma hyperendemic district (Kongwa, Tanzania), 2 years after the cessation of MDA. For 404 children (808 eyes), a camera was not available to use in the field, and the rest had images taken of the upper tarsal plate using a handheld Nikon D40 digital SLR camera in manual setting, with a 105 mm f/2.8D Auto Focus Micro Nikkor lens (fully extended). Of the 2714 children (5428 eyes), 11 images were ungradable according to our simplified grading criteria. Thus, our analysis is based on the 5417 eyes with gradable images. 
A field grader, trained and standardized in the use of the WHO simplified grading scheme,3 assigned field grades for the presence or absence of TF and was also the photographer who obtained tarsal conjunctival photographs of both eyes in each child. The photographer was originally trained in early 2000 for clinical trials that had masked assessment of trachoma as an end point. Training involved on site evaluation of use of the SLR digital camera, and the acquisition of gradable images from both eyes in at least five children. Each year, there is refresher review of image capture practice. Photographs of the tarsal conjunctiva are taken with manual focus under natural light conditions while shielded from direct sunlight. The entirety of the conjunctiva must be in frame while the thumb used for eyelid eversion must not obstruct the grading area. Care is taken to wipe away tears or excess moisture prior to image capture and no flash or artificial source of light is used to prevent glare. The camera lens is angled to be directly perpendicular to the conjunctiva to reduce shadows and glare. All images taken during the survey were sent to the Dana Center where two standardized trachoma image graders independently graded each image. Any disagreements were openly adjudicated with a senior grader. 
We identified the eyes where the field and adjudicated photograph TF grades matched and took a stratified random sample of 60 eyes, with strata based on presence or absence of TF (“match” eyes). We also identified all eyes where the field and photograph grades did not match (“mismatch” eyes, n = 177). All eyes in both samples were assigned a quality score for each metric of quality, using the expanded system. In addition, the image graders assigned a grade of inflammation to the image. Finally, all mismatched eyes were re-reviewed by the two image graders who, without knowledge of the direction of the mismatch, assigned a potential reason for the mismatch based solely on review of the photographs. Potential reasons for mismatch included different interpretation of follicle number (<5 vs. 5 or more), different assessment of follicle size (<0.5 mm vs. 0.5 mm or greater), or no obvious reason for the mismatch. 
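The two thresholds behind the mismatch reasons (follicle number and follicle size) can be expressed as a simple rule. The sketch below is illustrative only: the function name and the idea of passing a list of estimated follicle diameters are assumptions, since graders judge number and size visually rather than from measurements.

```python
def is_tf(follicle_diameters_mm, min_count=5, min_diameter_mm=0.5):
    """Return True if at least `min_count` follicles are of qualifying size.

    Thresholds follow the text: 5 or more follicles, each 0.5 mm or
    greater. The input list of diameters is a hypothetical construct.
    """
    qualifying = [d for d in follicle_diameters_mm if d >= min_diameter_mm]
    return len(qualifying) >= min_count


# A borderline eye: two graders who size one follicle differently
# reach different TF calls from the same conjunctiva.
grader_a = [0.6, 0.5, 0.5, 0.7, 0.5]   # fifth follicle judged 0.5 mm
grader_b = [0.6, 0.5, 0.5, 0.7, 0.4]   # same follicle judged 0.4 mm
print(is_tf(grader_a), is_tf(grader_b))
```

The example shows why borderline cases are sensitive to small differences in perceived size or count: a single follicle near the threshold flips the grade.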
The survey was approved by the Institutional Review Board of Johns Hopkins Medicine and the National Institute for Medical Research in Tanzania, and conducted under the tenets of the Declaration of Helsinki. 
Results
Agreement was good between two independent graders when using the simplified quality grading scheme, with a weighted kappa of 0.67 (95% confidence interval [CI] = 0.50–0.84). The two graders agreed on quality for 82.7% of images in the mixed sample of 75 eyes (Table 5). 
Table 5.
 
Grader Agreement using the Simplified Grading Scheme for Quality of Trachoma Images
The agreement between two graders using the grading scheme for inflammation was also good, with a weighted kappa of 0.72 (95% CI = 0.59–0.85). The two graders agreed on inflammation in 81% of the 74 images (Table 6). 
Table 6.
 
Grader Agreement when determining the Degree of Inflammation in Trachoma Images
Of the 5428 images available in the survey, only 11 (or 0.2%) were deemed ungradable. Of the remaining 5417 eyes, field and photograph graders disagreed over the presence or absence of TF in 177 (3.3%) eyes (i.e. mismatched eyes). When there was a mismatch, the field grader assigned a grade of TF (n = 130 eyes, 73.4% of the mismatched eyes) more frequently than the photograph graders (n = 47 eyes, 26.6% of the mismatched eyes; Fig. 2). 
Figure 2.
 
Total number of eyes with both field and photograph grades.
We used the expanded grading scheme to determine if there were specific quality metrics that contributed to mismatch between field versus photo grades in the 177 mismatch eyes, compared to the sample of eyes where there was no mismatch (Table 7). The most common image quality issue was blanching in the image (7.3% of mismatch images and 5.0% of matching-grade images). The frequency of image quality issues worse than grade one was very low in both the mismatch and matched eyes, and although the frequencies were slightly higher in the mismatch eyes, there was no significant difference between the two groups. Over 98% of the random sample of matching-grade eyes were evaluated as high-quality images, and 95.5% of the mismatch eyes were evaluated as high-quality images. 
Table 7.
 
Detailed Analysis of Image Quality in Eyes where Field and Photograph Graders had Matched and Mismatched Grades for TF
However, there was a significant difference in the degree of inflammation between mismatch eyes and matched eyes, with mismatch eyes being more likely to have higher-grade inflammation than matched eyes (Table 8). Within the 177 mismatch eyes, inflammation influenced both the photograph graders and the field grader, as the frequency of grade 3 inflammation (TI) was no different in the 47 eyes called TF only by the photograph graders (7/47 or 14.9%) compared to the 130 eyes called TF only by the field grader (25/130 or 19.2%, P = 0.51). 
Table 8.
 
Severity of Inflammation in Eyes where Field and Photograph Graders had Matched and Mismatched Grades for TF
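The comparison of grade 3 inflammation between the two directions of mismatch (7/47 versus 25/130) can be checked with a 2 x 2 test. The text does not state which test was used; the sketch below assumes an uncorrected Pearson chi-square test, which gives a P value of approximately 0.51 for these counts. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    Layout: [[a, b], [c, d]]. Returns (statistic, two-sided P value).
    With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: TF called only by photograph graders, TF called only by field grader.
# Columns: grade 3 inflammation (TI), lower-grade inflammation.
statistic, p_value = chi_square_2x2(7, 47 - 7, 25, 130 - 25)
print(round(statistic, 3), round(p_value, 2))
```

Other tests, such as Fisher's exact test, would give a somewhat different P value, so agreement with the reported figure should not be read as confirmation of the authors' method.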
For mismatch eyes where the field grader assigned a grade of TF and the photograph graders did not (n = 130), a review of the images suggested that the main reason for disagreement was a difference in whether five follicles were present (Fig. 3). In 61.5% of the 130 eyes, the field grader observed at least 5 follicles, but the photograph graders did not find 5 follicles in the image (Fig. 4A). Perceived follicle number was cited as the reason for mismatch significantly more often in eyes where the field grader called TF than in eyes where the photograph graders called TF (61.5% versus 27.7%, P < 0.0001). 
Figure 3.
 
Photograph grader assessment of the possible reasons for mismatch between photo and field grades for TF, according to review of images where the field grade was TF (N = 130) or the photograph grade was TF (N = 47).
Figure 4.
 
Examples of images where there was a mismatch in TF grading due to differences in interpretation of follicle number and follicle size. (A) Mismatch due to follicle number, with the field grader assigning a grade of TF and the photograph graders assigning a grade of no TF. (B) Mismatch due to follicle size, with the photograph graders assigning a grade of TF and the field grader assigning a grade of no TF.
However, for mismatch eyes where the photograph graders assigned a grade of TF and the field grader did not, the main reason for disagreement was a difference in interpretation of follicle size (66.0%), where the photograph graders felt that the follicles present were of sufficient size and the field grader did not (Fig. 4B). Perceived follicle size was cited as the reason for mismatch significantly less often in eyes where the field grader called TF than in eyes where the photograph graders called TF (31.5% versus 66.0%, P < 0.0001). In about 4% of images, regardless of direction of the mismatch, the photograph graders could not discern any likely reason for the mismatch between field and photograph grades for TF. 
The presence of inflammation also appeared to affect whether the size or the number of follicles was the likely reason for the mismatch in field versus photograph grade. As the grade of inflammation increased, the likelihood that the size of the follicles was the problem increased, from 39% in grade 1 inflammation to 50% in grade 3 (Table 9). As inflammation increased, it was less likely that the number of follicles was the issue in the mismatch. None of the trends were statistically significant. 
Table 9.
 
Percentage of Mismatched Eyes with either Number of Follicles or Size of Follicles as the likely Reasons for the Mismatch, by Degree of Inflammation
Discussion
Previous studies using photography to grade eyelids for trachoma have shown reasonably good correlation with field grading.4,7–9 However, some studies have had difficulties with high rates of ungradable images from 11% to 78%.5,9,10 Commonly reported factors contributing to images being ungradable included improper focus on the grading area, inadequate coverage of the grading area, excess light reflection, or shadows obscuring the grading area. Quality assessment of photographs at an early stage in photographer training and surveying may help prevent data loss and wasted effort in the field, but there is no standardized assessment for quality of images of the tarsal conjunctiva. In this study, we defined a series of quality metrics to use when grading the quality of images of the everted upper eyelid. This detailed assessment was quite time-consuming to implement to determine the degree of quality for each metric, requiring almost 5 minutes per image. A simplified overall quality grading scheme was then developed, which was much easier to use and had reasonable inter-grader agreement. Such a scheme could be rapidly deployed to measure the quality of images and provide feedback to photographers. The more detailed scheme might be used where feedback on the reasons for poor quality is needed. If the grade of images is the primary end point for a survey or a research study, then quality assessment can be built in at the outset by the review of images by the photograph graders. The simple method for quality assessment could also be taught to photography supervisors who could review a sample of images each day during a survey to provide feedback. The timing of review is critical for the training utility of assessing quality, as it is not helpful if performed near the end of the survey and the quality is found lacking. 
We applied the detailed metric of quality assessment in the context of a previous survey where we had both field and image grades for TF to determine if there was a particular feature of the image that might explain the mismatch in grades between the field and image grade for TF. The rate of ungradable images was very low, 0.2% of eyes. Overall, the rate of mismatch eyes was very low as well, 177 (3.3%) of 5417 images. There was generally good agreement on the absence of TF by both field and image graders. However, in the possible presence of TF, we found that the rate of assigning a grade of TF was not equal between the field grader and the image graders in the same eyes. As Figure 2 shows, in the 333 total eyes where the field grader found TF, 39% (130) of those images were not called TF by photograph graders. In contrast, of the 250 total eyes where the photograph graders found TF, only 19% (47) were not also called TF in the field. 
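The counts quoted above are enough to reconstruct the full field-by-photograph cross-tabulation and to check the reported percentages. The sketch below uses only figures stated in the text; the variable names are hypothetical.

```python
# Counts stated in the text.
total_eyes = 5417
field_tf = 333      # eyes graded TF in the field
photo_tf = 250      # eyes graded TF on photographs
field_only = 130    # TF in the field, not on photographs
photo_only = 47     # TF on photographs, not in the field

# Derived cells of the 2 x 2 table.
both_tf = field_tf - field_only
if both_tf != photo_tf - photo_only:
    raise ValueError("field and photograph margins are inconsistent")
neither_tf = total_eyes - both_tf - field_only - photo_only
matched = both_tf + neither_tf
mismatch_rate = (field_only + photo_only) / total_eyes

print("both TF:", both_tf)
print("neither TF:", neither_tf)
print("matched eyes:", matched)
print("mismatch rate (%):", round(100 * mismatch_rate, 1))
print("field TF not confirmed on photo (%):", round(100 * field_only / field_tf))
print("photo TF not confirmed in field (%):", round(100 * photo_only / photo_tf))
```

The derived total of matched eyes agrees with the 5240 reported in the Abstract, and most matched eyes are those where both methods found no TF, which is why overall agreement is high even though agreement among TF-positive eyes is much lower.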
We sought to determine the role of image quality as a reason for the mismatch between field and image grades. Because the possible presence of TF may be a confounder in assessing quality, we stratified the random sample of comparator eyes (i.e. eyes where the field and photograph grades for TF matched) by the presence of TF to be certain that close to half of the sample had TF. In fact, the overall image quality was high in both the sample of eyes where the field grades agreed with image grades and the full sample of mismatched eyes. The most common problem overall was blanching, caused by prolonged eversion of the eyelid, which makes ascertainment of follicles difficult. Blanching of greater than 10% of the grading area of the upper eyelid occurred in 7% of the mismatched eyes compared to 5% of the matched eyes, a difference that was not significant. We note that with the low rate of image quality issues, coupled with the small sample of mismatched eyes, we had limited power to detect significant differences in quality. We argue that even had the sample size been larger, it is not clear from our data that quality issues explain much of the difference found between field and image grades. 
However, the presence of inflammation, as graded on images, does appear to explain some of the differences between the matched and mismatched eyes. The matched eyes were more likely to have no or mild inflammation on image review, whereas the mismatched eyes were more likely to have higher-grade inflammation. Inflammation can be severe enough to obscure follicles, which would impact the assessment of the number of follicles, as well as cause encroachment of tissue around the follicle, leading to apparent diminution in size. An analysis of the impact of inflammation on the reasons for the mismatch suggested that at least an effect on the size of the follicles might be an issue. An example of such a problem may be seen in Figure 5, which was called TF by the field grader but not TF by the photograph graders. 
Figure 5.
 
Image with inflammation where the field grade was TF but the photograph grade was not TF.
Currently, the sign of TI is not included in the assessment of active trachoma, which relies solely on the sign of TF. It is not clear if graders compensate for the presence of TI by lowering the threshold for the size of follicles or by presuming that if three or four follicles are visible that more may be hidden under the inflamed tissue. The impact of inflammation on field grading and image grading needs further discussion because, although inflammation may be a relatively rare sign in general, it was present in close to half of the eyes where mismatch of TF grades occurred. The argument that TI cannot be graded reliably, at least for images, was not the case in this study, and agreement on grading TI in the field has also reportedly been very good.4 
If the field grader had been more likely to compensate for the presence of inflammation than the image graders, we would have expected much higher rates of inflammation in the 130 eyes where only the field grader called TF than in the eyes where only the image graders called TF. However, the rates of grade three inflammation were not different between these two groups. The rate of inflammation in the 130 eyes called TF by only the field grader was 19.2%, compared to 14.9% in the eyes called TF by only the image graders. Although there was some indication of a higher rate in mismatch eyes with field grades of TF, the difference was not large and not statistically significant. 
Without knowledge of the direction of mismatch (i.e. whether the field grader or the photograph graders called TF in a mismatch eye), the photograph graders were asked to re-review the 177 mismatched images and speculate on the possible reasons for mismatch. In particular, was it likely that the size of the follicles was an issue, the number of follicles, or both? If they could not discern an obvious reason, this was also noted. There are obvious limitations to this approach, primarily because the determination of reason for mismatch was made solely by the photograph graders without the thought process of the field grader, and a re-examination of the eye in the field, which may have allowed for reconciliation of the image and field grades, was not possible. The determination was based solely on the image, which, for example, could have had small areas of glare that obscured a follicle. Similarly, variations in follicle shape may have contributed to some ambiguity. Mismatched field and image TF grades may also have represented recording errors in the field. Our data suggested that the problem of determining the number of follicles present was the primary reason the field grader called TF when the image graders did not. It is tempting to assume that the field grade should be the “gold standard,” as it represented review of the actual everted lid that could be assessed from multiple angles, whereas the photograph graders had only the image to assess. However, there is a risk of overcalling TF in situations where the rate of TF is low, as was the case in this survey, so we cannot entirely rule out field grader error.11,12 The data suggest that a difference in determination of the size of the follicles was likely a more common reason for the photograph graders to call TF when the field grader did not. 
For instance, the lid flipper in the survey did not have a thumb marker to assist in determining size, whereas photograph graders can standardize size and account for magnification with a ruler. Overall, the exercise of determining potential reason for mismatch was useful for at least two reasons. First, it points to the difficulty of categorizing borderline cases in a survey, which must be included in any live training of field graders and for training of image graders.13 Second, the findings again highlight that when comparing field and photograph grades of the same eye, we should not assume the field grade is the gold standard. In our data, we had photographic evidence of eyes with five follicles of the correct size that were not called TF by the field grader. 
There are some limitations to this study, in addition to those noted above. The quality of the images in the survey was overall very good, and very few were ungradable. We have historically used a well-trained photographer using an SLR camera for our surveys and recognize that comparisons using other camera systems in other surveys may not yield the same result. Thus, quality of images may be a more important factor in producing grading mismatches than we were able to discern from our dataset. However, by monitoring the quality of images using the simple quality assessment scheme, where we demonstrated reliable agreement, image quality should be enhanced in general. The higher rate of mismatches where the field grader called TF may also be a function of the overall low rate of TF prevalence in the survey, where the cases may not be as severe. Where TF prevalence is high and the cases are more florid, there may be less mismatch. Ideally, we would have had multiple field graders for each eye for this study to help clarify the field grade, as we did for the image grade. 
In summary, we developed a useful tool to provide a rapid and reliable assessment of the quality of images of the upper tarsal conjunctiva that can be used to monitor photographers in the field. If the image quality is substandard, a more detailed assessment to determine the precise issue and institute re-training can be undertaken. Whereas we initially thought that the quality of the image, or metrics of quality, might be lower in eyes that had mismatched field and image TF grades, that was in fact not the case. A more significant problem was the presence of inflammation in the eye, a physical sign which affects both field and photograph grading. Training of both image and field graders needs to be standardized to either ignore inflammation or provide some accommodation in terms of follicle presence and size. Ideally, reconsideration of including the sign of TI in the determination of active trachoma might be valuable, as it does provide some information on ocular disease in the presence of TF14 and can be graded reliably. 
Acknowledgments
Funding Sources: Funds for the survey were provided by a grant from the Task Force for Global Health. Additional funds were provided by the El Maghraby chair at the Wilmer Institute. 
Disclosure: M. Odonkor, None; F. Naufal, None; H. Mkocha, None; N. Funga, None; B. Muñoz, None; S.K. West, None 
References
1. World Health Organization. World Report on Vision. Geneva, Switzerland: World Health Organization; 2019.
2. World Health Organization. Design parameters for population-based trachoma prevalence surveys. Geneva, Switzerland: World Health Organization; 2018.
3. Solomon AW, Kello AB, Bangert M, et al. The simplified trachoma grading system, amended. Bulletin of the World Health Organization. 2020; 98(10): 698.
4. Naufal F, West SK, Brady CJ. Utility of Photography for Trachoma Surveys: A Systematic Review [published online ahead of print August 20, 2021]. Surv Ophthalmol, https://doi.org/10.1016/jsurvophthal.2021.08.005.
5. Butcher RM, Sokana O, Jack K, et al. Low prevalence of conjunctival infection with Chlamydia trachomatis in a treatment-naïve trachoma-endemic region of the Solomon Islands. PLoS Neglected Tropical Dis. 2016; 10(9): e0004863.
6. Odonkor M, Naufal F, Munoz B, et al. Serology, infection, and clinical trachoma as tools in prevalence surveys for re-emergence of trachoma in a formerly hyperendemic district. PLoS Neglected Tropical Dis. 2021; 15(4): e0009343.
7. West SK, Taylor HR. Reliability of photographs for grading trachoma in field studies. Br J Ophthalmol. 1990; 74(1): 12–13.
8. Sheehan JP, Gebresillasie S, Shiferaw A, et al. School-based versus community-based sampling for trachoma surveillance. Am J Tropical Med Hygiene. 2018; 99(1): 150.
9. Emerson PM, Lindsay SW, Alexander N, et al. Role of flies and provision of latrines in trachoma control: cluster-randomised controlled trial. Lancet. 2004; 363(9415): 1093–1098.
10. Solomon AW, Bowman RJ, Yorston D, et al. Operational evaluation of the use of photographs for grading active trachoma. Am J Tropical Med Hygiene. 2006; 74(3): 505.
11. Harding-Esch EM, Sillah A, Edwards T, Burr SE, Hart JD, Joof H, Partnership for Rapid Elimination of Trachoma (PRET) study group. Mass treatment with azithromycin for trachoma: when is one round enough? Results from the PRET Trial in the Gambia. PLoS Neglected Tropical Dis. 2013; 7(6): e2115.
12. Martin DL, Bid R, Sandi F, et al. Serology for trachoma surveillance after cessation of mass drug administration. PLoS Neglected Tropical Dis. 2015; 9(2): e0003555.
13. Gaynor BD, Amza A, Gebresailassie S, et al. Importance of including borderline cases in trachoma grader certification. Am J Tropical Med Hygiene. 2014; 91(3): 577.
14. Zambrano AI, Munoz BE, Mkocha H, et al. Measuring trachomatous inflammation-intense (TI) when prevalence is low provides data on infection with Chlamydia trachomatis. Invest Ophthalmol Vis Sci. 2017; 58(2): 997–1000.
Figure 1.
 
Images illustrating quality issues that were used as reference standards. (A) Sample tarsal conjunctiva with a moderate focus issue (grade = 2) and insignificant glare spots. (B) Sample tarsal conjunctiva with a moderate eversion issue (grade = 2). (C) Sample tarsal conjunctiva with a moderate blanching issue (grade = 2).
Figure 2.
 
Total number of eyes with both field and photograph grades.
Figure 3.
 
Photograph grader assessment of the possible reasons for mismatch between photo and field grades for TF, according to review of images where the field grade was TF (N = 130) or the photograph grade was TF (N = 47).
Figure 4.
 
Examples of images where there was a mismatch in TF grading due to differences in interpretation of follicle number and follicle size. (A) Mismatch due to follicle number, with the field grader assigning a grade of TF and the photograph graders assigning a grade of no TF. (B) Mismatch due to follicle size, with the photograph graders assigning a grade of TF and the field grader assigning a grade of no TF.
Figure 5.
 
Image with inflammation where the field grade was TF but the photograph grade was not TF.
Table 1.
 
Definition of Factors Affecting Image Quality
Table 2.
 
Simplified Image Quality Metrics
Table 3.
 
Expanded Image Quality Grading Scheme
Table 4.
 
Criteria for Inflammation Grade
Table 5.
 
Grader Agreement using the Simplified Grading Scheme for Quality of Trachoma Images
Table 6.
 
Grader Agreement when determining the Degree of Inflammation in Trachoma Images
Table 7.
 
Detailed Analysis of Image Quality in Eyes where Field and Photograph Graders had Matched and Mismatched Grades for TF
Table 8.
 
Severity of Inflammation in Eyes where Field and Photograph Graders had Matched and Mismatched Grades for TF
Table 9.
 
Percentage of Mismatched Eyes with either Number of Follicles or Size of Follicles as the likely Reasons for the Mismatch, by Degree of Inflammation