October 2022 | Volume 11, Issue 10 | Open Access | Artificial Intelligence
A Framework for Automating Psychiatric Distress Screening in Ophthalmology Clinics Using an EHR-Derived AI Algorithm
Author Affiliations & Notes
  • Samuel I. Berchuck
    Department of Statistical Science, Duke University, Durham, NC, USA
  • Alessandro A. Jammal
    Duke Eye Center and Department of Ophthalmology, Duke University, Durham, NC, USA
  • David Page
    Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
  • Tamara J. Somers
    Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA
  • Felipe A. Medeiros
    Duke Eye Center and Department of Ophthalmology, Duke University, Durham, NC, USA
    Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
  • Correspondence: Felipe A. Medeiros, Duke Eye Center and Department of Ophthalmology, Duke University, 2310 Erwin Road, Durham, NC 27710, USA. e-mail: felipe.medeiros@duke.edu 
Translational Vision Science & Technology October 2022, Vol. 11, 6. https://doi.org/10.1167/tvst.11.10.6
Abstract

Purpose: In patients with ophthalmic disorders, psychosocial risk factors play an important role in morbidity and mortality. Proper and early psychiatric screening can prompt intervention and mitigate the impact of distress. Because screening is resource intensive, we developed a framework for automating screening using an electronic health record (EHR)-derived artificial intelligence (AI) algorithm.

Methods: Subjects came from the Duke Ophthalmic Registry, a retrospective EHR database of the Duke Eye Center. Inclusion criteria were at least two encounters and a minimum of 1 year of follow-up. Presence of distress was defined at the encounter level using a computable phenotype, and risk factors comprised the available EHR history. At each encounter, risk factors were used to discriminate psychiatric status. Model performance was evaluated using area under the receiver operating characteristic (ROC) curve and area under the precision–recall curve (PR AUC). Variable importance was presented using odds ratios (ORs).

Results: Our cohort included 358,135 encounters from 40,326 patients with an average of nine encounters per patient over 4 years. The ROC and PR AUC were 0.91 and 0.55, respectively. Of the top 25 predictors, the majority were related to existing distress, but some indicated stressful conditions, including chemotherapy (OR = 1.36), esophageal disorders (OR = 1.31), central pain syndrome (OR = 1.25), and headaches (OR = 1.24).

Conclusions: Psychiatric distress in ophthalmology patients can be monitored passively using an AI algorithm trained on existing EHR data.

Translational Relevance: When paired with an effective referral and treatment program, such algorithms may improve health outcomes in ophthalmology.

Introduction
In patients with ophthalmic disorders, psychosocial risk factors play an important role in morbidity and mortality.1 The prevalence of psychiatric distress (i.e., anxiety and depression) in ophthalmic diseases is high; in studies of cataracts, glaucoma, diabetic retinopathy, and age-related macular degeneration (the leading causes of blindness and vision loss in the United States), the prevalence of psychiatric distress ranged from 5% to 57%.2–5 Similar prevalence has been noted for other common ophthalmic disorders such as dry eyes.6 The presence of psychiatric distress in ophthalmic disorders is associated with worse medication and follow-up adherence,7 disease comprehension,8 and vision-related quality of life,9 as well as increased morbidity10 and health care costs.11
Proper and early screening for psychiatric distress can result in prompt intervention and mitigate negative outcomes.12 However, traditional approaches to psychiatric screening carry substantial cost and time burdens.13 Some practices and clinics have implemented strategies that limit these burdens, including routine use of brief self-report questionnaires. For example, oncology and cardiology clinics have tested a two-stage approach.14 This approach begins with large-scale prescreening of patients using a brief self-report questionnaire15: in oncology, the National Comprehensive Cancer Network (NCCN) distress thermometer,16 and in cardiology, the two-item patient health questionnaire.17 Then, only patients who test positive on the prescreening instrument are screened further using a more formal assessment. This approach limits time and cost burdens, as only a subset of high-risk patients receives formal assessment.18 Nonetheless, the two-stage approach is not widely used, owing to patient reluctance, time consumption, and a lack of personnel to administer the questionnaires.
In recent years, there has been an increasing focus on automating screening for distress with the assistance of emerging technologies, including artificial intelligence (AI)-driven emotion recognition,19 to alleviate challenges associated with routine screening.20 These methods, however, require prospective data collection, which is not conducive to screening in clinics where resources are limited.21 To overcome this limitation, we propose developing an automated prescreening measure of psychiatric distress based on available and existing electronic health record (EHR) data. EHR data have been used extensively to develop computable phenotypes of, and predictions for, incident medical disorders.22,23 Identifying existing and incident cases of psychiatric distress is critical, as interventions can be tailored to improve patient distress attributed to vision-related diseases.7
In this study, we developed an automated AI algorithm to predict psychiatric distress among a large cohort of patients attending the Duke Eye Center, a tertiary referral center. We hypothesized that the AI algorithm would have high accuracy to identify distress, indicating that prescreening could be performed automatically at scale, with formal assessment reserved for a predetermined subset of high-risk patients. 
Methods
This was a retrospective cohort study using patients from the Duke Ophthalmic Registry, which consisted of adults at least 18 years of age who were evaluated at the Duke Eye Center or its satellite clinics from 2012 to 2021. The Duke University Institutional Review Board approved this study with a waiver of informed consent due to the retrospective nature of this work. All methods adhered to the tenets of the Declaration of Helsinki for research involving human subjects and were conducted in accordance with regulations of the Health Insurance Portability and Accountability Act. 
Patients were included in the cohort if they had at least two encounters and at least 1 year of follow-up at the Duke Eye Center main site. Eligible encounters included any in which the patient was at least 18 years of age and that occurred between June 2013 and October 2021.
Psychiatric Distress Outcome
Psychiatric distress (often shortened to distress throughout the manuscript) was defined as a binary indicator at the encounter level using an existing EHR phenotype from the Phenotype KnowledgeBase (PheKB) that defined depression and anxiety.24 Psychiatric distress is one component of multifactorial psychosocial distress and was chosen as the outcome in this study because it can be reliably measured from EHR data. Versions of this algorithm have been shown to be associated with the nine-item patient health questionnaire, with areas under the receiver operating characteristic (ROC) curve of 0.70 to 0.80.25,26 Distress was defined at the encounter level because anxiety and depression are not permanent conditions and can be recurrent.27,28
Distress was defined using International Classification of Diseases (ICD) diagnostic codes, medical history, and Current Procedural Terminology (CPT) procedure codes. The detailed definitions using these codes and medications are found in the pseudo-code for the phenotype available at PheKB. ICD codes for depression and anxiety can be found in Supplementary Tables S1 and S2, respectively. Medications used to identify depression and anxiety can be found in Supplementary Tables S3 and S4, which contain a list of generic and brand names of antidepressant and antianxiety medications. CPT codes are included in Supplementary Table S5 and included procedure codes for delivering psychotherapy. 
For each encounter, distress was defined if an eligible diagnostic or procedure code or medication occurred within a window of 180 days around the encounter date. For a diagnostic code to indicate distress, it had to occur on at least two distinct calendar days that were at least 30 days apart and not more than 180 days apart. This is intended to avoid interpreting as an event "rule-out" codes that only appear in a patient's record once for a brief period (i.e., <30 days); the 180-day limit acknowledges that "rule-out" coding may appear more than once in a patient's medical record. This rule is stricter than the more common approach that only requires the presence of a single code; however, accounting for the nature of EHR data in this way is likely to yield a higher positive predictive value. For a medication to indicate distress, it had to occur within 30 days of a corresponding ICD diagnostic code; for example, an antianxiety medication had to occur within 30 days of an ICD code indicating anxiety. There were no additional criteria for a CPT code to indicate distress, only that it occurred within 180 days of the encounter date. A visualization of the modeling framework can be found in Figure 1, and the flow chart in Figure 2 shows how patient distress was defined. A minimal sketch of the diagnostic-code rule is given below.
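To make the diagnostic-code rule concrete, the following is a minimal sketch in R, assuming a vector of qualifying ICD code dates for one patient; the function and argument names are hypothetical, not part of the published phenotype.

# Sketch of the encounter-level diagnostic-code rule. `code_dates` holds the
# dates of qualifying ICD codes for one patient; `enc_date` is the encounter
# date. Both are Date objects; names are illustrative.
distress_from_dx <- function(code_dates, enc_date) {
  # Keep codes within the 180-day window around the encounter
  in_window <- code_dates[abs(as.numeric(code_dates - enc_date)) <= 180]
  days <- sort(unique(as.numeric(in_window)))
  if (length(days) < 2) return(FALSE)
  # Require two distinct calendar days at least 30 and at most 180 days apart
  gaps <- outer(days, days, "-")
  any(gaps >= 30 & gaps <= 180)
}

# Example: two anxiety codes 60 days apart, both within 180 days of the visit
distress_from_dx(as.Date(c("2020-01-10", "2020-03-10")), as.Date("2020-02-01"))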
Figure 1. Visualizing the modeling framework. Both the predictors and outcome are defined based on the encounter date as an anchor. The outcome is defined using data collected in a 180-day window around the encounter (pink area). The predictor is defined using all EHR data collected prior to the encounter (blue area) and is broken into three phases of 3 months, 1 year, and 5 years. Red EHR items correspond to ones that qualify for the distress outcome phenotype (e.g., antidepressant medication). In this example, the patient has a diagnostic code and medication (both red) in the outcome period, indicating that the patient had distress at the time of the encounter; importantly, these occurred within 30 days of each other. The EHR history is converted into a vectorized form and fed into a machine learning algorithm (here, the elastic-net model). The algorithm then outputs a probability of distress for each encounter.
Figure 2. Flow chart demonstrating how patient distress was defined at the encounter level. Diagnosis and procedure codes come from the ICD.
Risk Factors
For each encounter, risk factors of distress were defined based on the available EHR history for that patient (i.e., any data available by the encounter). The risk factors were broken up into three groups: utilization, demographics, and problem list. The algorithm had 1840 variables as input. 
Utilization
Utilization contained predictors that quantify a patient's use of healthcare services and included diagnostic and procedure codes, medications, and clinical encounters. Diagnostic (ICD) and procedure (CPT) codes were grouped based on the Clinical Classifications Software (CCS) developed by the Agency for Healthcare Research and Quality.29 All ICD codes were categorized using Version 9 ICD codes; thus, all Version 10 ICD codes were first mapped to Version 9 using the general equivalence mappings (Centers for Medicare and Medicaid Services). There were 253 and 239 non-empty categories of diagnostic and procedure codes, respectively. Medications were grouped using the second level of the Anatomical Therapeutic Chemical (ATC) classification, yielding 81 different drug subgroups. There were 35 distinct clinical encounter types.
For each encounter, utilization variables were coded as binary indicators of the variable occurring in a time window prior to the encounter (e.g., encounter to an oncology clinic). Three windows were used: within the past 3 months, 1 year, and 5 years. This yielded three binary variables for each utilization variable. For example, consider a patient at an ophthalmology encounter who had an oncology encounter 2 years prior. This patient would have three variables representing their prior utilization of oncology clinics. The variables representing within the past 3 months and 1 year would be zero, indicating no utilization, and the variable representing the past 5 years would be one, representing utilization. 
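As a sketch of this encoding, the function below (in R; the function and argument names are illustrative assumptions, not the authors' pipeline) builds the three binary window indicators for one utilization category at one encounter.

# Sketch of the three-window utilization features for one encounter.
# `event_dates` holds all dates on which a given category (e.g., an oncology
# encounter) occurred for this patient; names are illustrative.
window_features <- function(event_dates, enc_date) {
  days_before <- as.numeric(enc_date - event_dates)
  days_before <- days_before[days_before > 0]          # strictly prior events
  c(within_3mo = any(days_before <= 90),
    within_1yr = any(days_before <= 365),
    within_5yr = any(days_before <= 365 * 5)) * 1L     # coerce logical to 0/1
}

# Oncology encounter 2 years before the visit: only the 5-year flag is 1
window_features(as.Date("2019-06-01"), as.Date("2021-06-01"))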
Because the definition of the outcome included a 180-day window around the encounter, variables that were identified in the 180-day window prior to the encounter and were used to define the outcome were not included when defining the predictors. This was done because the presence of these codes indicates a deterministic relationship with the outcome; in this setting, our algorithm is not needed, and the patient can be assumed to be distressed. Codes that were removed include CCS diagnostic groups (adjustment disorders; alcohol-related disorders; anxiety disorders; disorders usually diagnosed in infancy, childhood, or adolescence; mood disorders; and substance-related disorders), the CCS procedure group for psychological and psychiatric evaluation and therapy, and ATC subgroups (N06A, antidepressants; N05A, antipsychotics; N05B, anxiolytics). As an example, if an ICD code for anxiety showed up 90 days prior to the encounter, the CCS group anxiety disorders would be zeroed out for all three look-back windows. This zeroing out applied to the presence of a single code, as opposed to the stricter rule defined for the outcome above, to avoid allowing the AI algorithm to learn a near-deterministic mapping. However, if there was an additional ICD code for anxiety at 181 days prior to the encounter, the anxiety variable would be one for both the 1- and 5-year windows. We did not remove these variables that preceded the 180-day window, because previous distress is a predictor of future distress.
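To make the leakage rule concrete, a minimal sketch follows (again in R, with hypothetical names), zeroing all three window indicators for a category whenever any outcome-defining code for that category appears in the 180 days before the encounter.

# Sketch of the leakage rule. `features` is the named 0/1 vector produced by
# window_features(); `leak_dates` holds dates of outcome-defining codes for
# the same category. A single code in the prior 180 days zeroes the category.
apply_leakage_rule <- function(features, leak_dates, enc_date) {
  days_before <- as.numeric(enc_date - leak_dates)
  if (any(days_before > 0 & days_before <= 180)) features[] <- 0L
  features
}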
Demographics
The demographic risk factors included age at the encounter (years), sex (male, female), race (Caucasian/white, African American/black, Asian, multiracial, other), ethnicity (non-Hispanic Latino, Hispanic Latino), marital status (married, single), income level, education, and binary behavioral indicators of prior use of alcohol, smoking, and illicit drugs. The first level of each categorical variable was used as the reference category. Income level and education were obtained from the U.S. Census Bureau's American Community Survey for 2006 to 2011. Income level was measured by per capita income in the past 12 months and was race specific. Education was measured by the percentage of residents who achieved a high-school education and was sex specific. Census data were assigned to patients based on the ZIP code in which they lived. In the models, age and education were scaled by 10, and income was scaled by 10,000. All three continuous predictors were mean centered. Of the demographic variables, race, ethnicity, marital status, income, and education had missing values at a rate of less than 10%. These missing values were imputed using single mean imputation, and all patients were included in the final analysis.
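A minimal sketch of this preprocessing follows, assuming a data frame df with hypothetical column names age, education, and income.

# Sketch of the continuous-predictor preprocessing: single mean imputation,
# scaling (age and education by 10, income by 10,000), then mean centering.
prep_continuous <- function(x, scale_by) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)   # single mean imputation
  x <- x / scale_by                      # put coefficients on a readable scale
  x - mean(x)                            # mean center
}
df$age_s    <- prep_continuous(df$age, 10)
df$educ_s   <- prep_continuous(df$education, 10)
df$income_s <- prep_continuous(df$income, 10000)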
Problem List
Problem list items included any mention of depression (key words: depressed, depression, major depression) or anxiety (anxiety, obsessive compulsive disorder, panic attacks, panic disorders, post-traumatic stress disorder). Problem list items were again coded temporally, using 3 months, 1 year, and 5 years. For both anxiety and depression, an indicator was created to signal if at least one measure was present in each of the temporal ranges. Because problem list items were not used to define the PheKB outcome, we did not remove any problem list items within 180 days of the encounter. 
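As a sketch, the keyword indicators can be computed with case-insensitive pattern matching; problem_text below is a hypothetical character vector of a patient's problem-list entries within one of the temporal windows.

# Sketch of the problem-list indicators via case-insensitive keyword matching.
dep_pattern <- "depressed|depression|major depression"
anx_pattern <- "anxiety|obsessive compulsive disorder|panic attacks|panic disorders|post-traumatic stress disorder"
has_depression <- any(grepl(dep_pattern, problem_text, ignore.case = TRUE))
has_anxiety    <- any(grepl(anx_pattern, problem_text, ignore.case = TRUE))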
Training the Model
To predict psychiatric distress using the EHR history upon encounter at the Duke Eye Center, we used three machine learning classification models: elastic-net,30 Random Forest,31 and CatBoost.32 Elastic-net is a regularized linear model that penalizes overfitting by shrinking the regression coefficients toward zero. The penalized negative log-likelihood to be minimized is
\begin{eqnarray*} &&{-}\frac{1}{n}\sum_{i=1}^n \left[ y_i (\beta_0+x^T_i{\boldsymbol\beta})-\log \left(1+e^{\beta_0+x^T_i{\boldsymbol\beta}}\right)\right]\\ &&\qquad +\;\lambda[(1-\alpha)\Vert{\boldsymbol\beta}\Vert^2_2/2+\alpha\Vert{\boldsymbol\beta}\Vert_1], \end{eqnarray*}
where x_i is a p-dimensional vector of the EHR history; y_i is an indicator of distress for encounter i = 1, …, n; β is a vector of regression coefficients; and β_0 is an intercept. The tuning parameter λ controls the degree of penalization, and the elastic-net parameter α bridges the gap between the least absolute shrinkage and selection operator (LASSO; α = 1) and Ridge regression (α = 0). Ridge regression shrinks coefficients of correlated predictors toward each other, whereas LASSO tends to pick one and zero out the others. The potential weakness of a linear model is that variables may interact with one another in ways we do not know beforehand. Decision trees are often successful in such situations, but they can overfit; the risk of overfitting is reduced by ensemble methods such as random forests or gradient boosting, both of which we also employed. The Random Forest algorithm uses an ensemble of decision trees to make robust predictions, where the output is the class selected by the most trees. CatBoost is a gradient boosting algorithm for binary classification trees that uses ordered boosting to reduce overfitting and handles categorical features natively.
Although Ridge regression can be fitted by traditional gradient descent, the absolute-value penalty term in the elastic-net objective requires a different method; the most efficient currently is cyclical coordinate descent, which we performed using glmnet.33 To both tune the pair (λ, α) and estimate model accuracy in an unbiased manner, we used nested cross-validation, with the inner loop tuning (λ, α) via a grid search using the one-standard-error rule (selecting the most regularized model within one standard error of the minimum internal cross-validation error),34 and with the outer loop estimating model performance, including area under the curve (AUC) for both ROC and precision–recall (PR) curves. The Random Forest algorithm was run with 500 trees and one random split for each candidate splitting variable. CatBoost was run with default parameters using the cross-entropy loss. All three models were implemented using 10-fold cross-validation; an additional test dataset was not used. To prevent data leakage, sampling was performed at the patient level, and the percentage of distress encounters was balanced across folds.
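Below is a minimal sketch of the inner tuning loop with glmnet; the patient-level fold construction, the α grid, and the object names (x, y, patient_id) are illustrative assumptions rather than the exact settings used in the study.

library(glmnet)

# Sketch of (lambda, alpha) tuning with patient-level folds. `x` is the n x p
# predictor matrix, `y` the 0/1 distress indicator, and `patient_id` maps
# encounters to patients; grid values are illustrative.
set.seed(1)
pts     <- unique(patient_id)
pt_fold <- sample(rep(1:10, length.out = length(pts)))  # assign each patient
foldid  <- pt_fold[match(patient_id, pts)]              # ... to one fold

alphas <- seq(0, 1, by = 0.1)
fits <- lapply(alphas, function(a)
  cv.glmnet(x, y, family = "binomial", alpha = a,
            foldid = foldid, type.measure = "auc"))
best <- which.max(sapply(fits, function(f) max(f$cvm)))  # best alpha by CV AUC
fit  <- fits[[best]]

# lambda.1se implements the one-standard-error rule
prob <- predict(fit, newx = x, s = "lambda.1se", type = "response")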
Statistical Analysis
The overall performance of the models was evaluated using the ROC and PR AUC, estimated by cross-validation as in the preceding paragraph. All performance metrics are presented as a mean and standard deviation (SD) across the 10 cross-validation folds. PR curves plot precision (i.e., positive predictive value) against recall (i.e., sensitivity) and are useful when there is imbalance in the cases and controls. A PR curve depends only on predictions of the minority class, as precision and recall do not depend on true negatives. 
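As a sketch, both quantities can be computed directly from held-out scores; roc_auc below uses the rank (Mann-Whitney) identity, and pr_auc uses simple step-function integration, which is one of several conventions. Names are illustrative.

# Sketch of ROC and PR AUC from held-out scores `score` and 0/1 labels `y`.
roc_auc <- function(score, y) {
  # ROC AUC equals the probability a random case outscores a random control
  r <- rank(score)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
pr_auc <- function(score, y) {
  o <- order(score, decreasing = TRUE)
  tp <- cumsum(y[o] == 1)
  prec <- tp / seq_along(tp)       # precision at each threshold
  rec  <- tp / sum(y == 1)         # recall at each threshold
  sum(diff(c(0, rec)) * prec)      # step-function integration
}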
These summaries are presented overall and for subgroups, including within each subspecialty at the Duke Eye Center, diseases including primary open-angle glaucoma (POAG), diabetic retinopathy, age-related macular degeneration (AMD), and cataracts, and demographics. Disease diagnoses were based on ICD diagnostic code definitions from previous studies35–37 and had to occur within 30 days of a corresponding clinic encounter (POAG from the glaucoma clinic, AMD and diabetic retinopathy from the vitreoretinal clinic, cataracts from any clinic). Finally, to determine the importance of the time interval for prediction (i.e., 3 months, 1 year, 5 years), we examined overall performance for models that included only data from each time interval.
Additionally, sensitivity is presented across varying levels of specificity. Sensitivity is presented for all cases of distress and a subset of cases determined to be incident or new. A new case was required to be the first case present for a patient and to have no prior encounters to a psychiatry clinic. Furthermore, the patient could have no problem list, medication, or procedure items prior to the encounter that suggested any psychiatric diagnoses. We highlight sensitivity at a specificity of 70%. This value comes from published data from a two-stage screening approach, where we used the proportion of patients referred from the prescreening questionnaire who ended up not having distress in a more formal evaluation (i.e., specificity).14,15 Finally, for the elastic-net model, the largest 25 coefficients in absolute value were presented, along with their odds ratios (ORs). The ORs were averaged across the 10 cross-validation folds. The non-zero demographic predictors were also presented. OR P values were not presented, as they cannot be computed reliably for the elastic-net model.38 
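A minimal sketch of reading off sensitivity at a fixed specificity from held-out scores follows; the quantile-based threshold is one simple convention, and the names are illustrative.

# Sketch: sensitivity at a fixed specificity (e.g., 0.70), as in Figure 4.
sens_at_spec <- function(score, y, spec = 0.70) {
  # Choose the threshold as the upper quantile of control scores so that
  # about (1 - spec) of controls screen positive
  thr <- quantile(score[y == 0], probs = spec)
  mean(score[y == 1] > thr)
}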
The summaries for the cohort are presented with continuous variables presented as mean and standard deviation and categorical variables as counts and percentages. Hypothesis tests are presented across distress group; categorical variables were tested using a χ2 test, and continuous variables were tested using a Wilcoxon rank-sum test. Patient data were anonymized, and all statistical analyses were conducted using R 4.0.5 (R Foundation for Statistical Computing, Vienna, Austria) within the Protected Analytics Computing Environment (PACE). PACE is a secure virtual network space developed by Duke University for the analysis of identifiable protected health information. The R packages glmnet, ranger, and catboost were used to carry out the models.30,39,40 
Results
The study cohort consisted of 358,135 encounters from 40,326 patients with an average ± SD of 9 ± 10 encounters per patient over 4 ± 2 years of follow-up. The average age of the patients was 60 ± 17 years, with a breakdown of 23,762 (59%) females, 27,323 (68%) Caucasian/white, 10,573 (26%) African American/black, and the rest Asian, multiracial, and other races. There were 6069 (15%) patients with at least one encounter with corresponding distress. Full summary details at the patient level can be found in Table 1. Encounter level summaries can be found in Table 2, with the top seven predictors by base rate presented for each utilization category, along with problem list items. 
Table 1. Summary of Demographics Presented Across Patient Distress Indicators
Table 2. Summary of Utilization and Problem Lists Calculated Using the Entire EHR History Presented Across Encounter Distress Indicators
The optimal tuning parameters in the elastic-net model were found to be α = 0.08 ± 0.03 and λ = 0.04 ± 0.01, indicating a preference for Ridge regression. The original number of predictors included was 1840, and after regularization only 292 remained that were non-zero in at least one cross-validation fold. 
The ROC and PR curves for the three machine learning algorithms are presented in Figure 3. The intervals correspond to 95% cross-validation confidence intervals. The mean ± SD ROC AUCs for elastic-net, CatBoost, and Random Forest were 0.912 ± 0.007, 0.918 ± 0.007, and 0.913 ± 0.007, respectively, with PR AUCs of 0.547 ± 0.032, 0.575 ± 0.031, and 0.552 ± 0.033. For a PR curve, a non-informative classifier yields an AUC equal to the prevalence of distress in the population, 7% at the encounter level. The improvements from CatBoost and Random Forest over elastic-net were minimal and within the range of cross-validation error. Because the elastic-net model was comparable in performance to the more complex algorithms and yields interpretable feature importance values as ORs, the remaining results are presented using the elastic-net model.
Figure 3. ROC and PR curves for the elastic-net, CatBoost, and Random Forest algorithms. In parentheses are mean ± SD for ROC and PR AUCs across cross-validation folds. Intervals represent 95% cross-validation confidence intervals. The horizontal line on the PR curve represents the prevalence of distress across encounters (7%).
Table 3 includes the ROC and PR AUCs across subspecialties, diseases, and demographics, along with the base rate and prevalence of distress within each subgroup. AUCs ranged from 0.87 to 0.94 for ROC curves and from 0.52 to 0.63 for PR curves. The ROC and PR AUCs for neuro-ophthalmology, the subspecialty with the highest level of distress at 12.2%, were 0.89 and 0.60, respectively. For POAG and AMD (the diseases with the highest rates of distress at 7.4% and 7.3%, respectively), the ROC AUCs were 0.91 and 0.90, and the PR AUCs were 0.57 and 0.56, respectively. Finally, Supplementary Figure S1 and Supplementary Table S6 present results of the elastic-net model with only predictors from 3 months, 1 year, and 5 years prior to the encounter.
Table 3. Performance Metrics Presented Across Subgroups
Figure 4 presents the sensitivity values for existing and new distress across a continuum of specificity values. At a specificity level of 0.70, the sensitivity values were 0.92 ± 0.01 and 0.71 ± 0.06, respectively, for existing and new distress. Table 4 presents the top 25 predictors of distress from the elastic-net model. The full list of non-zero predictors is given in Supplementary Table S7. Finally, in Table 5, we present the non-zero coefficients for the demographic and risky-behavior predictors. In Table 5 and Supplementary Table S7, ORs that rounded to 1.00 have an additional column indicating the direction of the association. 
Figure 4. Sensitivity values for existing and new distress across a continuum of specificity values. New distress is defined as any distress encounter that was the first distress encounter for each patient and was not preceded by an encounter to a psychiatry clinic. The vertical line represents 70% specificity, which we used to compare our results to previous studies.
Table 4. ORs for the Top 25 Predictors of Distress Using All Predictor Types With Variables Ordered by the Absolute Value of Their Coefficient
Table 5. Demographic Predictors With Non-Zero Coefficients
Discussion
In this study, we introduced an AI algorithm that automates prescreening of psychiatric distress using existing EHR data. Our findings suggest that prescreening for distress can be accomplished at scale, eliminating previous hurdles to scalability in the two-stage approach, including patient reluctance, time consumption, and a lack of personnel to administer the questionnaires. This finding is particularly important in an ophthalmology setting, where patients have high levels of distress yet there is no existing infrastructure for distress screening.
In our study, 15% of patients had at least one encounter with distress. This value is consistent with the previously reported prevalence of anxiety and depression in patients with ophthalmic disorders, which ranged from 5% to 57%.2–5 Our prevalence is likely on the lower end, as our inclusion criteria did not require a disease diagnosis. At the encounter level, the prevalence of distress was 7%, with neuro-ophthalmology having the highest rate of distress at 12.2%. This finding is consistent with literature on the overlap of neuro-ophthalmology and psychiatric conditions.41 Of the diseases we included in our study, distress was highest among patients with POAG (7.4%) and AMD (7.3%). This is consistent with previous findings, as patients with POAG and AMD are at higher risk for anxiety and depression.42,43
Our algorithm had high classification accuracy, with ROC and PR AUCs of 0.912 ± 0.007 and 0.547 ± 0.032, respectively. The performance was consistent across varying subspecialties, diseases, and demographics. Of note, for the high-distress subgroups (i.e., neuro-ophthalmology, POAG, and AMD), the PR AUCs are on the upper end of performance with values of 0.60 ± 0.06, 0.57 ± 0.06, and 0.56 ± 0.15, respectively. 
We also presented sensitivity values at a fixed specificity of 0.70, a value grounded in literature comparing brief prescreening surveys to gold-standard psychiatric assessments of distress. For example, Cull et al.14 reported a sensitivity and specificity of 0.85 and 0.71, respectively, when comparing the Hospital Anxiety and Depression Scale to a gold-standard psychiatric interview in oncology patients. Another study, a meta-analysis, found a sensitivity of 0.81 (0.79–0.82) at a specificity of 0.72 when comparing the NCCN distress thermometer to the Hospital Anxiety and Depression Scale.15 Neither study distinguished between existing and new distress. In our study, at a specificity level of 0.70, the sensitivity values were 0.92 ± 0.01 and 0.71 ± 0.06, respectively, for existing and new distress. These results are promising and indicate that our modeling framework may be able to replicate the operating characteristics of existing prescreening surveys for both existing and new distress.
In ophthalmology clinics, identifying both patients who have already been treated for distress and those with new distress is an important task. For patients with existing distress who have already been treated, the algorithm can be viewed as a computable phenotype that collects existing EHR data and returns a summary statistic that represents level of distress. If the PheKB phenotype is a valid measure of distress, this viewpoint should hold. This is important in the context of ophthalmic disorders, as there is evidence that interventions can be tailored to specific eye disorders to improve well-being. For example, a recent study in visually impaired glaucoma patients demonstrated that a social work intervention decreased distress.7 This intervention provided support for these patients that was tailored to eye-related distress, including procuring closed-circuit televisions. Thus, the algorithm removes any barriers to identifying distressed patients during routine care and sets up a system where distressed patients can be referred to an intervention that is tailored to patients with eye disorders. 
This is particularly impactful in an ophthalmology context, where resources and priorities do not permit screening for distress, even when it is already documented in the EHR. Our algorithm performs well in this setting, with a sensitivity of 0.92 at a specificity of 0.70. For patients not currently being treated for distress, the algorithm can be viewed as a prediction model that identifies incident distress. At a specificity level of 0.70, our model had a sensitivity of 0.71 for new cases, indicating that performance remains adequate in patients without psychiatric indicators in their EHR history. This is a particularly impactful finding because our modeling framework uses EHR data for both the predictors and the outcome; by design, it follows that we can identify existing distress at a high rate. The fact that we can also identify incident distress without prior psychiatric EHR history indicates that the model can be used more generally to predict distress. This is further illuminated by the non-zero variables in the elastic-net model.
Of the top 25 predictors in Table 4, we see that the majority are related to existing distress, including the first variable, an encounter to a psychiatry clinic within the past 3 months. This reinforces that our model performs well for existing cases of distress. Also present, however, are variables that are associated with distress but are not direct indicators of healthcare utilization related to distress, including suffocation and strangulation (OR = 1.88); self-inflicted injuries, including attempt of suicide (1.58); chemotherapy (1.36); esophageal disorders, including esophagitis (1.31); other nervous system disorders, including central pain syndrome (1.25); and headaches and migraines (1.24). The presence of these variables is more evidence that our model is not simply identifying existing characteristic healthcare utilization for distress. 
Although our model performs adequately for incident distress, we could likely improve performance by including diagnostic tests, such as clinical measures of disease severity. We did not do this initially because we restricted ourselves to data input by clinicians and billing codes from Duke University's EHR system, Epic. In the future, we will expand our algorithm to include these diagnostic test data, including imaging data. Such measures are not readily available and must be extracted from individual instruments, whereas the current model can be applied in a more general EHR system without additional data collection and curation. Furthermore, our patients may have been receiving psychiatric care outside the Duke University Health System that we did not have access to when defining our distress outcome. We tried to minimize this by limiting our patient population to those attending the Duke Eye Center main site, which is in Durham, NC, where the Duke University main hospital and the majority of outpatient clinics are located. To fully overcome this limitation, a formal external validation of our AI algorithm should be performed using a gold-standard assessment of distress. Finally, prior to this algorithm being deployed outside of Duke, it should be trained with data from multiple health centers to improve generalizability.
In our study, we examined the performance of three machine learning models, including two nonlinear ones (CatBoost and Random Forest), with the linear elastic-net model chosen for its interpretability. In the future, it would be beneficial to also include a state-of-the-art deep neural network model, although there is a tradeoff between interpretability and performance. This will become more important for natural language processing approaches that can be used to incorporate clinical progress notes as predictors in the EHR history. Furthermore, there are improvements that can be considered for the model structure itself; for example, the models would likely be improved if they accounted for dependencies introduced by encounters belonging to the same patient.
Finally, because our model has demonstrated that it can identify patients with distress, an important question becomes what to do with these patients. There is substantial evidence that screening alone is not enough and that an efficient referral system and evidence-based treatment are necessary.44 Thus, developing a system for identifying and treating patients with distress in ophthalmology clinics will require buy-in from patients, providers, payers, and healthcare systems. Future studies will have to focus on the development of referral systems that are acceptable to patients and providers, as well as interventions, including vision and behavioral interventions, that could improve patients' quality of life and are focused on improving distress related to specific vision-related disorders. A special focus will have to be on guaranteeing that patients have access to appropriate care, regardless of demographics and distress severity.
In conclusion, our study demonstrated that prescreening for distress in ophthalmology patients can be automated using an AI algorithm trained on existing EHR data. The algorithm identified distress in patients already being treated, and in those with incident distress. These findings suggest that screening for distress in ophthalmology clinics is feasible and may reduce negative health outcomes in patients. 
Acknowledgments
Supported by grants from the National Eye Institute, National Institutes of Health (K99EY033027 to SIB, R01EY029885 to FAM, and R21EY031898 to FAM). The sponsor or funding organization had no role in the design or conduct of this research. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. 
Disclosure: S.I. Berchuck, None; A.A. Jammal, None; D. Page, None; T.J. Somers, None; F.A. Medeiros, Aerie Pharmaceuticals (C), Allergan (C, F), Annexon (C), Biogen (C), Carl Zeiss Meditec (C, F), Galimedix (C), Google (F), Heidelberg Engineering (F), IDx (C), NGoggle Diagnostics (P), Novartis (F), Stealth Biotherapeutics (C), Reichert (C, F) 
References
1. McCusker S, Koola MM. Association of ophthalmologic disorders and depression in the elderly. Prim Care Companion CNS Disord. 2015; 17(4), https://doi.org/10.4088/PCC.14r01731.
2. Congdon N, O'Colmain B, Klaver CCW, et al. Causes and prevalence of visual impairment among adults in the United States. Arch Ophthalmol. 2004; 122(4): 477–485, https://doi.org/10.1001/archopht.122.4.477.
3. Zheng Y, Wu X, Lin X, Lin H. The prevalence of depression and depressive symptoms among eye disease patients: a systematic review and meta-analysis. Sci Rep. 2017; 7(1): 46453, https://doi.org/10.1038/srep46453.
4. Diniz-Filho A, Abe RY, Cho HJ, Baig S, Gracitelli CPB, Medeiros FA. Fast visual field progression is associated with depressive symptoms in patients with glaucoma. Ophthalmology. 2016; 123(4): 754–759.
5. Zhang X, Olson DJ, Le P, Lin F-C, Fleischman D, Davis RM. The association between glaucoma, anxiety, and depression in a large population. Am J Ophthalmol. 2017; 183: 37–41.
6. Clarke A, Rumsey N, Collin JRO, Wyn-Williams M. Psychosocial distress associated with disfiguring eye conditions. Eye (Lond). 2003; 17(1): 35–40, https://doi.org/10.1038/sj.eye.6700234.
7. Hark LA, Madhava M, Radakrishnan A, et al. Impact of a social worker in a glaucoma eye care service: a prospective study. Health Soc Work. 2019; 44(1): 48–56, https://doi.org/10.1093/hsw/hly038.
8. Kong XM, Zhu WQ, Hong JX, Sun XH. Is glaucoma comprehension associated with psychological disturbance and vision-related quality of life for patients with glaucoma? A cross-sectional study. BMJ Open. 2014; 4(5): e004632, https://doi.org/10.1136/bmjopen-2013-004632.
9. Chen X, Lu L. Depression in diabetic retinopathy: a review and recommendation for psychiatric management. Psychosomatics. 2016; 57(5): 465–471, https://doi.org/10.1016/j.psym.2016.04.003.
10. Berchuck SI, Jammal A, Mukherjee S, Somers T, Medeiros FA. Impact of anxiety and depression on progression to glaucoma among glaucoma suspects. Br J Ophthalmol. 2021; 105(9): 1244–1249, https://doi.org/10.1136/bjophthalmol-2020-316617.
11. Grenard JL, Munjas BA, Adams JL, et al. Depression and medication adherence in the treatment of chronic diseases in the United States: a meta-analysis. J Gen Intern Med. 2011; 26(10): 1175–1182, https://doi.org/10.1007/s11606-011-1704-y.
12. Cunningham SC, Aizvera J, Wakim P, Felber L. Use of a self-reported psychosocial distress screening tool as a predictor of need for psychosocial intervention in a general medical setting. Soc Work Health Care. 2018; 57(5): 315–331, https://doi.org/10.1080/00981389.2018.1437499.
13. Mitchell AJ, Kaar S, Coggan C, Herdman J. Acceptability of common screening methods used to detect distress and related mood disorders—preferences of cancer specialists and non-specialists. Psycho-Oncology. 2008; 17(3): 226–236, https://doi.org/10.1002/pon.1228.
14. Cull A, Gould A, House A, et al. Validating automated screening for psychological distress by means of computer touchscreens for use in routine oncology practice. Br J Cancer. 2001; 85(12): 1842–1849, https://doi.org/10.1054/bjoc.2001.2182.
15. Ma X, Zhang J, Zhong W, et al. The diagnostic role of a short screening tool—the distress thermometer: a meta-analysis. Support Care Cancer. 2014; 22(7): 1741–1755, https://doi.org/10.1007/s00520-014-2143-1.
16. Donovan KA, Jacobsen PB. Progress in the implementation of NCCN guidelines for distress management by member institutions. J Natl Compr Canc Netw. 2013; 11(2): 223–226, https://doi.org/10.6004/jnccn.2013.0029.
17. Lichtman JH, Bigger JT, Blumenthal JA, et al. Depression and coronary heart disease. Circulation. 2008; 118(17): 1768–1775, https://doi.org/10.1161/CIRCULATIONAHA.108.190769.
18. Fann JR, Berry DL, Wolpin S, et al. Depression screening using the Patient Health Questionnaire-9 administered on a touch screen computer. Psycho-Oncology. 2009; 18(1): 14–22, https://doi.org/10.1002/pon.1368.
19. Kelly D, Curran K, Caulfield B. Automatic prediction of health status using smartphone-derived behavior profiles. IEEE J Biomed Health Inform. 2017; 21(6): 1750–1760, https://doi.org/10.1109/JBHI.2017.2649602.
20. Rana R, Latif S, Gururajan R, et al. Automated screening for distress: a perspective for the future. Eur J Cancer Care. 2019; 28(4): e13033, https://doi.org/10.1111/ecc.13033.
21. Latif S, Rana R, Khalifa S, Jurdak R, Epps J, Schuller BW. Multi-task semi-supervised adversarial autoencoding for speech emotion recognition. arXiv. 2020, https://doi.org/10.48550/arXiv.1907.06078.
22. Hu Z, Melton GB, Arsoniadis EG, Wang Y, Kwaan MR, Simon GJ. Strategies for handling missing clinical data for automated surgical site infection detection from the electronic health record. J Biomed Inform. 2017; 68: 112–120.
23. Simon GE, Johnson E, Lawrence JM, et al. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. Am J Psychiatry. 2018; 175(10): 951–960.
24. PheKB. Depression. Available at: https://phekb.org/phenotype/depression. Accessed September 22, 2022.
25. Huang SH, LePendu P, Iyer SV, Tai-Seale M, Carrell D, Shah NH. Toward personalizing treatment for depression: predicting diagnosis and severity. J Am Med Inform Assoc. 2014; 21(6): 1069–1075, https://doi.org/10.1136/amiajnl-2014-002733.
26. Ingram WM, Baker AM, Bauer CR, et al. Defining major depressive disorder cohorts using the EHR: multiple phenotypes based on ICD-9 codes and medication orders. Neurol Psychiatry Brain Res. 2020; 36: 18–26, https://doi.org/10.1016/j.npbr.2020.02.002.
27. Monroe SM, Harkness KL. Major depression and its recurrences: life course matters. Annu Rev Clin Psychol. 2022; 18(1): 329–357, https://doi.org/10.1146/annurev-clinpsy-072220-021440.
28. Scholten WD, Batelaan NM, van Balkom AJ, Wjh Penninx B, Smit JH, van Oppen P. Recurrence of anxiety disorders and its predictors. J Affect Disord. 2013; 147(1–3): 180–185, https://doi.org/10.1016/j.jad.2012.10.031.
29. Elixhauser A, Steiner C, Palmer L. Clinical Classifications Software (CCS). Available at: https://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp. Accessed September 22, 2022.
30. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; 33(1): 1–22.
31. Breiman L. Random forests. Mach Learn. 2001; 45: 5–32, https://doi.org/10.1023/A:1010933404324.
32. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. arXiv. 2019, https://doi.org/10.48550/arXiv.1706.09516.
33. Tibshirani R, Bien J, Friedman J, et al. Strong rules for discarding predictors in LASSO-type problems. J R Stat Soc Series B Stat Methodol. 2012; 74(2): 245–266, https://doi.org/10.1111/j.1467-9868.2011.01004.x.
34. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. New York: Routledge; 1984.
35. Newton KM, Peissig PL, Kho AN, et al. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc. 2013; 20(e1): e147–e154, https://doi.org/10.1136/amiajnl-2012-000896.
36. Jammal AA, Thompson AC, Mariottoni EB, et al. Rates of glaucomatous structural and functional change from big data: the Duke Glaucoma Registry study. Am J Ophthalmol. 2020; 222: 238–247, https://doi.org/10.1016/j.ajo.2020.05.019.
37. Simonett JM, Sohrab MA, Pacheco J, et al. A validated phenotyping algorithm for genetic association studies in age-related macular degeneration. Sci Rep. 2015; 5(1): 12875, https://doi.org/10.1038/srep12875.
38. Bujak R, Daghir-Wojtkowiak E, Kaliszan R, Markuszewski MJ. PLS-based and regularization-based methods for the selection of relevant variables in non-targeted metabolomics data. Front Mol Biosci. 2016; 3: 35, https://doi.org/10.3389/fmolb.2016.00035.
39. Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017; 77(1): 1–17, https://doi.org/10.18637/jss.v077.i01.
40. Dorogush AV, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support. arXiv. 2018, https://doi.org/10.48550/arXiv.1810.11363.
41. Newman NJ. Neuro-ophthalmology and psychiatry. Gen Hosp Psychiatry. 1993; 15(2): 102–114, https://doi.org/10.1016/0163-8343(93)90106-x.
42. Mabuchi F, Yoshimura K, Kashiwagi K, et al. High prevalence of anxiety and depression in patients with primary open-angle glaucoma. J Glaucoma. 2008; 17(7): 552–557.
43. Williams RA. The psychosocial impact of macular degeneration. Arch Ophthalmol. 1998; 116(4): 514–520, https://doi.org/10.1001/archopht.116.4.514.
44. Carlson LE. Screening alone is not enough: the importance of appropriate triage, referral, and evidence-based treatment of distress and common problems. J Clin Oncol. 2013; 31(29): 3616–3617, https://doi.org/10.1200/jco.2013.51.4315.