Input features were extracted from the EHR and included demographic information, past ocular surgeries, diagnoses, medications, social history (health-related behaviors), and ophthalmology-specific clinical exam findings. Categorical features were dummy-encoded as binary indicators (0 or 1), and continuous numerical variables were standardized to a mean of 0 and a variance of 1. All feature values were collected at baseline, during the preoperative period.
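The two transformations described above can be illustrated with a minimal sketch using only the Python standard library. The function names, the toy values, and the choice to keep one indicator column per category (rather than dropping a reference level) are assumptions made for illustration and are not taken from the study.

```python
from statistics import fmean, pstdev

def dummy_encode(values):
    """Map a categorical column to 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

def standardize(values):
    """Rescale a numeric column to mean 0 and variance 1 (population variance)."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical toy data, not from the study cohort.
tobacco_use = ["never", "current", "former", "never"]
age_years = [62.0, 71.0, 55.0, 80.0]

encoded = dummy_encode(tobacco_use)   # {"current": [...], "former": [...], "never": [...]}
age_z = standardize(age_years)        # mean 0, variance 1
```

In a train/test setting, the mean and standard deviation would typically be estimated on the training data and then reused for the test data; the text does not state how this was handled.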
Categorical features included surgery CPT code, race, ethnicity, gender, prior encounter diagnoses (International Classification of Diseases [ICD] codes), preoperative medications, number and type of prior glaucoma surgeries, concurrent cataract surgery (CPT code 66982 or 66984 on the same day as glaucoma surgery), tobacco/alcohol/drug use, contact lens/glasses use, prior selective laser trabeculoplasty, and prior laser peripheral iridotomy. Medications included all prescriptions in the five years before the operation, based on EHR medication records. Medications were mapped to their generic names and included both ophthalmic and systemic medications. Each medication feature was represented as a Boolean variable: 1 if the patient was prescribed the medication before surgery, and 0 otherwise. Variance-based feature elimination was performed to keep the 100 medication features with the highest variance, separately for ophthalmic medications and for systemic medications. ICD codes were aggregated to the level of two digits after the decimal point (e.g., H40.1212 became H40.12). Each ICD feature was also represented as a Boolean variable: 1 if the patient had a preoperative encounter associated with the diagnosis, and 0 otherwise. Variance-based feature elimination was likewise performed to keep the 100 diagnosis code variables with the highest variance.
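The ICD aggregation and variance-based selection steps can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the helper names `truncate_icd` and `top_k_by_variance`, the patient identifiers, and the toy encounter data are all hypothetical, and k is set to 2 here rather than the 100 used in the study.

```python
from statistics import pvariance

def truncate_icd(code, decimals=2):
    """Keep at most `decimals` digits after the decimal point of an ICD code."""
    head, sep, tail = code.partition(".")
    return head + sep + tail[:decimals] if sep else head

def top_k_by_variance(columns, k):
    """Return the names of the k Boolean columns with the highest variance."""
    ranked = sorted(columns, key=lambda name: pvariance(columns[name]), reverse=True)
    return ranked[:k]

# Hypothetical preoperative encounter diagnoses per patient.
encounters = {
    "pt1": ["H40.1212", "E11.9"],
    "pt2": ["H40.1221", "H25.13"],
    "pt3": ["H40.1131"],
    "pt4": ["H40.1212", "H25.13"],
    "pt5": ["H35.30"],
    "pt6": ["I10"],
}

patients = sorted(encounters)
truncated = {p: {truncate_icd(c) for c in encounters[p]} for p in patients}
codes = sorted(set().union(*truncated.values()))

# One Boolean column per aggregated code: 1 if the patient has that diagnosis.
features = {code: [1 if code in truncated[p] else 0 for p in patients] for code in codes}

kept = top_k_by_variance(features, k=2)
```

Because the variance of a Boolean column with prevalence p is p(1 - p), this ranking favors codes whose prevalence is closest to 50% and discards very rare (or nearly universal) codes.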
Continuous variables included age at the time of surgery, the latest preoperative value of eye exam findings for the surgical eye (IOP, best recorded visual acuity [VA], central corneal thickness, refraction spherical equivalent), and the number of prior ophthalmic surgeries. VA was converted to the logarithm of the minimum angle of resolution (logMAR). Other continuous variables were standardized to a mean of 0 and a standard deviation of 1. Missing values for eye examination findings were imputed using the column mean, and an indicator column was added to flag whether the measurement was missing and therefore imputed (<6% rate of missingness overall, with 0% missingness for IOP measurements). There were a total of 389 input features, including 100 features each for diagnoses, systemic medications, and ophthalmic medications.
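A minimal sketch of mean imputation with a missingness indicator, together with a logMAR conversion, is given below. Several points are assumptions for illustration only: the text does not state the format in which VA was recorded, so the Snellen-fraction input is hypothetical, as are the function names and the toy corneal thickness values.

```python
import math
from statistics import fmean

def snellen_to_logmar(numerator, denominator):
    """Convert a Snellen fraction (e.g., 20/40) to logMAR = log10(denominator / numerator)."""
    return math.log10(denominator / numerator)

def impute_with_indicator(values):
    """Fill missing entries (None) with the mean of the observed entries.

    Returns the filled column and a 0/1 column flagging which entries were imputed.
    """
    observed = [v for v in values if v is not None]
    mean = fmean(observed)
    filled = [mean if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return filled, was_missing

# Hypothetical central corneal thickness values in micrometers; None marks a missing exam.
cct_um = [540.0, None, 560.0, 520.0]
cct_filled, cct_missing = impute_with_indicator(cct_um)

va_logmar = snellen_to_logmar(20, 200)  # 20/200 corresponds to logMAR 1.0
```

The indicator column lets a model distinguish a measured value that happens to equal the mean from a value that was filled in.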
A held-out test set was reserved using 20% of the cohort data (N = 480 surgeries), while ensuring that no patient appeared in both the training and test sets. The remaining N = 1918 surgeries were used for training and five-fold cross-validation.
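One way to obtain a split in which no patient contributes surgeries to both sets is to partition patients rather than surgeries. The sketch below illustrates this with the standard library; the function names, random seed, and toy identifiers are hypothetical. Grouping the cross-validation folds by patient is also an assumption, since the text states the patient-level constraint only for the train/test split.

```python
import random

def grouped_split(patient_ids, test_fraction=0.2, seed=0):
    """Split surgery indices so that each patient falls in exactly one set."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx

def grouped_folds(patient_ids, indices, n_folds=5, seed=0):
    """Yield (fit_idx, val_idx) pairs with every patient assigned to one fold."""
    patients = sorted({patient_ids[i] for i in indices})
    random.Random(seed).shuffle(patients)
    fold_of = {p: k % n_folds for k, p in enumerate(patients)}
    for fold in range(n_folds):
        val_idx = [i for i in indices if fold_of[patient_ids[i]] == fold]
        fit_idx = [i for i in indices if fold_of[patient_ids[i]] != fold]
        yield fit_idx, val_idx

# Hypothetical cohort: 10 patients with two surgeries each.
surgery_patient = [f"pt{i // 2}" for i in range(20)]

train_idx, test_idx = grouped_split(surgery_patient)
folds = list(grouped_folds(surgery_patient, train_idx))
```

Because the fraction is applied to patients, the share of surgeries in the test set is only approximately 20% when patients differ in their number of surgeries.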