Clinical notes were preprocessed to remove stopwords (a, all, also, an, and, are, as, at, be, been, by, for, from, had, has, have, in, is, it, may, of, on, or, our, than, that, the, there, these, this, to, was, we, were, which, who, with), and all text was lowercased. Because RoBERTa and BioBERT were pre-trained on a cased vocabulary, in a sensitivity analysis we also fine-tuned these models on cased ophthalmology clinical text, with similar results. BERT-based models take individual words (“tokens”) as input and break rare words into multiple subwords to reduce the overall vocabulary size; we therefore applied subword tokenization to prepare the text for input to our BERT-based models. After subword tokenization, clinical notes were truncated or padded to 512 tokens, the maximum sequence length BERT accepts; this fixed-length limit includes the starting token, [CLS], and the ending token, [SEP].
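
The sketch below illustrates this preprocessing and tokenization pipeline, assuming the Hugging Face transformers tokenizer API; the checkpoint name (bert-base-uncased), the example note, and the helper function are illustrative assumptions, not the exact code used in the study.

```python
# Minimal sketch of the described preprocessing: stopword removal, lowercasing,
# and subword tokenization with truncation/padding to 512 tokens.
from transformers import AutoTokenizer

STOPWORDS = {
    "a", "all", "also", "an", "and", "are", "as", "at", "be", "been", "by",
    "for", "from", "had", "has", "have", "in", "is", "it", "may", "of", "on",
    "or", "our", "than", "that", "the", "there", "these", "this", "to", "was",
    "we", "were", "which", "who", "with",
}

def preprocess(note: str) -> str:
    """Lowercase the note and drop stopwords (hypothetical helper)."""
    return " ".join(w for w in note.lower().split() if w not in STOPWORDS)

# Uncased tokenizer to match the lowercased text; a cased checkpoint would be
# substituted for the RoBERTa/BioBERT cased-text sensitivity analysis.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

note = "The patient was seen in clinic for evaluation of glaucoma."  # example text
encoded = tokenizer(
    preprocess(note),
    truncation=True,        # truncate to the 512-token maximum
    padding="max_length",   # pad shorter notes up to 512 tokens
    max_length=512,         # includes the [CLS] and [SEP] tokens
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 512])
```

In this setup, the tokenizer itself inserts the [CLS] and [SEP] tokens and applies the subword (WordPiece) vocabulary, so the only note-level preprocessing needed beforehand is the stopword removal and lowercasing.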