To determine whether our primary result was sensitive to the definitions and assumptions used for the underlying RCTs and to the values of noise, we performed sensitivity analyses comparing the frequency of RCTs that showed a significant difference between treatments A and B, that is, the proportion of RCTs with an erroneous conclusion. We then compared the proportion of RCTs with erroneous conclusions, with and without an early failure rule, using Fisher exact tests.
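As an illustration of the comparison described above, the sketch below computes a two-sided Fisher exact test p value for a 2 × 2 table whose rows are the analysis strategy (with or without the early failure rule) and whose columns are RCTs with or without an erroneous conclusion. It uses only the Python standard library and is not the analysis code used for this study (in R, `stats::fisher.test` fills this role). The function name and the counts in the usage example are hypothetical. The two-sided p value is the sum of the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed table, which is the convention used by R's `fisher.test` for 2 × 2 tables.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p value for the 2 x 2 table [[a, b], [c, d]].

    Hypothetical layout:
      row 1 = RCTs analyzed WITH the early failure rule
      row 2 = RCTs analyzed WITHOUT the early failure rule
      col 1 = RCTs with an erroneous conclusion
      col 2 = RCTs without an erroneous conclusion
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)  # smallest feasible top-left count
    hi = min(row1, col1)      # largest feasible top-left count
    # Sum over every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(px for px in (prob(x) for x in range(lo, hi + 1))
            if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, with hypothetical counts of 80 erroneous conclusions among 1000 simulated RCTs with the rule and 50 among 1000 without it, the test would be `fisher_exact_two_sided(80, 920, 50, 950)`.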
We evaluated the effect of the following parameters: sample size of each RCT (2000 and 4000); distribution of the baseline stereoacuity scores, specified as the proportion of the cohort in each enrollment bin of 1.6, 1.8, 2, 2.3, and 2.6 log10 arc sec: pseudo-normal (0.1, 0.2, 0.4, 0.2, and 0.1), right skewed (0.1, 0.4, 0.25, 0.15, and 0.1), left skewed (0.1, 0.15, 0.25, 0.4, and 0.1), and based on the distribution of enrollment stereoacuity in the previous RCT of unilateral resect-recess versus bilateral lateral rectus recession for intermittent exotropia¹,² (0.15, 0.25, 0.26, 0.14, and 0.2). We also evaluated the rate and magnitude of stereoacuity improvement (final true mean stereoacuity of 1.9 and 1.7 log10 arc sec), the magnitude of change required to declare early failure (1 level and 3 levels), and the distribution of noise (normally distributed with a mean of 0.00 and a standard deviation of either 0.25 or 0.35, truncated at −0.8 and 0.8).
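The baseline and noise assumptions above can be sketched as follows, assuming that each participant's baseline stereoacuity is drawn independently from the five enrollment bins with the listed proportions. The helper names and scenario labels are hypothetical. The text does not state how truncation at −0.8 and 0.8 was implemented; this sketch assumes that draws outside those bounds are discarded and redrawn (a truncated rather than clipped normal).

```python
import random

# Enrollment bins in log10 arc sec, as listed in the text.
BINS = [1.6, 1.8, 2.0, 2.3, 2.6]

# Proportion of the cohort in each bin for each baseline scenario (values from
# the text; the dictionary keys are hypothetical labels).
BASELINE_SCENARIOS = {
    "pseudo_normal": [0.10, 0.20, 0.40, 0.20, 0.10],
    "right_skewed": [0.10, 0.40, 0.25, 0.15, 0.10],
    "left_skewed": [0.10, 0.15, 0.25, 0.40, 0.10],
    "prior_rct": [0.15, 0.25, 0.26, 0.14, 0.20],
}


def sample_baseline(n: int, scenario: str, rng: random.Random) -> list[float]:
    """Draw n baseline stereoacuity values (log10 arc sec) from the binned distribution."""
    return rng.choices(BINS, weights=BASELINE_SCENARIOS[scenario], k=n)


def truncated_normal_noise(sd: float, rng: random.Random,
                           mean: float = 0.0, bound: float = 0.8) -> float:
    """One draw from N(mean, sd^2) restricted to [-bound, bound].

    ASSUMPTION: rejection sampling (out-of-range draws are redrawn); the
    source does not say whether values were redrawn or clipped.
    """
    while True:
        x = rng.gauss(mean, sd)
        if -bound <= x <= bound:
            return x


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed for reproducibility
    baseline = sample_baseline(2000, "pseudo_normal", rng)
    # Add one noise draw (SD 0.25 scenario) to each baseline value.
    observed = [b + truncated_normal_noise(0.25, rng) for b in baseline]
    print(len(observed), min(observed), max(observed))
```

Drawing each value independently means the realized bin proportions in any single simulated RCT vary around the target proportions; assigning exact bin counts instead would be an equally plausible reading of the text.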
All analyses were conducted in RStudio, version 2023.09.1 Build 494 (RStudio Team; RStudio: Integrated Development Environment for R. RStudio, PBC. 2022. Accessed June 25, 2024, http://www.rstudio.com/).