December 2024
Volume 13, Issue 12
Open Access
Pediatric Ophthalmology & Strabismus  |   December 2024
Risk of Bias When Using Early Failure Criteria in Randomized Clinical Trials With Stereoacuity Outcomes
Author Affiliations & Notes
  • Meet Panjwani
    Department of Ophthalmology and Vision Science, University of Arizona - Tucson, Tucson, AZ, USA
  • Jonathan M. Holmes
    Department of Ophthalmology and Vision Science, University of Arizona - Tucson, Tucson, AZ, USA
  • Correspondence: Meet Panjwani, University of Arizona, Department of Ophthalmology and Vision Science, 655 N. Alvernon Way, Suite 204, Tucson, AZ 85711, USA. e-mail: [email protected] 
Translational Vision Science & Technology December 2024, Vol.13, 1. doi:https://doi.org/10.1167/tvst.13.12.1
      Meet Panjwani, Jonathan M. Holmes; Risk of Bias When Using Early Failure Criteria in Randomized Clinical Trials With Stereoacuity Outcomes. Trans. Vis. Sci. Tech. 2024;13(12):1. https://doi.org/10.1167/tvst.13.12.1.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose: The purpose of this study was to explore the effects of early failure criteria for participants in randomized clinical trials (RCTs) on overall trial conclusions.

Methods: We simulated 10,000 hypothetical RCTs with 2 treatments (1 with linear improvement and 1 with an increasing rate of improvement) and 6 follow-up visits. Each RCT had 400 participants, with the same baseline stereoacuity distribution. We incorporated random test-retest noise for every visit, and scores were rounded to the nearest observable score. Early failure was defined as worsening of two or more levels. We compared mean outcome stereoacuity between treatment groups, with and without the failure rule, using the two-sample t-test, and we calculated the proportion of erroneous RCTs (significantly different mean outcome values, where truth is known to be no different). Sensitivity analyses were performed to explore the influence of sample size, baseline distribution of stereoacuity, overall magnitude of mean improvement, magnitude of change for the failure rule, and distribution of noise.

Results: A greater proportion of 10,000 simulated RCTs had an erroneous mean difference in outcome with the early failure rule than without (5.49%, 95% confidence interval [CI] = 5.05% to 5.94% vs. 0%, 95% CI = 0% to 0.000001%, difference 5.49%, P < 0.0001). Sensitivity analysis revealed that increased sample size and wider distribution of noise had the greatest influence on increasing proportions of erroneous RCT conclusions.

Conclusions: Study designs incorporating participant-level early failure rules increase the risk of erroneous RCT conclusions and should be avoided.

Translational Relevance: We provide data informing the design of future clinical trials. Early failure rules at the participant level should be avoided.

Introduction
When designing randomized clinical trials (RCTs) for strabismus and amblyopia, near stereoacuity is often used as an outcome measure. At an individual participant level, it is appealing to allow early release from a clinical trial when that participant is not improving or is worsening. RCT designs that allow early participant release include those incorporating early failure criteria that simultaneously end participation and assign final outcome, allowing the participant to go on to possible alternative treatment.
Test-retest variability occurs with any biological variable, including stereoacuity. In a longitudinal study, test-retest variability alone may push a single observation over a predefined threshold (for example, an early failure criterion) from one visit to the next. If treatment responses occur at different rates between treatment groups, there would be a higher rate of erroneous “failures” in the treatment group that improved more slowly. If final outcome is assigned at the time of early failure, or at the final visit, whichever is earlier, then the group that improves more slowly at the start of the trial (even if it catches up later) would have more participants assigned erroneously poor outcome values, and therefore the treatment group mean outcome would appear erroneously worse. The purpose of the current simulation study was to explore the magnitude of this effect.
We hypothesized that early failure criteria would yield a higher risk of concluding erroneous differences between treatments, and therefore a higher proportion of RCTs with erroneous conclusions. 
Methods
To evaluate the effect of an early failure rule on its propensity to generate erroneous RCT conclusions, we performed a simulation study generating 10,000 hypothetical RCTs (as a reasonable compromise between having the largest sample size possible and computing power) with a defined follow-up schedule. We used open-source data (https://public.jaeb.org/pedig/stdy) and simulated data, which did not require approval from our institutional review board (IRB).
Underlying Hypothetical RCTs
We created hypothetical RCTs (treatment A vs. B), with 200 subjects in each group. For each hypothetical RCT, the primary outcome was stereoacuity, and the underlying hypothesis was that mean outcome stereoacuity would be different with treatment A versus with treatment B. In our simulations, we tested the hypothesis that there would be a difference between evaluating outcomes with an early failure rule versus without an early failure rule. 
Our simulations defined enrollment near stereoacuity values as 40, 60, 100, 200, and 400 arcseconds (arc sec), in equal proportions, to represent a typical cohort of participants in a strabismus study,1,2 converting stereoacuity values to log10 arc sec for analysis (i.e. 1.6, 1.8, 2.0, 2.3, and 2.6), over a 7-visit study.
We chose this specific range of stereoacuity because the near Randot Preschool Stereoacuity Test3 has the following defined levels: 40, 60, 100, 200, 400, and 800 arc sec. If a subject cannot identify the 800 arc sec level targets, the stereoacuity is commonly assigned “nil.” In several recent multi-center strabismus studies (e.g. a randomized trial to compare bilateral lateral rectus recession with unilateral resect-recess procedures for basic-type childhood intermittent exotropia),1,2 the inclusion criterion for enrollment was defined as near stereoacuity of 400 arc sec or better, to allow for detection of worsening stereoacuity, and therefore, in the current simulation studies, we set the worst enrollment near stereoacuity as 400 arc sec.
For our simulations, we defined the profile of mean improvement with treatment A as linear and we defined the profile of improvement with treatment B as initially slower than treatment A at the beginning of treatment, becoming faster than treatment A, such that the mean stereoacuity at the final visit was identical for treatments A and B. We therefore defined the mean starting and ending values with each treatment as identical. 
Because recent RCTs in strabismus and amblyopia often have sample sizes in the range of 100 to 300 per group,1,2 but the final sample size calculations depend on expected means and variances for specific clinical conditions, we arbitrarily chose a sample size of 200 per group, 400 total, for the primary analyses, with the plan of performing subsequent sensitivity analyses varying the sample size (see the Table).
Table. Sensitivity Analyses
Simulating True Stereoacuity Scores
Based on equal proportions in each of the 5 levels of starting stereoacuity (40, 60, 100, 200, and 400 arc sec), the baseline mean score for both treatment groups was 2.06 log10 arc sec. We then defined the true final mean stereoacuity at the seventh visit as 1.6 log10 arc sec in each group, to represent a condition where we expect the mean stereoacuity to improve over time with treatment, and to be the same mean improvement with each treatment if all subjects were followed to the final seventh visit. We planned sensitivity analysis using smaller magnitudes of mean improvements (see the Table). 
Then, we defined a profile of linear improvement for treatment A where the scores reduced at a constant rate (0.078 log10 arc sec per visit). We defined a different rate of improvement for treatment B, starting slower than treatment A but then increasing over time (0.017 log10 arc sec from baseline [visit 1] to the second visit, 0.043 log10 arc sec after the second visit, 0.058 log10 arc sec after the third visit, 0.079 log10 arc sec after the fourth visit, 0.122 log10 arc sec after the fifth visit, and 0.15 log10 arc sec after the sixth visit), such that the overall mean stereoacuity at the final visit (visit 7) was the same in each treatment group (Fig. 1).
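The two mean improvement profiles described above can be tabulated with a short sketch. This is an illustration in Python (the authors report working in R), using only the per-visit rates and baseline levels quoted in the text; the function name `mean_profile` is hypothetical. Summing the quoted rates gives a total improvement of 0.468 (treatment A) and 0.469 (treatment B) log10 arc sec, close to the stated 0.46 difference between 2.06 and 1.6, presumably because the reported rates are rounded.

```python
# Enrollment levels in log10 arc sec, as listed in the text (equal proportions).
BASELINE_LOG_LEVELS = [1.6, 1.8, 2.0, 2.3, 2.6]

# Mean improvement between consecutive visits (log10 arc sec), copied from the text.
STEP_A = [0.078] * 6
STEP_B = [0.017, 0.043, 0.058, 0.079, 0.122, 0.15]


def mean_profile(start, steps):
    """Return the true mean score at visits 1..7 given per-visit improvements."""
    profile = [start]
    for step in steps:
        profile.append(profile[-1] - step)
    return profile


baseline_mean = sum(BASELINE_LOG_LEVELS) / len(BASELINE_LOG_LEVELS)
profile_a = mean_profile(baseline_mean, STEP_A)
profile_b = mean_profile(baseline_mean, STEP_B)

for visit, (a, b) in enumerate(zip(profile_a, profile_b), start=1):
    print(f"visit {visit}: A = {a:.3f}, B = {b:.3f}")
```

At every intermediate visit (2 through 6) the treatment B mean is numerically higher (worse) than the treatment A mean, which is the window in which an early failure rule can act differently on the two groups.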
Figure 1. Each treatment starts with the identical mean stereoacuity score of 2.06 log10 arc sec, and each treatment results in a final mean score of 1.6 log10 arc sec. Treatment A has a constant rate of improvement profile, whereas for treatment B, the improvement profile resembles a Theta decay curve, initially slower and then catching up.
Simulating Observed Stereoacuity Scores: Adding Noise
Because stereoacuity is subject to test-retest variability (as is any clinical measurement), to simulate observed values, we added randomly sampled noise to each true value of stereoacuity.
Our source of noise was open-source test-retest data from a previous RCT1,2 (data available at https://public.jaeb.org/pedig/stdy) comparing two surgical procedures, bilateral lateral rectus recessions versus unilateral resect-recess, for basic-type intermittent exotropia (354 test-retest pairs; Fig. 2). Using these available data, we created a database representing the distribution of noise and then randomly sampled from that distribution using the frequency weights of the underlying distribution, such that the chance of adding a specific value of noise was in the same proportion as the distribution (e.g. 80% chance of the added noise being zero; see Fig. 2). The profile of noise had a mean of 0.012 and a standard deviation of 0.16, and was truncated at 0.8 and −0.6. We simulated observed stereoacuity values at each visit based on starting stereoacuity and profile (A or B), adding randomly sampled noise to each value at each visit, and then rounding to the nearest measurable increment of stereoacuity (40, 60, 100, 200, 400, or 800 arc sec, or nil), corresponding to log10 arc sec values of 1.6, 1.8, 2.0, 2.3, 2.6, 2.9, and 3.2, where 3.2 was assigned to nil (see Fig. 3).
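A minimal Python sketch of this noise-and-rounding step follows. The noise table is a HYPOTHETICAL placeholder: only the 80% mass at zero and the truncation limits (−0.6 and 0.8) come from the text, and the remaining values and weights are invented for illustration, not taken from the test-retest dataset. The helper names (`sample_noise`, `round_to_level`, `observe`) are likewise assumptions.

```python
import random

# Measurable scores in log10 arc sec; 3.2 stands for "nil", as in the text.
LEVELS = [1.6, 1.8, 2.0, 2.3, 2.6, 2.9, 3.2]

# HYPOTHETICAL noise table (noise value -> relative frequency out of 100).
# Only the 80% weight at zero and the limits -0.6 / 0.8 are from the text.
NOISE_WEIGHTS = {-0.6: 1, -0.3: 4, -0.2: 5, 0.0: 80, 0.2: 5, 0.3: 4, 0.8: 1}


def sample_noise(rng):
    """Draw one noise value with probability proportional to its frequency."""
    values = list(NOISE_WEIGHTS)
    weights = list(NOISE_WEIGHTS.values())
    return rng.choices(values, weights=weights, k=1)[0]


def round_to_level(score):
    """Snap a score to the nearest measurable level (exact ties go to the better level)."""
    return min(LEVELS, key=lambda level: abs(level - score))


def observe(true_score, rng):
    """Simulate one observed score: true value plus noise, rounded to a level."""
    return round_to_level(true_score + sample_noise(rng))


rng = random.Random(1)
print([observe(1.95, rng) for _ in range(10)])
```

Because the scale is bounded at 1.6 and 3.2, any noisy value outside that range is assigned to the nearest end of the scale in this sketch.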
Figure 2. Noise in stereoacuity measurements extracted from a dataset of test-retest data from an RCT comparing bilateral lateral rectus recessions versus unilateral resect-recess for childhood intermittent exotropia (https://public.jaeb.org/pedig/stdy). Noise is truncated and not normally distributed and is clustered towards the center (mean = 0.012 and SD = 0.16).
Figure 3. Example profiles of 40 simulated subjects with baseline score of 60 arc sec (1.8 log10 arc sec) in treatment group A, after adding noise to each value at each visit. This was repeated for 40 subjects for treatment A with baseline scores of 40, 100, 200, and 400 arc sec. The process was repeated for treatment B, with the different improvement profile. Individual values were each rounded to the nearest measurable increment (40, 60, 100, 200, 400, 800, or nil). This process generated data for a single simulated RCT, and the entire process was repeated for 10,000 simulated RCTs.
Defining and Applying the Early Failure Rule
Based on previous test-retest studies using the near Randot Preschool Stereoacuity Test,4 a real change in stereoacuity threshold has been reported to be 2 levels; therefore, real worsening of stereoacuity in longitudinal clinical trials could be considered as a worsening of 2 levels, for example, from 40 arc sec to 100 arc sec. If a subject has a loss of stereoacuity associated with a specific treatment, many clinicians will want to stop that treatment and start an alternative treatment. This definition could then be used as an “early failure” rule in a clinical trial, such that subjects could stop treatment and receive alternative treatment. In such a case, the final outcome for that particular participant would be defined as the value at the time of failure. Our primary analysis used this definition of a worsening of two or more levels of stereoacuity as the definition of “early failure.”
In our analyses, when we used the early failure rule, we flagged each case that met early failure criteria and assigned stereoacuity outcome as the value at the time of early failure. 
In our simulated datasets of observed stereoacuity values over 7 visits, we calculated the mean outcome stereoacuity for each treatment: (1) without the early failure rule (stereoacuity at the seventh visit in all cases), and (2) with the early failure rule (for some participants, outcome stereoacuity was taken from an earlier visit, when they met the early failure criteria), and then we calculated the difference in mean stereoacuity between treatments A and B, represented as a mean and 95% confidence interval (CI).
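The two outcome definitions can be expressed as a short Python sketch. It assumes, as stated for the primary analysis, that worsening is counted in levels relative to the baseline score; the function names are hypothetical and the example trajectory is invented for illustration.

```python
# Measurable scores in log10 arc sec (3.2 = nil); a higher index is a worse score.
LEVELS = [1.6, 1.8, 2.0, 2.3, 2.6, 2.9, 3.2]


def outcome_with_failure_rule(observed, min_levels_worse=2):
    """Outcome under the early failure rule.

    observed: scores at visits 1..7 (visit 1 = baseline), each a value in LEVELS.
    Returns (outcome, failure_visit), where failure_visit is None if the
    subject never met the failure criterion.
    """
    baseline_idx = LEVELS.index(observed[0])
    for visit, score in enumerate(observed[1:], start=2):
        if LEVELS.index(score) - baseline_idx >= min_levels_worse:
            return score, visit  # value at the time of failure becomes the outcome
    return observed[-1], None


def outcome_without_failure_rule(observed):
    """Outcome is always the score at the final (seventh) visit."""
    return observed[-1]


# Invented example: a noisy reading at visit 3 is two levels worse than baseline.
trajectory = [1.8, 1.8, 2.3, 1.6, 1.6, 1.6, 1.6]
print(outcome_with_failure_rule(trajectory))
print(outcome_without_failure_rule(trajectory))
```

In this example a single noisy reading fixes the subject's outcome at 2.3 log10 arc sec under the rule, even though the same subject would have finished at 1.6 without it.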
Simulating 10,000 RCTs
We then repeated the above simulation of a single RCT for 10,000 RCTs (adding randomly sampled noise to each true stereoacuity score, to create different combinations of noise plus scores, as described above) and evaluated each RCT, with and without this early failure rule (worsening of two or more levels from baseline, assigning the value at failure as final outcome if early failure criteria were met).
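One way to picture a single simulated RCT is the Python sketch below. It rests on several assumptions that the text does not settle: each subject's true score is taken as the baseline level minus the cumulative group improvement, the baseline used for the failure rule is the noise-free enrollment level, and the noise table is a HYPOTHETICAL placeholder (only the 80% weight at zero and the −0.6 / 0.8 limits come from the text). It is not the authors' R code.

```python
import random
import statistics

LEVELS = [1.6, 1.8, 2.0, 2.3, 2.6, 2.9, 3.2]   # log10 arc sec, 3.2 = nil
BASELINES = [1.6, 1.8, 2.0, 2.3, 2.6]           # equal proportions at enrollment
STEPS = {"A": [0.078] * 6, "B": [0.017, 0.043, 0.058, 0.079, 0.122, 0.15]}

# HYPOTHETICAL noise table; see the assumptions in the lead-in.
NOISE_VALUES = [-0.6, -0.3, -0.2, 0.0, 0.2, 0.3, 0.8]
NOISE_WEIGHTS = [1, 4, 5, 80, 5, 4, 1]


def nearest_level(score):
    return min(LEVELS, key=lambda level: abs(level - score))


def simulate_subject(baseline, steps, rng, min_levels_worse=2):
    """Return (outcome without rule, outcome with rule) for one subject."""
    base_idx = LEVELS.index(baseline)
    true_score = baseline
    observed = baseline
    failed_outcome = None
    for step in steps:  # six follow-up visits
        true_score -= step
        noise = rng.choices(NOISE_VALUES, weights=NOISE_WEIGHTS, k=1)[0]
        observed = nearest_level(true_score + noise)
        worse_by = LEVELS.index(observed) - base_idx
        if failed_outcome is None and worse_by >= min_levels_worse:
            failed_outcome = observed  # first visit meeting the failure criterion
    final = observed
    return final, (final if failed_outcome is None else failed_outcome)


def simulate_rct(rng, n_per_group=200):
    """Return {arm: (outcomes_without_rule, outcomes_with_rule)}."""
    results = {}
    for arm, steps in STEPS.items():
        without, with_rule = [], []
        for i in range(n_per_group):
            baseline = BASELINES[i % len(BASELINES)]
            final, ruled = simulate_subject(baseline, steps, rng)
            without.append(final)
            with_rule.append(ruled)
        results[arm] = (without, with_rule)
    return results


rng = random.Random(2024)
rct = simulate_rct(rng)
for arm, (without, with_rule) in rct.items():
    print(arm, round(statistics.mean(without), 3), round(statistics.mean(with_rule), 3))
```

Calling `simulate_rct` in a loop with the same generator would produce the repeated trials; the printed means depend on the placeholder noise table and should not be compared with the published results.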
Statistical Analysis
Primary Analysis: Frequency of RCTs With an Erroneous Outcome
Because we defined truth as no difference between outcomes of treatments A and B (see Fig. 1), our primary goal was to determine whether using an early failure rule increased the proportion of RCTs with an erroneous conclusion. For each individual RCT, we evaluated whether there was a difference between treatments A and B by t-test, that is, whether the 95% CI of the mean difference of outcome stereoacuity between treatments A and B did not include zero. We then calculated the frequency of RCTs with a significant mean difference between treatments A and B. We calculated the proportion of RCTs that would have concluded an erroneous result, declaring a difference in mean outcome stereoacuity when no true difference existed, defining this concept as “an erroneous RCT.” We evaluated the proportions of erroneous RCTs with and without the early failure rule (with 95% CIs from the binomial distribution). We then evaluated the difference in proportions between with and without the early failure rule, using the Fisher exact test, providing P value and 95% CI.
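The per-trial decision and the summary proportion can be sketched in Python as follows. Two simplifications are assumptions of this sketch rather than the authors' method: the 95% CI for the mean difference uses a normal approximation in place of the t distribution (reasonable for about 200 subjects per group), and the CI for the proportion is a Wald interval rather than an exact binomial interval. The count of 549 is derived from the reported 5.49% of 10,000 RCTs.

```python
import math
import statistics

Z_975 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def mean_difference_ci(group_a, group_b):
    """Large-sample 95% CI for mean(A) - mean(B), unequal variances allowed."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff, diff - Z_975 * se, diff + Z_975 * se


def is_erroneous(group_a, group_b):
    """True if the CI excludes zero, i.e. the RCT would declare a difference."""
    _, low, high = mean_difference_ci(group_a, group_b)
    return low > 0 or high < 0


def proportion_ci(successes, n):
    """Wald 95% CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    half = Z_975 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


p, low, high = proportion_ci(549, 10_000)
print(f"{p:.2%} ({low:.2%} to {high:.2%})")
```

With 549 erroneous RCTs out of 10,000, the Wald interval lands close to the reported 5.05% to 5.94%; the small discrepancy at the lower bound is expected if the authors used an exact method.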
Sensitivity Analysis
To determine if our primary result was sensitive to our underlying definitions and assumptions for the underlying RCTs and values of noise, we performed sensitivity analyses, comparing the frequency of RCTs that had a significant difference between treatments A and B, that is, the proportion of RCTs with an erroneous conclusion. We then compared the proportion of RCTs with erroneous conclusions, with and without an early failure rule, using Fisher exact tests. 
We evaluated the effect of the following parameters: sample size of each RCT (2000 and 4000); and distribution of the baseline stereoacuity scores: pseudo normal (proportions of cohort in each enrollment bin of 1.6, 1.8, 2, 2.3, and 2.6 log10 arc sec of 0.1, 0.2, 0.4, 0.2, and 0.1), right skewed (proportions 0.1, 0.4, 0.25, 0.15, and 0.1), left skewed (proportions 0.1, 0.15, 0.25, 0.4, and 0.1), and based on the distribution of enrollment stereoacuity in the previous RCT of unilateral resect-recess versus bilateral lateral rectus recession for intermittent exotropia1,2 (proportions 0.15, 0.25, 0.26, 0.14, and 0.2). We also evaluated the rate and magnitude of stereoacuity improvement (final true mean stereoacuity of 1.9 and 1.7), magnitude of change required to declare early failure (1 level and 3 levels), and distribution of noise (mean of 0.00 and standard deviation of 0.25, and mean of 0.00 and standard deviation of 0.35, both truncated at 0.8 and −0.8 and normally distributed).
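For the sensitivity analyses with normally distributed noise, one simple way to draw truncated values is rejection sampling, sketched below in Python. Whether the authors redrew out-of-range values or clipped them is not stated, so the redraw approach and the function name `truncated_normal_noise` are assumptions of this sketch.

```python
import random


def truncated_normal_noise(rng, sd, mean=0.0, low=-0.8, high=0.8):
    """Draw from a normal distribution, redrawing until the value lies in [low, high]."""
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


rng = random.Random(0)
draws_sd_025 = [truncated_normal_noise(rng, sd=0.25) for _ in range(1000)]
draws_sd_035 = [truncated_normal_noise(rng, sd=0.35) for _ in range(1000)]
print(min(draws_sd_025), max(draws_sd_025))
print(min(draws_sd_035), max(draws_sd_035))
```

Truncation trims the tails, so the standard deviation of the retained draws is somewhat smaller than the nominal 0.25 or 0.35, more noticeably for the wider setting.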
All analyses were conducted in RStudio, version 2023.09.1 Build 494 (RStudio Team; RStudio: Integrated Development Environment for R. RStudio, PBC. 2022. Accessed June 25, 2024, http://www.rstudio.com/). 
Results
Primary Simulation
A greater proportion of 10,000 simulated RCTs had an erroneous mean difference in outcome with the early failure rule than without (5.49%, 95% CI = 5.05% to 5.94% vs. 0%, 95% CI = 0% to 0.000001%, difference 5.49%, P < 0.0001; Fig. 4).
Figure 4. One example of a simulated RCT showing distributions of outcome stereoacuity between treatments A and B, with and without the failure rule, where the RCT conclusion would have differed. Mean difference between treatments A and B without the failure rule was 0.011 (95% CI = −0.023 to 0.045) and with the failure rule was 0.077 (95% CI = 0.023 to 0.13), which is an example of an RCT with an erroneous conclusion, because the known truth is no difference between treatments.
Sensitivity Analyses
When we increased the sample size, the proportion of RCTs with erroneous conclusions increased with the early failure rule, whereas without the early failure rule, the proportions of RCTs with erroneous conclusions remained the same as the primary analysis at 0% (see the Table). We simulated several baseline distributions of stereoacuity values (pseudo normal, right skewed, left skewed, and IXT1 baseline), and the greatest increase in proportions of RCTs with an erroneous conclusion was when the baseline distribution was left skewed (see the Table). When we decreased the magnitude of mean improvement of stereoacuity, the proportions of RCTs with erroneous conclusions decreased markedly when we applied the early failure rule, but no changes were observed when no early failure rule was applied. Defining early failure as a smaller change (1 level) increased the proportion of RCTs with erroneous conclusions when the early failure rule was applied, and no changes were seen in the proportion of RCTs with erroneous conclusions when no failure rule was applied, whereas defining early failure as a larger change (3 levels) decreased the proportion of RCTs with erroneous conclusions to 0, the same as when no early failure rule was applied. Using noise with a normal distribution (and a greater standard deviation) led to an increase in the proportion of RCTs with erroneous conclusions, both with the early failure rule and without the early failure rule. Increasing the standard deviation further increased the proportion of RCTs with erroneous conclusions, both with the early failure rule and without the early failure rule (see the Table).
Discussion
Applying an early failure rule for stereoacuity, allowing cessation of RCT participation and release to alternative treatment, resulted in a proportion (5.49%) of simulated RCTs that yielded erroneous conclusions, where a difference between treatment groups would be declared where no true difference existed. In contrast, when the failure rule was not applied, none of the 10,000 simulated RCTs yielded erroneous conclusions. We believe that our results are generalizable beyond stereoacuity, to any RCT that has a continuous outcome measure, prone to test-retest variability, where the rate of improvement may differ between treatment groups, for example, visual acuity. 
On first consideration, it may be surprising that none of the 10,000 simulations without the failure rule yielded erroneous conclusions. It might be expected that there would be a 5% chance of an erroneous conclusion, related to a type-1 error. But a type-1 error occurs in the context of a single RCT, where enrolled subjects are sampled from a larger population. Type-1 error is typically defined as 5%, accepting a 5% chance that 2 populations sampled from the same distribution have apparently different distributions. Our paradigm is different. We defined truth as the same final mean stereoacuity, and then we added noise randomly to each value for each of the 200 simulated subjects. In our sensitivity analyses (see the Table), only conditions of much larger magnitudes of noise, and different baseline distributions, yielded non-zero values of erroneous conclusions without the failure rule. These proportions were far less than 5%, because they are definitionally not type-1 errors; they are not based on the errors of sampling the underlying data from a larger population.
There are some analogies between our analysis of an early failure rule, defining outcome at the time of early failure or at the final visit (whichever is earlier), and the concept of last observation carried forward (LOCF).5 We carried forward stereoacuity from the visit where the early failure threshold was met and assigned it as outcome for that subject. Lachin and colleagues5 have described the reasons why the method of LOCF induces bias. There are parallels between those arguments, and our observations on erroneous conclusions, when early failure is declared, and the outcome variable is carried forward for that participant. 
Considering the results of our sensitivity analyses, we found that specific parameters dramatically increase the proportion of RCTs with erroneous conclusions: increasing noise, increasing sample size, and left skewed and pseudo normal baseline distributions (see the Table). The reason for the effect of increasing sample size is that when we increase the sample size, we narrow the width of the 95% CI on the mean difference between treatment groups, and that increases the chances of the CI not including zero, thereby declaring that the RCT is most consistent with a difference between treatments when no true difference existed (an erroneous RCT conclusion). Baseline distribution of stereoacuity scores also influences the risk of erroneous RCT conclusions when an early failure rule is used. Our primary analysis, with a uniform distribution of baseline scores, and the analysis using the baseline distribution from a previous RCT comparing two surgical procedures for intermittent exotropia1,2 (approximating a uniform distribution) had lower proportions of erroneous RCT conclusions than the analyses using a skewed or pseudo normal distribution. The reasons for this phenomenon are not immediately apparent, but likely result from the interaction of increased proportions of larger values of randomly sampled noise with the baseline values. Decreasing the threshold for declaring early failure (to one level) had the expected effect, making it more likely to assign early failure, so that the proportion of RCTs with erroneous conclusions increased. Increasing the threshold to three levels made it less probable to assign early failure, reducing the proportion of erroneous conclusions to zero. The impact of a higher standard deviation of noise on increasing the proportion of erroneous RCTs, regardless of the application of an early failure rule, is likely due to the increased likelihood of incorporating extreme noise values.
These larger noise values can push an individual subject's stereoacuity across the threshold for early failure, thereby increasing the chances of erroneous conclusions. One practical implication of this finding, that magnitude of noise drives the proportion of RCTs with erroneous conclusions, is that precautions should be taken to minimize noise. One example would be using repeat measures of stereoacuity and averaging results. Another example would be to select a method of measuring stereoacuity that has lower magnitudes of test-retest variability, chosen based on existing or preliminary data.
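The sample size argument above can be illustrated numerically with a small Python sketch. The bias of 0.03 log10 arc sec and the common standard deviation of 0.25 are HYPOTHETICAL values chosen only for illustration; they are not estimates from the simulations. The point is that a fixed bias between groups stays the same as the trial grows, while the 95% CI half-width for the mean difference shrinks roughly in proportion to one over the square root of the group size.

```python
import math


def ci_half_width(sd, n_per_group, z=1.96):
    """Approximate 95% CI half-width for a difference between two group means
    with a common standard deviation and equal group sizes."""
    return z * sd * math.sqrt(2 / n_per_group)


# HYPOTHETICAL illustration values (not from the paper).
BIAS, SD = 0.03, 0.25

for n_per_group in (200, 1000, 2000):  # totals of 400, 2000, and 4000
    half = ci_half_width(SD, n_per_group)
    verdict = "excludes zero" if BIAS - half > 0 else "includes zero"
    print(f"n per group = {n_per_group}: half-width = {half:.4f}, CI {verdict}")
```

Under these illustrative values the interval around the biased difference still includes zero at 200 per group but excludes it at the larger sample sizes, which is the mechanism described in the text.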
The weaknesses of our simulation study are common to all simulation studies; we did not use actual patient data. We also did not use actual profiles of mean stereoacuity improvement or worsening, but we used simple hypothetical profiles of stereoacuity improvement, and we applied sensitivity analyses to the initially chosen profiles. We did not explore the influence of the size of increments in the stereoacuity scale, for example, the TNO test, which has smaller increments, because we would need a large dataset of actual test-retest data for that specific test from which to sample noise. Nevertheless, these weaknesses are mitigated by our ability to evaluate the effect of proposed data analysis rules over a very large number (10,000) of hypothetical RCTs. Our underlying assumptions will not apply to all RCTs using stereoacuity as an outcome measure, but we conducted a series of sensitivity analyses to explore the generalizability of our primary finding, and, in nearly all our sensitivity analyses (see the Table), an early failure rule resulted in a greater proportion of erroneous conclusions than without. 
The incorporation of early failure criteria in RCTs with stereoacuity outcomes significantly increases the risk of erroneous RCT conclusions when the course of improvement differs between treatment groups. Increased sample size and wider distribution of noise had the greatest influence on increasing proportions of erroneous RCT conclusions. We believe that our results are generalizable to any RCT that uses any continuous outcome measure, when that outcome measure is prone to test-retest variability, where the rate of improvement may differ between treatment groups, for example, visual acuity. In general, when planning studies, early failure rules should be avoided.
Acknowledgments
Supported by the National Eye Institute, Grant EY 011751 (JMH). 
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. 
Meeting Presentation: This work was presented at The Association for Research in Vision and Ophthalmology Annual Meeting 2024 in Seattle, Washington on May 5 to 9, 2024. 
Taxonomy Topics: Stereoacuity, exotropia, esotropia, strabismus, pediatric ophthalmology, randomized clinical trial study design. 
Disclosure: M. Panjwani, None; J.M. Holmes, None 
References
1. Donahue SP, Chandler DL, Holmes JM, et al. A randomized trial comparing bilateral lateral rectus recession versus unilateral recess and resect for basic-type intermittent exotropia. Ophthalmology. 2019; 126(2): 305–317.
2. Donahue SP, Chandler DL, Wu R, et al. Eight-year outcomes of bilateral lateral rectus recessions versus unilateral recession-resection in childhood basic-type intermittent exotropia. Ophthalmology. 2024; 131(1): 98–106.
3. Birch E, Williams C, Drover J, et al. Randot preschool stereoacuity test: normative data and validity. J AAPOS. 2008; 12(1): 23–26.
4. Fawcett SL, Birch EE. Interobserver test-retest reliability of the randot preschool stereoacuity test. J AAPOS. 2000; 4(6): 354–358.
5. Lachin JM. Fallacies of last observation carried forward analyses. Clin Trials. 2016; 13(2): 161–168.