Outcomes are regularly compared across studies, which frequently leads to inaccurate conclusions because the study samples differ. We introduce distribution-adjusting methods that allow more appropriate cross-study comparisons. To illustrate these methods, we use time-in-range 70-180 mg/dL (TIR) from the pivotal clinical trials of two closed-loop systems: CLS1 for MiniMed 670G (Medtronic) and CLS2 for Control-IQ (Tandem). TIR was the key outcome measure in both trials, but baseline TIRs differed (Table 1). Furthermore, CLS1 did not have a control group, which rules out the otherwise best option, a comparison of the experimental vs. control deltas. A second-best option is to compare the increments from baseline to active-treatment TIR, but this is subject to two opposing statistical artifacts: (1) with a higher baseline, active-treatment TIR would be higher, and (2) with a higher baseline, the TIR increment would be lower due to a ceiling effect. Three methods were used to mitigate these artifacts by matching the baseline TIR distributions: sample truncation, weighted resampling, and mapping of baseline cumulative distribution functions. All three resulted in similar adjusted TIRs (Table 1). Comparing results across studies is possible if limited to the relative improvements from baseline to active treatment. However, comparing absolute results will not be accurate without proper baseline adjustments.
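The three baseline-matching approaches named above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the trials: the function names (`truncate_to_common_range`, `resampling_weights`, `cdf_map`), the 10-point TIR bins, and all data are hypothetical, and the baseline and treatment values are synthetic draws rather than CLS1 or CLS2 results.

```python
import numpy as np

def truncate_to_common_range(base_a, base_b):
    """Sample truncation (assumed form): keep subjects in the shared baseline range."""
    lo = max(base_a.min(), base_b.min())
    hi = min(base_a.max(), base_b.max())
    return (base_a >= lo) & (base_a <= hi), (base_b >= lo) & (base_b <= hi)

def resampling_weights(base_src, base_ref, bin_edges):
    """Weighted resampling: per-subject weights that make the binned source
    baseline distribution follow the reference baseline distribution."""
    h_src, _ = np.histogram(base_src, bins=bin_edges)
    h_ref, _ = np.histogram(base_ref, bins=bin_edges)
    p_src = h_src / h_src.sum()
    p_ref = h_ref / h_ref.sum()
    # Bin index of each source subject; the top edge falls in the last bin.
    idx = np.clip(np.digitize(base_src, bin_edges) - 1, 0, len(bin_edges) - 2)
    w = p_ref[idx] / p_src[idx]
    return w / w.sum()

def cdf_map(base_src, base_ref):
    """CDF mapping: send each source baseline to the reference baseline value
    at the same empirical quantile."""
    ranks = np.argsort(np.argsort(base_src))
    q = (ranks + 0.5) / len(base_src)
    return np.quantile(base_ref, q)

# Synthetic data only (assumption): study A has the lower baseline TIR.
rng = np.random.default_rng(0)
base_a = np.clip(rng.normal(55, 12, 200), 0, 100)
base_b = np.clip(rng.normal(62, 12, 200), 0, 100)
treat_a = np.clip(base_a + rng.normal(12, 6, 200), 0, 100)

# 1) Truncation: masks selecting the overlapping baseline range.
keep_a, keep_b = truncate_to_common_range(base_a, base_b)

# 2) Weighted resampling: redraw study A so its baseline resembles study B,
#    then average the treatment TIR of the redrawn subjects.
w = resampling_weights(base_a, base_b, np.linspace(0, 100, 11))
resampled = rng.choice(len(base_a), size=len(base_a), replace=True, p=w)
adjusted_tir_a = treat_a[resampled].mean()

# 3) CDF mapping: study A baselines expressed on study B's baseline scale.
mapped_a = cdf_map(base_a, base_b)
```

In this sketch study B's baseline serves as the reference distribution, so `adjusted_tir_a` is the synthetic study A outcome re-expressed for a population with study B's baseline profile. Reference baseline bins that contain no study A subjects cannot be reproduced by reweighting, which is one reason the methods may be worth cross-checking against each other.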