Most peptide claims trace back to one or two trial papers. The difference between someone who reads "BPC-157 helps tendons" and someone who reads the actual rat-tendon study and then asks "would this translate?" is mostly trial-paper literacy, not subject-matter expertise. This article is the practical version of "how to read a paper" - what to look at first, what to skip, and the red flags that signal a paper is doing post-hoc reframing rather than reporting what the trial actually showed.
For the broader evidence-tier framing, see Peptides 101; for the per-class bloodwork-monitoring frame the trials inform, see Bloodwork for Peptide Users.
Read in this order
- Title and abstract. 30 seconds. You're triaging whether to spend more time on the paper. Note the registered primary endpoint, the population, and the broad direction of the result.
- Methods. 5 minutes. The most important section. Population, intervention, control arm, primary outcome, sample size. If the methods don't say what was registered as the primary outcome versus what's being reported, the paper has a credibility problem before you read the results.
- Primary outcome result. Read this in the results section before the discussion section. The discussion will frame everything; the raw primary-outcome number is what the trial actually measured.
- Effect size and confidence interval. Bigger deal than the p-value. A statistically significant effect of 0.3% weight loss (with n=4000) is meaningful for a regulator and not meaningful for a person.
- Dropout / completion rates. Trials with high dropout overrepresent compliant responders. A "60% reduction in X" that came from 40% completion is mostly survivor bias.
- Conflict-of-interest section. Don't auto-disqualify sponsored trials, but read knowing the framing pressure.
- Discussion. Read last. This is where authors contextualise (or oversell) what they found.
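The effect-size-versus-p-value point above can be made concrete with a toy calculation. All numbers here are hypothetical (a 0.3-percentage-point difference in mean weight change, SD 4, 2,000 per arm), chosen only to show that a clinically trivial effect clears p < 0.05 once the sample is large enough:

```python
# Hypothetical numbers, not from any cited trial: a clinically trivial
# 0.3-pp difference in mean weight change becomes "significant" at scale.
import math

def z_test_two_means(diff, sd, n_per_arm):
    """Two-sided z-test p-value for a difference in means,
    assuming equal SDs in both arms and large samples."""
    se = sd * math.sqrt(2 / n_per_arm)
    z = diff / se
    # two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

z, p = z_test_two_means(diff=0.3, sd=4.0, n_per_arm=2000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05 despite the trivial effect
```

The same 0.3-pp difference with 50 per arm would be nowhere near significant; the p-value tracks sample size as much as biology.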
What the abstract hides
- The primary-vs-secondary endpoint switch. Modern trials register their primary outcome publicly (ClinicalTrials.gov) before the trial starts. If the abstract emphasises a secondary outcome that "showed an effect" while the primary missed, that's a red flag. Read the registered protocol; cross-check against what's being reported.
- Subgroup analysis as the headline. "Among women over 40 with baseline BMI >35, weight loss was 12%." Subgroups are exploratory; positive results in subgroups when the overall population didn't show the effect are not evidence of efficacy. They're hypothesis-generating.
- Per-protocol vs intention-to-treat. ITT (everyone randomised) is the conservative analysis. PP (only completers) inflates effect sizes. Modern papers report both; the difference between them tells you about dropout-driven distortion.
- "Statistically significant" without the magnitude. p < 0.05 means the effect is unlikely to be zero. It says nothing about whether the effect is large enough to matter.
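The ITT-versus-per-protocol distortion is easy to see in a toy simulation. Everything below is hypothetical: a single arm where non-responders are more likely to drop out, with dropouts analysed as zero change from baseline (a simple BOCF rule) for the ITT figure:

```python
# Toy simulation, no real trial data: non-responders drop out more often,
# so the per-protocol (completers-only) mean overstates the effect.
import random

random.seed(0)
n = 1000
# True individual weight-loss responses: mean 5%, wide spread
responses = [random.gauss(5.0, 6.0) for _ in range(n)]
# People who lost little or nothing drop out with 60% probability
completed = [r for r in responses if not (r < 2.0 and random.random() < 0.6)]

per_protocol = sum(completed) / len(completed)
# ITT with BOCF: dropouts analysed as 0% change from baseline
itt = sum(completed) / n

print(f"per-protocol mean loss: {per_protocol:.1f}%")
print(f"ITT (BOCF) mean loss:   {itt:.1f}%")
```

The gap between the two numbers is exactly the dropout-driven distortion the bullet above describes; a paper that reports only the first figure is hiding it.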
Effect sizes and confidence intervals (what to actually read)
Most peptide-trial papers report effect sizes as either an absolute difference (e.g., 14.9% weight loss vs 2.4% placebo for semaglutide in STEP-1) or a relative difference (a 40% reduction in cardiovascular events). Two questions to ask:
- Is the absolute number meaningful? A 12.4-percentage-point weight difference at 68 weeks is meaningful. A 0.3% reduction in HbA1c is measurable; whether it's meaningful is debatable.
- What's the confidence interval? 14.9% (95% CI 13.7–16.0) is a tight estimate. 14.9% (95% CI 4–25) means the trial couldn't precisely pin down the effect size - could be 4%, could be 25%. Wide intervals on small trials are normal; treating the midpoint as the truth misreads the data.
- Replication tightens the interval. STEP-1's 14.9% finding is supported by later STEP trials reporting similar magnitudes in related populations. If a single trial has wide CIs but replication studies cluster around the same number, the effect is real even if the original CIs looked permissive.
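Why size and replication tighten intervals falls out of one formula: the CI half-width for a difference in means scales as 1/sqrt(n). A quick sketch with illustrative numbers (SD 8, nothing taken from any trial):

```python
# Illustrative only: 95% CI half-width for a difference of two means
# shrinks as 1/sqrt(n per arm). SD of 8 is an arbitrary example value.
import math

def ci_half_width(sd, n_per_arm, z=1.96):
    """Half-width of a 95% CI for a difference of two means (equal SDs)."""
    return z * sd * math.sqrt(2 / n_per_arm)

for n in (25, 100, 2500):
    print(f"n per arm = {n:>5}: \u00b1{ci_half_width(sd=8.0, n_per_arm=n):.2f} pp")
```

A 100x larger trial gives a 10x tighter interval, which is why the multi-thousand-patient obesity trials can quote CIs to a tenth of a percentage point while a 30-person pilot cannot.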
Worked example 1: SURMOUNT-1 (tirzepatide for obesity)
- What the abstract said: Tirzepatide produced substantial weight reductions vs placebo in adults with obesity.
- What to read in the methods: Population - adults with BMI ≥30 (or ≥27 with a weight-related comorbidity), 72 weeks. Primary outcome - change in body weight from baseline. Three tirzepatide arms (5 mg, 10 mg, 15 mg) vs placebo.
- The primary result: mean weight change −15.0% (5 mg), −19.5% (10 mg), −20.9% (15 mg) vs −3.1% (placebo). All p < 0.001. Confidence intervals tight (2,500+ participants).
- What this tells you: the result is robust, the effect size is large enough to matter clinically, and there's a dose-response. This is high-quality evidence.
- What to still flag: 72-week duration. Long-term durability beyond 72 weeks isn't in this paper - discontinuation behaviour comes from SURMOUNT-4 (the tirzepatide withdrawal trial) and the SURMOUNT-1 extension, which are separate publications.
Worked example 2: STEP-1 extension (semaglutide discontinuation)
- The setup: STEP-1 randomised participants to semaglutide vs placebo for 68 weeks. The extension followed a subset after they stopped both drug and lifestyle intervention.
- The primary finding: by week 120 (one year off drug), participants had regained ~two-thirds of the weight they'd lost on semaglutide. Cardiometabolic improvements (lipids, blood pressure) reverted on similar timelines.
- What to read carefully: the extension was a smaller subset of the original trial - selection bias matters (people who agreed to the extension may differ from those who didn't). The comparison is regain-from-end-of-drug, not vs control, so this is observational once the drug stops.
- Useful claim: "GLP-1 weight loss is largely reversible without maintenance dosing." That's what the data actually shows.
- Misuse: "GLP-1s don't work" - wrong; the on-cycle weight loss is real and large. The discontinuation pattern is about maintenance, not efficacy.
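The regain arithmetic is worth doing explicitly, because "regained two-thirds of the loss" is often misread as "back to baseline". With illustrative round numbers (not the paper's exact values):

```python
# Illustrative arithmetic only, round numbers rather than the paper's
# exact figures: what "regained two-thirds of the loss" means at week 120.
loss_on_drug = 17.0      # % below baseline at end of treatment (example value)
regain_fraction = 2 / 3  # fraction of the loss regained after stopping
net = loss_on_drug * (1 - regain_fraction)
print(f"net weight change vs baseline: -{net:.1f}%")  # still below baseline
```

Participants ended below baseline, just much less so than on-drug - which is the "maintenance, not efficacy" distinction in the bullet above.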
Worked example 3: Stier 2013 (AOD-9604 null result)
- The trial: Phase 2b RCT of AOD-9604 vs placebo over 12 weeks for obesity. Multiple dose arms.
- The result: no statistically significant weight-loss difference vs placebo at any dose. Safety was clean.
- What this means and doesn't mean: the trial didn't show a clinically meaningful obesity-treatment effect at the studied doses and duration. It doesn't mean AOD-9604 has zero biological activity (animal data shows lipolytic signal) - it means the human-obesity efficacy hypothesis didn't replicate in a well-designed trial.
- How the catalogue handles this: AOD-9604 is in the catalogue with explicit "didn't beat placebo in Phase 2" framing. The peptide is still useful in the bodybuilding fasted-state context where the trial's conditions don't apply, but the "FDA-pathway abandoned, not approved" status is the honest framing.
Red flags worth pausing on
- The primary endpoint changed mid-trial. The pre-registered ClinicalTrials.gov entry says the primary outcome is X; the published paper reports Y as primary. Sometimes this is legitimate (rare events made X impractical), but it's often suspicious. Cross-check the registry.
- Short follow-up + chronic-condition framing. "12-week trial shows reduction in cardiovascular events" - events of that kind don't accumulate enough in 12 weeks to power a meaningful comparison. The trial is measuring something else and extrapolating.
- Composite endpoints with one driver. "Composite of MI, stroke, cardiovascular death" reduction of 20%. Read which component drove it; if 90% of the effect is on a soft secondary endpoint and the hard endpoints didn't move, that's not what the composite framing implies.
- Per-protocol-only analysis. If the paper only reports completers without an ITT comparison, dropouts are doing work. ITT is conservative and harder to game.
- "Trends toward significance" language is usually cope. If the result was significant, it would be stated as significant. "Trends toward" is "missed but we want to talk about it anyway."
- Industry-funded with all-positive results across multiple secondary endpoints. Real biology is variable. A paper where every secondary endpoint moved in the favourable direction suggests selective reporting unless the trial is huge.
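The subgroup and selective-reporting red flags above share one mechanism: multiplicity. Under the null (no true effect anywhere), the chance that at least one of k independent tests at alpha = 0.05 comes up "significant" grows fast with k:

```python
# Multiplicity arithmetic: probability of at least one false positive
# across k independent tests at alpha = 0.05, assuming no true effect.
for k in (1, 5, 10, 20):
    p_any = 1 - 0.95 ** k
    print(f"{k:>2} subgroups: P(>=1 false positive) = {p_any:.0%}")
```

Ten subgroup cuts give roughly a 40% chance of a spurious "significant" finding, which is why a subgroup-positive result in an overall-null trial is hypothesis-generating, not evidence.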
Where to find papers
- PubMed (pubmed.ncbi.nlm.nih.gov) indexes most biomedical literature. Free for everyone. The abstract is always free; the full text varies by paper.
- PubMed Central (PMC) hosts free full-text versions of papers from authors who deposited them. Filter PubMed results to "Free full text" to limit to PMC.
- ClinicalTrials.gov for trial registry entries (US-registered trials) - the pre-registered protocol for any modern trial worth taking seriously. EU trials register at EudraCT / CTIS.
- Sci-Hub (legally grey) for paywalled papers. Many academic readers use it routinely; whether it's appropriate is a personal call.
- The author's institutional page or ResearchGate. Researchers often post PDFs of their own papers. Searching the title plus author name plus "PDF" frequently produces a free copy.
- Email the corresponding author. Many will send a PDF on request. Surprisingly underused.
What stops people
- Reading only the abstract. The abstract is the paper's marketing copy. The methods section is where the trial actually lives. If you're going to read one section thoroughly, read methods.
- Conflating statistical significance with clinical significance. p < 0.05 means the effect probably isn't zero. It says nothing about whether the effect is large enough to care about. The effect size is what matters; the p-value is the threshold for "the effect probably exists."
- Treating subgroup-positive findings as primary evidence. Subgroup analysis is exploratory by design. Promoting "tirzepatide worked best in patients with HbA1c >9.0 at baseline" as a recommendation when the overall population response was different is misuse.
- Skipping the dropout discussion. A 40% dropout rate in a treatment arm is the trial's biggest finding and most papers bury it in supplementary tables. Find that number; it shapes everything else.
- Reading review articles instead of primary papers. Reviews are useful for orientation but they're someone else's interpretation of the primary literature. For the specific questions you care about, the primary paper is the source of truth. The review tells you which primary papers exist; the primary papers tell you what they actually showed.
Cross-references
- Peptides 101 - the evidence-tier framing this article operationalises.
- Semaglutide vs Tirzepatide vs Retatrutide - pulls the SURMOUNT and STEP trial-effect-size numbers into the comparison framing.
- Why Retatrutide Vanished - case study of a different kind of trial-paper reading: when a phase-2 paper produces a number that the company later moves away from for non-trial reasons.
- Sourcing and Verification - adjacent literacy: reading vendor / lab claims with the same scepticism applied to trial papers.
Sources
- Jastreboff et al. SURMOUNT-1 (NEJM 2022) - https://www.nejm.org/doi/full/10.1056/NEJMoa2206038
- Wilding et al. STEP-1 (NEJM 2021) - https://www.nejm.org/doi/full/10.1056/NEJMoa2032183
- Stier et al. AOD-9604 phase 2b - https://pubmed.ncbi.nlm.nih.gov/23741561/
- Oxford Centre for Evidence-Based Medicine, levels-of-evidence framework - https://www.cebm.ox.ac.uk/resources/levels-of-evidence