How We Verify Our Work

Evidence first. Clear sourcing. No hype.

Health information is everywhere. Most of it is oversimplified, commercially motivated, or built on a selective reading of the evidence. This Body exists to reduce that confusion, not add to it. We do not publish guidance unless it has been checked against the research, clearly sourced, and explained in context.

Verification is not a vibe or a brand claim. It is a repeatable process, one that can be audited and challenged. This page explains how we frame questions, select evidence, assess quality, sit with uncertainty, and turn research into practical guidance for women in midlife without overstating what the science actually shows.

Why verification matters

Midlife women are underrepresented in research. A lot of “health truths” come from studies on young men or mixed populations where sex and life stage are never properly analysed. We make those gaps visible rather than pretending the evidence is stronger than it actually is for midlife women.
A single study can mislead as easily as it informs. One trial can be a fluke, distorted by its methods, or unrealistically optimistic because of a small sample and a short timeline. We look for patterns across the broader body of evidence, not a headline result from one paper.
Commercial incentives warp the message. Much of what circulates in the wellness space is optimised for clicks, affiliate income, or supplement sales. Our process is designed to keep conclusions anchored to evidence quality and relevance, not to what performs well online.
Context determines whether a result applies to you. A finding from sedentary adults may not transfer to trained women. Results shift with dose, baseline symptoms, training history, or what medications someone is taking. We state the “who and when” so you can judge applicability yourself.

Our research process

Step 1. Define the question clearly

Population: We are specific about who the evidence is meant to apply to: perimenopause, menopause, postmenopause, or a mixed group. We also note key modifiers like training status, BMI, endocrine therapy, or baseline symptom severity when these change how results should be interpreted.
Outcome: We pin down what “works” actually means in measurable terms, such as hot flush frequency, lean mass, sleep latency, or HbA1c. If outcomes are vague or not clinically meaningful, we treat those conclusions as weaker and say why.
Context: Intervention details matter. Dose, duration, delivery method, adherence, and what something was paired with can all change the result. If a protocol only works under lab conditions that nobody could realistically replicate, we flag that clearly.

Step 2. Gather the best available evidence

Clinical guidelines: Where good guidelines exist, we start there. They synthesise large bodies of evidence and weigh benefits against harms in a way no single study can. We still check recency, methodological transparency, and whether the guidance actually applies to midlife women.
Systematic reviews and meta-analyses: We favour reviews with clear inclusion criteria, proper bias assessment, and appropriate methods. Heterogeneity matters. So does whether pooled effects are clinically meaningful. Statistical significance alone is not enough.
Randomised controlled trials (RCTs): RCTs reduce confounding, but execution quality still varies enormously. We look at randomisation, blinding, what the comparator was, adherence rates, and whether outcomes were preregistered. Selective reporting is common and consequential.
Observational studies: Useful for identifying real world trends and associations. But they cannot prove causation, and we do not treat them as though they can. We use them for context, particularly where RCTs are limited, and we label them explicitly as lower certainty.
Mechanistic research: Lab and animal findings can explain why something might work. They rarely tell us whether it actually does. We use mechanistic evidence to clarify plausibility, not to support confident “it works” conclusions.

Step 3. Assess quality and bias

Study design and execution: Sample size, duration, dropout, adherence, and whether the methods genuinely test the claim. A short trial is not automatically useless, but we will not treat it as decisive for long term outcomes or real world sustainability.
Clinical relevance: Effect sizes need to mean something in real life. A statistically significant change that is too small to feel or measure functionally may not be worth acting on. We also note when benefits come with trade offs: side effects, poor tolerability, high dropout.
Conflicts of interest and spin: We check funding sources and author disclosures. We also read for framing. Phrases like “trend toward significance” or heavy emphasis on secondary outcomes are common ways results get inflated. If bias risk is elevated, our confidence rating drops.
Female inclusion and subgroup relevance: We track whether women were included and whether results were broken down by sex or life stage. Where evidence relies heavily on male or mixed samples without subgroup analysis, we mark that limitation rather than glossing over it.
Replication across research groups: Findings from multiple independent labs carry more weight than findings from a single group, however prolific. If most of the evidence on a topic comes from one network, we treat the conclusions as more fragile and say so.

Step 4. Synthesise and translate (without overclaiming)

We look at the whole picture, not just the exciting parts. Conclusions are not built from one compelling study. We assess consistency, magnitude, quality, and relevance across the full body of evidence. Where results are mixed, we try to identify what actually varies (population, protocol, or outcome) and why.
We separate outcomes, not just topics. A topic can have strong evidence for one outcome and weak evidence for another. Strength gains versus body fat loss, for example, often sit very differently. Where evidence differs by outcome, we present that explicitly.
We give practical guidance with guardrails. When evidence supports an action, we translate it into clear steps: dose ranges, timing, pairing factors, realistic expectations. We also include caveats for who should be cautious or speak to a clinician before acting.

How we grade evidence

Strength ratings describe how confident we are that a claim holds true for the stated population and outcome, given what the current evidence actually shows. This is not a guarantee of personal results. It is an honest, auditable confidence label.

Strong: Multiple high quality human studies point in the same direction, with meaningful effect sizes, often backed by systematic reviews or several well run RCTs. We are confident the direction of effect is real. Individual responses will always vary.
Moderate: Human evidence suggests a genuine effect and results are reasonably consistent, but there are real limitations: short duration, varied protocols, mixed populations, or modest effect sizes. The claim is probably true, but how much it applies to you depends on your context and how closely you follow the protocol.
Mixed: Results are inconsistent across studies, or depend heavily on subgroup, protocol, or how outcomes were defined. Mixed does not mean useless. It means we cannot generalise confidently. We explain what seems to work, for whom, and what remains genuinely uncertain.
Limited: Some human evidence exists, but it is small, methodologically weaker, poorly matched to midlife women, or too inconsistent to draw a firm conclusion. Often this is “promising but uncertain.” We keep practical guidance conservative and expectations grounded.
Emerging: Early signals only. Evidence may be mechanistic, observational, or from small studies not yet replicated by independent groups. We might explain why the idea is plausible, but we do not treat it as a reliable basis for decisions. The language stays explicitly preliminary.
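
For readers who think in code, the five labels can be sketched as an ordered scale. This is an illustration only, not a description of any internal tooling: the ordering follows the strength bar on the example card, while the numeric values and the `strength_bar` helper are assumptions made for this sketch.

```python
from enum import IntEnum


class Strength(IntEnum):
    """Confidence labels in the order shown on the strength bar.

    The numeric values are illustrative; only the ordering matters.
    """
    EMERGING = 1
    LIMITED = 2
    MIXED = 3
    MODERATE = 4
    STRONG = 5


def strength_bar(rating: Strength) -> str:
    """Render the full scale as text, with the assigned rating in brackets."""
    return " ".join(
        f"[{s.name.title()}]" if s is rating else s.name.title()
        for s in Strength
    )


print(strength_bar(Strength.MODERATE))
# prints: Emerging Limited Mixed [Moderate] Strong
```

The scale describes certainty, not how good or useful an intervention is.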

A visual example

Below is an example of a Verification Card. It shows the claim being evaluated, the confidence rating, how many studies were reviewed, female inclusion, relevant life stage tags, key caveats, practical application, and a link to the full evidence summary.

Creatine can improve strength and power in midlife women when combined with resistance training.

Strength: moderate

Emerging | Limited | Mixed | Moderate | Strong

Likely true, but effect size and applicability may vary by protocol, population, and study duration.

18 studies | Female inclusion: some

Stage relevance: perimenopause, menopause

What was measured

Strength & Power
Improves • Most consistent outcome in trials.
moderate

Body Composition
Mixed / depends • Protocol and baseline dependent.
limited

Safety & Tolerability
Improves • Typical doses, standard cautions apply.
strong

Key caveats

Most studies are short duration. Long term adherence matters.
Effects vary with training status and dosage.

Application

Consider 3 to 5 g per day creatine monohydrate.
Pair with progressive resistance training 2 to 4 times per week.

Last reviewed 2026-02-13 • This Body Verification Team

How to read a Verification Card

Every Insight includes a Verification Card so you can see at a glance: what the claim is, how confident we are, what was actually measured, and where to find the reasoning behind the rating. It is designed to be readable in seconds but defensible under scrutiny.

Primary conclusion: The exact claim we are evaluating, written in plain language. Deliberately narrow. We avoid vague statements so readers can tell precisely what is supported and what is not. No decoding of marketing wording required.
Strength bar: A visual confidence indicator based on our strength rating, from Strong to Emerging. The bar reflects certainty, not goodness. A Moderate rating can still be genuinely useful. Emerging simply means the evidence is too early for confident conclusions.
Study count: The number of human studies reviewed to support the claim. More studies do not automatically mean stronger evidence, but consistency across a larger pool does improve confidence. We update counts as new evidence comes in.
Female inclusion: A transparency flag on how well women were represented in the evidence base. High means meaningfully included. Some means partial. Low means the research leans heavily on male or mixed samples without subgroup analysis. Unknown means the reporting was unclear.
Stage relevance: Labels showing whether the evidence applies to perimenopause, menopause, postmenopause, or all stages. These come from the actual populations studied, not assumptions. Where evidence is inconsistent across stages, we say so.
Key caveats: The limitations most likely to change how you interpret a result: short duration, small sample sizes, outcome definition problems, protocol variation, adherence issues. Caveats are there so readers understand where uncertainty comes from, not just that it exists.
Application: Practical guidance that follows from the evidence, expressed conservatively and with context. We include realistic expectations, relevant time horizons, and the pairing factors that often determine whether something actually works outside a research setting.
Evidence summary link: A direct route to the underlying reasoning: what we included, how we judged quality, what outcomes we measured, and how we arrived at the rating. Where possible, citations are included so readers can follow the trail themselves.
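
The card fields described above can also be read as a simple record. The sketch below is a hypothetical data model, not the actual structure behind the site: the class and field names are invented for illustration, and the values are copied from the creatine example card on this page. The example card's text does not include a link, so that field is left empty.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Allowed values for the female inclusion flag, as described on this page.
FEMALE_INCLUSION_LEVELS = {"high", "some", "low", "unknown"}


@dataclass
class OutcomeRow:
    outcome: str    # what was measured, e.g. "Strength & Power"
    direction: str  # e.g. "Improves" or "Mixed / depends"
    note: str
    strength: str   # rating for this outcome only


@dataclass
class VerificationCard:
    primary_conclusion: str
    strength: str
    study_count: int
    female_inclusion: str
    stage_relevance: list
    outcomes: list
    key_caveats: list
    application: list
    last_reviewed: date
    evidence_summary_link: Optional[str] = None  # hypothetical field name

    def __post_init__(self):
        # Reject flags outside the four documented levels.
        if self.female_inclusion.lower() not in FEMALE_INCLUSION_LEVELS:
            raise ValueError(f"unknown inclusion level: {self.female_inclusion}")

    def summary_line(self) -> str:
        stages = ", ".join(self.stage_relevance)
        return (f"{self.strength} | {self.study_count} studies | "
                f"female inclusion: {self.female_inclusion} | {stages}")


card = VerificationCard(
    primary_conclusion=("Creatine can improve strength and power in midlife "
                        "women when combined with resistance training."),
    strength="moderate",
    study_count=18,
    female_inclusion="some",
    stage_relevance=["perimenopause", "menopause"],
    outcomes=[
        OutcomeRow("Strength & Power", "Improves",
                   "Most consistent outcome in trials.", "moderate"),
        OutcomeRow("Body Composition", "Mixed / depends",
                   "Protocol and baseline dependent.", "limited"),
        OutcomeRow("Safety & Tolerability", "Improves",
                   "Typical doses, standard cautions apply.", "strong"),
    ],
    key_caveats=["Most studies are short duration. Long term adherence matters.",
                 "Effects vary with training status and dosage."],
    application=["Consider 3 to 5 g per day creatine monohydrate.",
                 "Pair with progressive resistance training 2 to 4 times per week."],
    last_reviewed=date(2026, 2, 13),
)
print(card.summary_line())
# prints: moderate | 18 studies | female inclusion: some | perimenopause, menopause
```

Keeping a per-outcome rating on each `OutcomeRow` mirrors the point that a topic can have strong evidence for one outcome and weak evidence for another.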

How we handle uncertainty and mixed evidence

We name uncertainty rather than smooth over it. If evidence is conflicting or thin, we do not average it into false confidence. We explain what is known, what is genuinely unclear, and what would need to change for conclusions to strengthen. That is a non-negotiable part of how this platform operates.
We identify what is actually mixed: population, protocol, or outcome. Mixed should never be a dead end. We look for the driver of disagreement: dose, duration, training status, how symptoms were measured, study quality. That specificity is what makes the rating useful rather than just a shrug.
Guidance stays proportionate to certainty. The thinner the evidence, the more conservative the recommendation. For Emerging or Limited ratings, we focus on low risk options, honest expectations, and watch-and-learn framing, not confident prescriptions.

Updates, corrections, and transparency

Review cadence: Verified topics are revisited on a rolling schedule, typically every 6 to 12 months, depending on how fast the evidence is moving. Topics where new trials or guidelines are landing frequently get reviewed sooner.
Material updates: When high quality new evidence changes the direction or confidence of a claim, we update the rating and rewrite the explanation. We do not quietly shift our position without making it clear that the understanding has changed and why.
Corrections: If something we publish turns out to be incorrect, poorly supported, or misleading, we correct it. That might mean changing wording, revising a strength rating, removing a weak citation, or clarifying a limitation we understated. Accuracy matters more than looking consistent.
Last reviewed date: Every Verification Card carries a “Last reviewed” date. Science changes. An honest platform shows when a conclusion was last checked, rather than implying all content is permanently current.
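
As a rough illustration of the review cadence, the sketch below works out when a card would next fall due if the typical 6 to 12 month window is counted from its “Last reviewed” date. The function names are hypothetical, and counting in whole calendar months is an assumption of this sketch rather than a stated rule.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def review_window(last_reviewed: date) -> tuple:
    """Earliest and latest next-review dates under a 6 to 12 month cadence."""
    return add_months(last_reviewed, 6), add_months(last_reviewed, 12)


def is_overdue(last_reviewed: date, today: date) -> bool:
    """True once more than 12 months have passed since the last review."""
    return today > add_months(last_reviewed, 12)


earliest, latest = review_window(date(2026, 2, 13))
print(earliest, latest)
# prints: 2026-08-13 2027-02-13
```

Fast-moving topics would be reviewed sooner than this window suggests, so the dates are an upper bound on staleness rather than a schedule.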

How we use AI tools

AI assists the workflow. It does not determine the conclusions. We may use AI to outline, organise notes, or draft plain language explanations. But AI does not decide what is true, how evidence is weighted, or what strength rating is assigned. That is human work and human accountability.
Abstract level summaries are not enough. AI tools tend to over-rely on abstracts, which often omit key limitations or frame results more positively than the full paper warrants. We do not treat abstract summaries as sufficient for verification. Ratings are based on a careful reading of the full evidence.
Traceability is not optional. Wherever possible, we link to the evidence trail so readers can verify claims themselves. If a claim cannot be traced back to credible sources, it does not belong on a platform built around verification.

Our commitment

This Body is not built for virality. It is built for clarity. If we cannot justify a claim with credible evidence and honest context, we will not publish it. Evidence first. Clear sourcing. No hype.