Launch Experiment Program

From 0–2 tests per quarter to a running experiment program — in 6 weeks.

The infrastructure, the metrics architecture, the hypothesis backlog, and the Notion OS that keeps it running after the engagement ends.

6 weeks · fixed scope · $4,500–$7,500 · tiered by ARR

AUDIT → METRICS → BACKLOG → NOTION OS

MATURITY AUDIT · Scored across 5 dimensions, pinpointing where the real constraint is
METRIC ARCHITECTURE · OEC, secondary metrics, and guardrails agreed before any test runs
SAMPLE SIZE CALC · Custom Google Sheets model calibrated to your traffic and baseline
HYPOTHESIS BACKLOG · 15–25 scored ideas, 20+ months of runway from day one
NOTION OS · Pre-built, pre-populated, owned by the team from delivery day

AGREED

There is a primary experiment metric everyone has signed off on before any test runs. The post-test debate — where everyone cites a different metric that supports their pre-existing view — stops happening.

MEASURED

The sample size calculator tells you in two minutes whether a test is worth running. Every test ends in a clean ship/no-ship call because the success criteria were defined before it launched.

RUNNING

Three months from now, the program is still running — because it has infrastructure, not just enthusiasm. The Notion workspace is owned by the team. The HiPPO (the highest-paid person's opinion) no longer wins product debates by default.

Six deliverables. Everything needed to build and sustain an experiment program that compounds.

Every item is owned permanently. The Notion OS, the calculator, the playbook — yours to use independently from delivery day.

Together they spell METRIC:

M · Measure · Maturity Audit: score what's actually broken
E · Establish · Metric Architecture: OEC + guardrails agreed
T · Test · Sample Size Calculator: is it worth running?
R · Research · Hypothesis Backlog: 20+ months of runway
I · Instruct · Experiment Playbook: anyone can run a test
C · Create · Notion OS: program runs after engagement ends

AUDIT · WEEK 1

Experimentation Maturity Audit

Scored assessment across 5 dimensions — analytics quality, hypothesis rigour, statistical practice, test velocity, and team process. A toy scoring illustration follows the list below.

  • The real constraints, identified and ranked in priority order
  • Most teams find the bottleneck is missing metrics, not missing ideas
  • Fixing the right thing first is what makes the program work
  • Scored output you can share with leadership
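
As a toy illustration of how the scoring surfaces the constraint (the numbers below are hypothetical, not real audit output), the first fix is simply the weakest dimension:

```python
# Hypothetical audit output: each dimension scored 1-5
scores = {
    "analytics quality": 2,
    "hypothesis rigour": 3,
    "statistical practice": 3,
    "test velocity": 4,
    "team process": 3,
}

# The lowest-scoring dimension caps everything downstream, so fix it first
constraint = min(scores, key=scores.get)  # -> "analytics quality"
```
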
METRICS · WEEK 1

Metric Architecture Document

Defines the primary experiment metric (OEC), 3–5 secondary metrics, and 2–3 guardrail metrics that must not move in the wrong direction. A minimal sketch of the structure follows the list below.

  • Every test ends in a decision, not a debate
  • Pre-agreed metrics remove the post-test argument
  • "Did this test work?" gets a shared answer
  • One document, signed off before test 1 launches
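
A minimal sketch of what the resulting document can encode. Every metric name below is invented for illustration; your actual OEC and guardrails come out of the week 1 working session:

```python
# Illustrative metric architecture; all metric names are hypothetical
metric_architecture = {
    "oec": "7-day activation rate",   # the one metric every test is judged on
    "secondary": [                    # explain *why* the OEC moved
        "time to first key action",
        "week-1 feature adoption",
        "trial-to-paid conversion",
    ],
    "guardrails": [                   # must not degrade, or the test fails
        "churn within 30 days",
        "support tickets per active user",
    ],
}
```
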
TOOLS · WEEK 2

Sample Size & Runtime Calculator

A custom Google Sheets model calibrated to your actual baseline conversion rates, traffic volume, and minimum detectable effect. The standard math behind such a model is sketched after the list below.

  • Know whether a test is worth running before you run it
  • "How long do we run this?" goes from guessing to 2 minutes
  • Built from your PostHog or Amplitude baseline data
  • Bayesian variant included for low-traffic situations
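
For intuition, this is the standard two-proportion formula a calculator like this typically implements. The delivered model is a calibrated Google Sheets build; the Python below is only a sketch of the math, and every example number is hypothetical:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, rel_mde, alpha=0.05, power=0.80):
    """Approximate n per variant for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + rel_mde)      # smallest lift worth detecting
    p_bar = (p1 + p2) / 2
    delta = abs(p2 - p1)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2)

# Hypothetical example: 5% baseline, 10% relative lift worth detecting
n = sample_size_per_variant(0.05, 0.10)   # ~31,000 users per variant
weeks = 2 * n / (15_000 / 4.33)           # ~18 weeks on 15K monthly visitors
```

That runtime estimate is the "worth running?" answer in numeric form: if it comes back in months rather than weeks, raise the MDE or test on a higher-traffic surface.
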
BACKLOG · WEEK 3

Hypothesis Backlog + Prioritisation Framework

15–25 scored, properly structured test ideas, plus a repeatable scoring framework for evaluating new ideas as they come in. An illustrative hypothesis structure follows the list below.

  • 20+ months of runway from day one
  • Scoring removes the HiPPO dynamic — evidence wins
  • Each hypothesis structured: prediction, metric, runtime, MDE
  • Framework reusable by anyone on the team
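
One way to picture the structure each idea arrives in. The example hypothesis and the ICE-style scoring are illustrative stand-ins; the delivered framework may weight things differently:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    prediction: str        # "If we change X, metric Y moves because Z"
    metric: str            # the experiment metric it should move
    mde: float             # relative minimum detectable effect
    runtime_weeks: float   # from the sample size calculator
    impact: int            # 1-5 scores, ICE-style (an illustrative choice)
    confidence: int        # 1-5
    ease: int              # 1-5

    @property
    def score(self) -> int:
        return self.impact * self.confidence * self.ease

backlog = [
    Hypothesis(
        prediction="If onboarding drops to 3 steps, activation rises, "
                   "because drop-off concentrates at step 4",
        metric="7-day activation rate", mde=0.10, runtime_weeks=4,
        impact=4, confidence=3, ease=4,
    ),
]
backlog.sort(key=lambda h: h.score, reverse=True)  # evidence ranks ideas, not seniority
```
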
PROCESS · WEEK 4

Experiment Design & Results Playbook

How to spec, run, and call a test — including Bayesian guidance for low-traffic situations where frequentist sample sizes become impractical. A minimal Bayesian example follows the list below.

  • Anyone on the growth team can run a test with the same rigour
  • No data scientist required in the room
  • Institutional knowledge that stays when people leave
  • Ship/no-ship criteria defined before every test launches
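
A minimal sketch of the Bayesian approach the playbook covers, using a Beta-Binomial model with flat priors. The conversion counts and the 0.95 threshold below are made-up examples:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000):
    """Monte Carlo estimate of P(B's true rate > A's) under Beta(1,1) priors."""
    wins = 0
    for _ in range(draws):
        rate_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical low-traffic test: 48/900 vs 61/910 conversions
p = prob_b_beats_a(48, 900, 61, 910)
ship = p > 0.95   # decision threshold agreed before launch
```

Unlike a fixed-horizon frequentist test, this yields a direct probability statement at any sample size; the trade-off is that the ship threshold must be agreed before launch, which is exactly what the playbook codifies.
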
INFRASTRUCTURE · WEEKS 5–6

Experiment Operating System (Notion Workspace)

A pre-built, pre-populated Notion workspace — experiment log, results library, weekly review agenda, hypothesis backlog, and decision record.

  • Three months from now, the program is still running
  • Institutional and visible — not in one person's head
  • Team owns it from delivery day
  • First 10 hypotheses populated on handover

$5M–$40M ARR teams with 5K+ MAU that want to stop running tests and start running a program.

GROWTH PM

Just hired to build the experiment program

$5M–$20M ARR · 5K–30K MAU

You've been given a remit to build an experiment program — and you're starting from zero. There's no agreed primary metric, no sample size discipline, and the backlog is whatever got mentioned in the last planning session. This engagement gives you the foundation before you inherit the chaos.

  • Maturity audit showing leadership exactly what's broken
  • Metric architecture everyone has signed off on
  • Notion OS that makes the program visible and defensible

You walk into Q2 planning with a running program, not a promise to build one.

VP PRODUCT

Experiments are on the roadmap. Nobody knows how to run them.

$10M–$40M ARR · New program

Your team runs 1–2 tests per quarter when someone has a strong enough opinion. Tests get called early when the first few days of data look good. The HiPPO in the room still wins most product debates because there's no agreed process for running the test that would settle it.

  • Agreed OEC and guardrail metrics — no more post-test debate
  • Sample size calculator that tells you in 2 minutes if a test is worth running
  • Hypothesis backlog with 20+ months of runway

The experiment program runs on process, not on whoever pushes hardest.

ANY ROLE

Experiments keep dying. The HiPPO problem is real.

$5M–$40M ARR · Post-Series A

Tests get championed by whoever has the most influence that week. Results get argued about instead of acted on. Six months later there are 8 tests in a spreadsheet and nobody can tell you what the program actually learned. The playbook and scoring framework change the dynamic — the best evidence wins, regardless of seniority.

  • Prioritisation framework that scores ideas by evidence, not by who suggested them
  • Experiment Design Playbook that removes ambiguity from every test call
  • Notion OS that makes every decision visible and traceable

Product debates get settled by data — not by the most senior person in the room.

Foundation first, then backlog, then the OS that keeps it all running.

Week 1
Maturity audit + metric architecture document. Scored diagnostic across 5 dimensions. OEC, secondaries, and guardrails agreed.
Week 2
Sample size calculator built and calibrated. Custom Google Sheets model built against your baseline data. Runtime question answered permanently.
Week 3
Hypothesis backlog + prioritisation framework. 15–25 scored ideas structured with prediction, metric, MDE, and runtime. Scoring framework documented.
Week 4
Experiment design and results playbook. Spec-to-ship-call process documented. Bayesian guidance for low-traffic scenarios included.
Week 5
Notion workspace built and populated. Experiment log, results library, weekly review agenda, hypothesis backlog, decision record. First 10 ideas populated.
Week 6
Team walkthrough + live experiment review. First test spec reviewed together. Team owns the program from this point.

Four steps from first call to a running program.

01

30-minute call

We review your current analytics setup, test velocity, and team structure. You leave knowing whether the traffic and infrastructure are there for this engagement to work — and what the scope looks like. No pitch. No deck.

02

2-page proposal

Specific scope: deliverable list, timeline, pricing tier (based on ARR), and what we need from your team. If the traffic minimums aren't there, we'll say so before you commit. Nothing ambiguous.

03

The 6-week engagement

Audit and metric architecture in week 1. Calculator in week 2. Backlog in week 3. Playbook in week 4. Notion OS built and populated in week 5. Team walkthrough in week 6. Weekly check-in at each phase gate.

04

Full handover

All 6 deliverables complete. Notion workspace live and owned by your team. Calculator and playbook documented for independent use. You run the program — no ongoing dependency on us.

What this would cost to build separately.

Standalone market rates for each component.

Experimentation audit (consultant day rate) · ~$2,500
Metric architecture + OEC definition · ~$1,500
Sample size calculator build + calibration · ~$1,000
Hypothesis backlog (15–25 structured ideas) · ~$2,000
Experiment design playbook · ~$1,500
Notion OS build + population · ~$1,500
Standalone total · ~$10,000
Launch Experiment Program · $4,500–$7,500

Fixed scope. Tiered by ARR.

$4,500–$7,500

One-time. Tiered: $4,500 for $5M–$15M ARR · $6,500–$7,500 for $15M–$40M ARR

  • Experimentation Maturity Audit
  • Metric Architecture Document (OEC + guardrails)
  • Sample Size & Runtime Calculator
  • Hypothesis Backlog + Prioritisation Framework (15–25 ideas)
  • Experiment Design & Results Playbook
  • Notion Experiment OS — pre-built, pre-populated
Book a 30-minute call

Traffic minimum: ~5,000 MAU in-product or 15,000+ monthly web visitors. If the numbers aren't there, we'll confirm before you commit.

Jake McMahon · Founder, ProductQuant

8+ years building growth systems inside B2B SaaS · Bachelor's in Behavioural Psychology · Master's in Big Data

Eight years as a product leader inside B2B SaaS companies — product manager, growth lead, head of product, from seed-stage to $80M ARR. He kept watching smart teams make the same mistake: good tools, real talent, no system connecting any of it.

Experimentation work built on real PostHog, Amplitude, and Mixpanel data — not CRO playbooks designed for e-commerce traffic volumes. ProductQuant is what he'd hire if he were still an operator. There's no team of junior analysts.

What he won't do:

  • Promise revenue numbers he can't verify
  • Hand you a strategy deck and disappear
  • Recommend work you don't need
  • Build something that only works if you keep paying him

"Could our growth PM build this program themselves?"

Possibly — over 6–9 months. The METRIC System takes 6 weeks because the deliverables are built in sequence with dedicated focus, each feeding the next: the maturity audit informs the metric architecture, which calibrates the sample size model, which shapes the hypothesis scoring. A growth PM building this alongside a live roadmap will stall at the metric architecture step — agreeing on the OEC takes longer than people expect when it has to go through leadership sign-off. This engagement is designed to get past that bottleneck in week 1.

Teams Jake has worked with

monday.com
Payoneer
thirdweb
Guardio
Gainify
Canary Mail

Frequently asked.

What's the traffic minimum and why?

Around 5,000 MAU in-product or 15,000+ monthly web visitors. Below that, sample sizes become so large that tests take 3–6 months to reach significance — and a program that takes 6 months per test doesn't compound. We'll confirm your baseline in the first call, and if it's not there, we'll say so before you commit.
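
To make that concrete with hypothetical numbers: at a 4% baseline conversion and a 10% relative MDE (80% power, two-sided α = 0.05), the standard two-proportion formula needs roughly 39,000 users per variant. At 10,000 monthly visitors split 50/50, that's about 8 months for a single test; at 30,000 monthly visitors, the same test closes in under 3 months.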

How is this different from a CRO agency?

Most CRO consultants come from a website and landing page background — where traffic is high and tests run fast. This engagement is built for in-product B2B SaaS experimentation: lower traffic, longer conversion cycles, PostHog and Amplitude instead of GA4 and Optimizely. The sample size model is calibrated from actual product analytics data, including HogQL where applicable. Speero, for comparison, requires 100,000+ monthly visitors and prices at $10K+/month. This sprint targets the gap they explicitly can't serve.

What if we already run some experiments?

The maturity audit exists for exactly this situation. It scores your current practice across 5 dimensions and identifies where the real constraint is — which is usually one layer upstream from where teams think it is. The audit doesn't assume you're starting from zero; it maps what you have and shows you what to fix first.

Will the program actually keep running after the engagement ends?

That's the design constraint the Notion OS is built around. Experiment programs die when they live in one person's head or one spreadsheet that nobody updates. The Notion workspace is institutional and owned by your team from day one — experiment log, results library, hypothesis backlog, weekly review agenda, decision record. The playbook means anyone on the growth team can run a test with the same rigour, without needing the person who built the program in the room.

Is this positioned as a standalone offer or a follow-on?

Both work, but it pairs naturally as Phase 2 of the PLG Sprint or AI Feature Launch. Teams that just instrumented their product now have the data to run experiments on it. Standalone cold outreach is also viable for growth PMs and VPs Product specifically hired to build an experiment program — the brief is clear and the pain is acute.

What do you need from our team?

Access to your analytics platform (PostHog, Amplitude, or Mixpanel) for the maturity audit and sample size calibration. One 60-minute working session in week 1 for the metric architecture sign-off. Feedback on the hypothesis backlog in week 3 (one async review). The week 6 walkthrough is 90 minutes. Total time commitment from your team: around 4–5 hours across 6 weeks.

Ready to stop running tests and start running a program?

30 minutes. You'll leave knowing if this engagement fits — and what the scope looks like.

Book a 30-minute call