LAUNCH EXPERIMENT PROGRAM — A running program, not just a test

Jake McMahon · ProductQuant

A working experiment program — one test live, the infrastructure built, and a backlog of 20 more scored and ready.

An engagement that takes your team from 0–2 tests per quarter to a running experiment program — with the metrics architecture, infrastructure, hypothesis backlog, and Notion OS that keep it running after the engagement ends.

Running experiment program with at least one test live at the end

WHAT YOU HAVE AT THE END

Metrics hierarchy: North Star + drivers + guardrails agreed before any test runs
Instrumentation audit: analytics quality scored and gaps documented
5 experiment designs: first tests designed, specced, and ready to run
Notion OS: pre-built, pre-populated; the team owns it from handover
20+ hypothesis backlog: scored and structured, feeding the 6-month experiment calendar

Fixed price · 6-week engagement

We build a system that runs your product tests.

You get a live experiment, the tools to manage it, and a list of 20 more tests to run next.

PRODUCT MANAGER

"Will customers buy more if we add this feature?"

We build a test that shows the new feature to half your visitors. You see if sales go up. This tells you if the feature is worth building.

MARKETING TEAM

"Which headline gets more people to sign up?"

We show two different headlines on your website to different visitors. You see which one gets more emails. Now you know the best message to use.

DESIGNER

"Is the green button or the blue button better?"

We set up a test where half the users see one button, half see the other. You see which color gets more clicks. You can stop guessing about design.

WEEKLY CHECK-IN

"What should we test next?"

We give you a scored list of 20 new test ideas. You pick the best one and we build it. You always know what to try without running out of ideas.

Teams Jake has worked with

Gainify
Guardio
monday.com
Payoneer
thirdweb
Canary Mail
CircleUp

8+ years in B2B SaaS product & growth · Behavioural Psychology + Big Data (Masters)

What happens when experimentation runs on enthusiasm instead of infrastructure

You want to be data-driven. You’re running no experiments.

The roadmap is driven by the loudest voice in the room, the most recent customer complaint, or the founder’s intuition. Everyone agrees that experiments would help. Nobody has built the infrastructure to run them systematically. The gap between “wanting to experiment” and “running experiments” stays open for quarters.

“We keep saying we’ll be more data-driven this quarter. We keep making product decisions the same way we always have.”

Experiments run. Results are argued about. Nobody ships anything.

Tests get called early when the early data looks promising. Results get challenged because the primary metric was never agreed upfront. Post-test debate — where everyone cites a different metric that supports their pre-existing view — absorbs more time than the test itself. The experiment program dies because it produces controversy, not decisions.

“We ran the test. It ‘won’ on the metric we tracked. Engineering shipped it. Retention dropped. Nobody can explain why.”

PMs want to experiment. They don’t know where to start and don’t have a data scientist in the room.

The team has the instinct to test but not the framework. What is the right sample size? How long should the test run? How do we score hypotheses against each other? What happens if the test reaches significance early? Every experiment starts from scratch because there is no repeatable process — and the answers require a data scientist who is not in the room.

“We have good ideas. We just can’t run them rigorously enough for the results to mean anything.”

The experiment program was built. It died when one person left.

A previous growth PM set up a testing process. It lived in their head and one spreadsheet. They left. The spreadsheet stopped being updated. Within a quarter, the team was back to shipping features based on intuition, and nobody could find the experiment log. Programs that live in one person’s head are not programs — they are single points of failure.

“We had a testing process. Then our growth lead left. Now we have a spreadsheet nobody updates.”

WHY THIS IS DIFFERENT

Experiment programs die when they are built on enthusiasm, not infrastructure.

An experiment program often stalls because the primary metric was never agreed, the hypothesis scoring was informal, and the process lived in one person’s head rather than a system the whole team can use.

This engagement builds the infrastructure that keeps the program running after it ends. The metrics hierarchy is agreed in writing before any test launches. The sample size calculator removes the “how long do we run this?” question permanently. The Notion OS is institutional — owned by the team, not by the person who built it. The hypothesis backlog holds 20+ scored, structured test ideas — months of planned experiments — so the program is never stalled for lack of something to test.

At the end, the program is running. One test is live. The team knows how to spec the next one without Jake in the room. The HiPPO no longer wins product debates by default — because there is a process for running the test that would settle it.

WHAT YOU GET

18 deliverables that take you from no program to one live experiment and a scored backlog.

Deliverable 01
Metrics Hierarchy Development Workshop

Your team leaves with a shared, agreed definition of the one metric that matters most for growth and the handful of driver metrics that reliably move it — ending the internal debate about what you're actually optimising for.

Deliverable 02
Instrumentation Audit Across 5 Dimensions

Every critical measurement gap in your current analytics setup is identified and documented, so you stop making product decisions on incomplete or misleading data.

Deliverable 03
Hypothesis Backlog Development (Typically 20+ Ideas)

Between 20 and 30 structured experiment hypotheses, each tied to a real metric and a clear business rationale — enough to keep your team running meaningful tests for months without starting from a blank page.

Deliverable 04
Traffic Volume and Baseline Analysis

Before your first experiment runs, you know exactly how long tests need to run to reach statistical significance — so you stop calling tests early and making decisions on noise.
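For a sense of the arithmetic involved, here is a minimal sample-size sketch in Python (assuming a standard two-proportion test via statsmodels; the baseline rate, detectable lift, and traffic figures are placeholders, not numbers from the engagement):

  # Minimal sketch: how baseline traffic translates into test runtime.
  # Assumes a two-sided two-proportion z-test; all numbers are placeholders.
  from statsmodels.stats.power import NormalIndPower
  from statsmodels.stats.proportion import proportion_effectsize

  baseline = 0.05        # current conversion rate (placeholder)
  lift = 0.20            # minimum relative lift worth detecting (placeholder)
  weekly_users = 2500    # users entering the experiment per week (placeholder)

  effect = proportion_effectsize(baseline, baseline * (1 + lift))
  n_per_variant = NormalIndPower().solve_power(
      effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
  )
  weeks = 2 * n_per_variant / weekly_users  # two variants share the traffic
  print(f"{n_per_variant:,.0f} users per variant, ~{weeks:.1f} weeks at current traffic")

Run against your real numbers, this is the calculation that answers “how long do we run this?” before the test launches rather than after.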

Deliverable 05
Team Capability Assessment

An honest picture of where your team is ready to run experiments independently and where they need support, so the program is scoped to what your organisation can actually sustain.

Deliverable 06
Metrics Hierarchy Document (North Star + Drivers + Guardrails)

A single, written document that defines your North Star metric, the driver metrics your team can directly influence, and the guardrail metrics that protect against unintended harm — used in every review and prioritisation conversation going forward.

Deliverable 07
Instrumentation Audit with Gap Analysis

A detailed record of what your analytics stack currently tracks, what it's missing, and what needs to be fixed before experiments can produce reliable results.

Deliverable 08
5 Experiment Designs Ready to Run

Typically between 5 and 7 fully specced experiments, each with a hypothesis, control and variant definition, success metric, and minimum runtime — ready to hand to your team and start immediately.

Deliverable 09
Experiment Notion OS Pre-Built and Populated

A complete experiment management system set up in Notion with your first experiments already loaded, so your team has one place to plan, run, and review every test from day one.

Deliverable 10
20+ Hypothesis Backlog Scored and Structured

Your full backlog of ideas is scored by expected impact, confidence, and ease of implementation — so your team always knows what to work on next without spending hours debating prioritisation.
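To make the scoring concrete, here is a minimal sketch of impact-confidence-ease ranking in Python (the multiplicative scoring and the example hypotheses are illustrative assumptions, not the engagement’s actual model):

  # Minimal ICE-style scoring sketch: rank hypotheses by
  # impact x confidence x ease, each rated 1-10.
  # The formula and example rows are illustrative assumptions.
  hypotheses = [
      {"name": "Shorter signup form",        "impact": 6, "confidence": 8, "ease": 9},
      {"name": "Usage-based upgrade prompt", "impact": 9, "confidence": 5, "ease": 4},
      {"name": "Onboarding checklist",       "impact": 7, "confidence": 7, "ease": 6},
  ]

  for h in hypotheses:
      h["score"] = h["impact"] * h["confidence"] * h["ease"]

  for h in sorted(hypotheses, key=lambda h: h["score"], reverse=True):
      print(f'{h["score"]:>4}  {h["name"]}')

Whatever the exact weighting, the point is the same: the next test is whatever sits at the top of the ranked list, not whatever was argued for most recently.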

Deliverable 11
6-Month Experiment Calendar

A planned sequence of experiments across the next six months, aligned to your growth priorities — giving leadership visibility into what's being tested and why, without needing to be involved in every decision.

Deliverable 12
Experiment Log and Results Library

Every experiment your team runs is captured in a structured log with hypothesis, results, and decision — building institutional knowledge that doesn't disappear when team members leave.
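As a sketch of what “structured” means here (field names are illustrative assumptions, not the actual Notion schema):

  # Sketch of a structured experiment-log record.
  # Field names are illustrative; the Notion OS schema may differ.
  from dataclasses import dataclass, field

  @dataclass
  class ExperimentRecord:
      name: str
      hypothesis: str            # "We believe X, because Y"
      primary_metric: str        # agreed before launch, fixed thereafter
      guardrail_metrics: list[str] = field(default_factory=list)
      result: str = ""           # win / loss / inconclusive
      decision: str = ""         # ship / iterate / abandon, with rationale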

Deliverable 13
Weekly Review Agenda Template

A repeatable agenda that keeps your weekly experiment review focused, time-efficient, and decision-oriented rather than turning into an open-ended status meeting.

Deliverable 14
2-Hour Team Training Session (Recorded)

A structured training session on experiment design, statistical validity, and how to use the Notion OS — recorded so new team members can get up to speed without a separate onboarding process.

Deliverable 15
Prioritisation Framework Documentation

The scoring model used to rank your hypothesis backlog is fully documented so your team can apply it independently as new ideas are generated over time.

Deliverable 16
Program Maintenance Guide

A written guide covering how to keep the program running after the engagement ends — including how to add new hypotheses, how to handle inconclusive tests, and when to revisit the metrics hierarchy.

Deliverable 17
60-Day Program Launch Support

For the first two months after delivery, direct access to get answers, review experiment designs before they go live, and troubleshoot anything that comes up as your team builds the habit.

Deliverable 18
Three Coaching Sessions + Email Access for Experiment Design Questions

Three structured sessions where your experiment lead or growth team can work through specific challenges, review live results, and get feedback on upcoming test designs. Email access for questions between sessions.

Everything above for $7,997. No hourly billing. No scope creep. Everything stays with your team.

THE TIMELINE

Foundation first, then backlog, then the OS that keeps it running.

Weeks 1–2
Audit & metrics architecture

Instrumentation audit completed and scored. Metrics hierarchy documented — North Star, driver metrics, and guardrails agreed with leadership sign-off. This is the pre-work that prevents post-test debate, done before the first test launches.

Weeks 3–4
First experiment designs & Notion OS build

First 5 experiment designs fully specced. First test launched and running. Notion OS built and populated with the first experiments, results library, weekly review agenda, and decision record. The institutional home for the program is live and owned by your team.

Weeks 5–6
Hypothesis backlog, team training & 6-month calendar

20+ hypothesis backlog delivered and scored. 6-month experiment calendar built from the backlog. 2-hour team training session completed — walkthrough of the metrics hierarchy, experiment design process, hypothesis scoring, and Notion OS. The team leaves the session able to run the next test independently.

WHO THIS IS FOR

This engagement is built for teams starting from zero — or resetting a program that died.

Good fit
  • B2B SaaS team that wants to start experimenting systematically — pre-program or with a broken program to reset
  • Growth PM hired specifically to build the experiment program, starting from zero
  • VP Product whose team wants to experiment but keeps getting bogged down in post-test debates
  • Team running 0–2 tests per quarter where results are argued about rather than acted on
  • Any team with product instrumentation in place and 5,000+ MAU to support meaningful tests
Not the right fit
  • Teams already running 5+ tests per quarter with an established, working program
  • Teams without product instrumentation in place — start with Launch PLG or PostHog Setup first
  • Products with fewer than 5,000 MAU — sample sizes make tests take 3–6+ months and the program won’t compound
  • Teams that want someone to run experiments for them on an ongoing basis (this engagement builds the program, not the operator)

Not sure if your analytics are ready to support a testing program? The instrumentation audit in Week 1 will tell you what gaps need addressing before the first test runs — including whether those gaps are blocking or just advisory.

WHO’S DOING THE WORK

Jake McMahon — ProductQuant
8+ years B2B SaaS · Behavioural Psychology + Big Data (Masters)

I run this engagement myself. Eight years as a product and growth lead inside B2B SaaS, watching smart teams make the same mistake: good tools, good instincts, no system. Experiment programs that live in one spreadsheet and one person’s head are not programs — they are single points of failure. The Notion OS and the metrics hierarchy are designed specifically to remove that dependency.

The most common place experiment programs break is the metric agreement step. Everyone has a view on the primary metric. Getting leadership sign-off on one number before the test runs — and keeping that number fixed when the results come in — is the constraint the engagement is built around. That problem is pre-hypothesis and pre-infrastructure. Fixing it first is what makes everything downstream work.

I won’t do this:
  • Build a testing process that only works while I’m in the room — the Notion OS is yours from handover day, permanently
  • Run the first test without the primary metric agreed upfront — experiments without a pre-agreed OEC produce debates, not decisions
  • Design hypotheses without reviewing your existing analytics data — the instrumentation audit comes before the backlog
  • Promise a specific number of revenue-impacting results — experiments produce learning, not guaranteed wins
Could our growth PM build this program themselves?
Possibly — over 6–9 months, running this alongside a live product roadmap. The engagement takes 6 weeks because the deliverables are built in sequence with dedicated focus: the audit informs the metric architecture, the metrics inform the sample size calibration, the calibration shapes the hypothesis scoring. A growth PM building this on the side will stall at the metric agreement step — getting leadership sign-off on the OEC takes longer than expected when it has to go through planning cycles. This engagement is designed to clear that bottleneck in Week 1.

Teams Jake has worked with

Gainify
Guardio
monday.com
Payoneer
thirdweb
Canary Mail
CircleUp

PRICING

One engagement. The whole program built. Yours permanently.

$7,997
one-time · 6-week engagement · fixed scope
  • Metrics hierarchy (North Star + driver metrics + guardrails)
  • Instrumentation audit scored across 5 dimensions
  • First 5 experiment designs, fully specced and ready to run
  • Experiment Notion OS — pre-built, pre-populated
  • Hypothesis backlog (20+ scored ideas)
  • 2-hour team training session + recording
  • 6-month experiment calendar
Book a Call to Start →

Guarantee: If you don't have a running experiment program with at least one test live at the end of the engagement, you get a full refund. The program exists and is operating — or you don’t pay.

QUESTIONS

Anything else, book a call or send an email.

Book a call →
What’s the traffic minimum and why? +
Around 5,000 MAU in-product or 15,000+ monthly web visitors. Below that, sample sizes become so large that tests take 3–6 months to reach significance — and a program that takes 6 months per test does not compound. The sample size calculator built in the engagement will confirm your baseline before the first test design is finalised. If the numbers are not there, we’ll say so before you commit to the engagement.
What if we already run some experiments? +
The instrumentation audit in Week 1 handles this. It scores your current practice across 5 dimensions and identifies where the real constraint is — which is usually one layer upstream from where teams think it is. The most common finding: the block is not hypothesis volume but the absence of a pre-agreed primary metric. The audit maps what you have and shows you what to fix first, without assuming you are starting from zero.
How is this different from a CRO agency? +
This engagement is built for in-product B2B SaaS experimentation: lower traffic, longer conversion cycles, using tools like PostHog and Amplitude. The sample size model is calibrated from your actual product analytics data. It's designed for teams with 5,000+ MAU who need a program built, not just tests run.
Will the program keep running after the engagement ends? +
That is the design constraint the Notion OS is built around. Experiment programs die when they live in one person’s head or one spreadsheet nobody updates. The Notion workspace is institutional and owned by your team from handover day — experiment log, results library, hypothesis backlog, weekly review agenda, decision record. The 2-hour training means anyone on the growth team can run the next test with the same rigour, without needing Jake in the room.
Does this work as a standalone offer or a follow-on? +
Both work, but it pairs naturally as the next phase after Launch PLG or AI Feature Launch. Teams that just instrumented their product now have the data infrastructure to run experiments on it. Standalone cold outreach is also viable for growth PMs and VPs Product hired specifically to build an experiment program — the brief is clear and the pain is acute.
What do you need from our team? +
Access to your analytics platform (PostHog, Amplitude, or Mixpanel) for the instrumentation audit. One 60-minute working session in Week 1 for metrics hierarchy sign-off. Async review of the hypothesis backlog in Week 3. The Week 6 team training is 2 hours. Total time commitment from your team: around 5–6 hours across 6 weeks. Everything else is produced on my side and delivered for your review at each phase gate.
What if our analytics aren’t clean enough to run experiments? +
The instrumentation audit in Week 1 will tell you exactly what is and is not reliable. Not all gaps are blocking — some experiments can run on the data that exists while the gaps are addressed in parallel. Where gaps would invalidate results, the audit will flag which events need to be fixed before those specific tests can run, and which tests can launch immediately. You get an honest picture of what your analytics can and cannot support, not a dependency that blocks the whole program.

Stop running tests. Start running a program.

Your team gets the metrics architecture, the infrastructure, and the hypothesis backlog to run experiments systematically — and one test already live to prove it.