LAUNCH EXPERIMENT PROGRAM — A running program, not just a test
Jake McMahon · ProductQuant
An engagement that takes your team from 0–2 tests per quarter to a running experiment program, with the metrics architecture, infrastructure, hypothesis backlog, and Notion OS that keep it running after the engagement ends.
WHAT YOU HAVE AT THE END
A running experiment program with at least one test live.
Fixed price · 6-week engagement
You get a live experiment, the tools to manage it, and a list of 20 more tests to run next.
PRODUCT MANAGER
"Will customers buy more if we add this feature?"
We build a test that shows the new feature to half your visitors. You see if sales go up. This tells you if the feature is worth building.
MARKETING TEAM
"Which headline gets more people to sign up?"
We show two different headlines on your website to different visitors. You see which one gets more emails. Now you know the best message to use.
DESIGNER
"Is the green button or the blue button better?"
We set up a test where half the users see one button and half see the other. You see which color gets more clicks, and you can stop guessing about design. (A sketch of how this kind of split is assigned follows these examples.)
WEEKLY CHECK-IN
"What should we test next?"
We give you a scored list of 20 new test ideas. You pick the best one and we build it. You always know what to try without running out of ideas.
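Under the hood, every split above relies on the same mechanic: each visitor is deterministically assigned to a variant, so the same person always sees the same version on every visit. A minimal sketch of one common way to implement this in Python — the function and variant names are illustrative, not any specific tool’s API:

import hashlib

def assign_variant(user_id: str, experiment: str, variants=("control", "variant_b")):
    # Hash the user + experiment name so assignment is stable per user
    # but independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user always lands in the same bucket for a given experiment:
assign_variant("user_41", "headline_test")  # stable "control" or "variant_b"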
What happens when experimentation runs on enthusiasm instead of infrastructure
The roadmap is driven by the loudest voice in the room, the most recent customer complaint, or the founder’s intuition. Everyone agrees that experiments would help. Nobody has built the infrastructure to run them systematically. The gap between “wanting to experiment” and “running experiments” stays open for quarters.
“We keep saying we’ll be more data-driven this quarter. We keep making product decisions the same way we always have.”
Tests get called early the moment the data looks promising. Results get challenged because the primary metric was never agreed upfront. Post-test debate, where everyone cites a different metric that supports their pre-existing view, absorbs more time than the test itself. The experiment program dies because it produces controversy, not decisions.
“We ran the test. It ‘won’ on the metric we tracked. Engineering shipped it. Retention dropped. Nobody can explain why.”
The team has the instinct to test but not the framework. What is the right sample size? How long should the test run? How do we score hypotheses against each other? What happens if the test reaches significance early? Every experiment starts from scratch because there is no repeatable process — and the answers require a data scientist who is not in the room.
“We have good ideas. We just can’t run them rigorously enough for the results to mean anything.”
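What a sample size answer looks like in practice: a minimal sketch, assuming a standard two-sided two-proportion test at 95% confidence and 80% power. The baseline rate, lift, and traffic numbers below are illustrative, not from any client:

from statistics import NormalDist

def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    # Standard two-proportion z-test sample size formula.
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 at 95% confidence
    z_power = NormalDist().inv_cdf(power)          # ~0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2) + 1

# Detecting a +10% relative lift on a 4% baseline needs ~39,500 visitors
# per variant; at 5,000 eligible visitors/day split evenly, that is ~16 days.
n = visitors_per_variant(0.04, 0.10)

That one number, computed before launch, is what ends the “can we call it yet?” conversation: the test runs until the precomputed sample is reached, not until someone likes the chart.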
A previous growth PM set up a testing process. It lived in their head and one spreadsheet. They left. The spreadsheet stopped being updated. Within a quarter, the team was back to shipping features based on intuition, and nobody could find the experiment log. Programs that live in one person’s head are not programs — they are single points of failure.
“We had a testing process. Then our growth lead left. Now we have a spreadsheet nobody updates.”
WHY THIS IS DIFFERENT
Experiment programs die when they are built on enthusiasm, not infrastructure.
An experiment program often stalls because the primary metric was never agreed, the hypothesis scoring was informal, and the process lived in one person’s head rather than in a system the whole team can use.
This engagement builds the infrastructure that keeps the program running after it ends. The metrics hierarchy is agreed in writing before any test launches. The sample size calculator removes the “how long do we run this?” question permanently. The Notion OS is institutional: owned by the team, not by the person who built it. The hypothesis backlog holds 20+ scored, structured test ideas, months of testing, so the program is never stalled for lack of something to test.
At the end, the program is running. One test is live. The team knows how to spec the next one without Jake in the room. The HiPPO (the highest-paid person’s opinion) no longer wins product debates by default, because there is now a process for running the test that would settle them.
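For concreteness: the metrics hierarchy the engagement produces is a written document, but its shape is simple enough to sketch as data. All metric names and thresholds here are illustrative placeholders, not a client’s real numbers:

metrics_hierarchy = {
    "north_star": "weekly_active_teams",          # the one metric that matters
    "drivers": [                                  # metrics the team can move directly
        "signup_to_activation_rate",
        "invites_sent_per_team",
    ],
    "guardrails": {                               # limits every test must respect
        "day30_retention": {"min": 0.30},
        "support_tickets_per_1k_users": {"max": 5.0},
    },
}

The guardrails are what prevent the “it won, we shipped it, retention dropped” scenario: a test that moves its primary metric but breaches a guardrail does not ship.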
WHAT YOU GET
Your team leaves with a shared, agreed definition of the one metric that matters most for growth and the handful of driver metrics that reliably move it — ending the internal debate about what you're actually optimising for.
Every critical measurement gap in your current analytics setup is identified and documented, so you stop making product decisions on incomplete or misleading data.
Between 20 and 30 structured experiment hypotheses, each tied to a real metric and a clear business rationale — enough to keep your team running meaningful tests for months without starting from a blank page.
Before your first experiment runs, you know exactly how long tests need to run to reach statistical significance — so you stop calling tests early and making decisions on noise.
An honest picture of where your team is ready to run experiments independently and where they need support, so the program is scoped to what your organisation can actually sustain.
A single, written document that defines your North Star metric, the driver metrics your team can directly influence, and the guardrail metrics that protect against unintended harm — used in every review and prioritisation conversation going forward.
A detailed record of what your analytics stack currently tracks, what it's missing, and what needs to be fixed before experiments can produce reliable results.
Typically between 5 and 7 fully specced experiments, each with a hypothesis, control and variant definition, success metric, and minimum runtime — ready to hand to your team so they can start immediately.
A complete experiment management system set up in Notion with your first experiments already loaded, so your team has one place to plan, run, and review every test from day one.
Your full backlog of ideas is scored by expected impact, confidence, and ease of implementation, so your team always knows what to work on next without spending hours debating prioritisation. (A sketch of the scoring model follows this list.)
A planned sequence of experiments across the next six months, aligned to your growth priorities — giving leadership visibility into what's being tested and why, without needing to be involved in every decision.
Every experiment your team runs is captured in a structured log with hypothesis, results, and decision — building institutional knowledge that doesn't disappear when team members leave.
A repeatable agenda that keeps your weekly experiment review focused, time-efficient, and decision-oriented rather than turning into an open-ended status meeting.
A structured training session on experiment design, statistical validity, and how to use the Notion OS — recorded so new team members can get up to speed without a separate onboarding process.
The scoring model used to rank your hypothesis backlog is fully documented so your team can apply it independently as new ideas are generated over time.
A written guide covering how to keep the program running after the engagement ends — including how to add new hypotheses, how to handle inconclusive tests, and when to revisit the metrics hierarchy.
For the first two months after delivery, direct access to Jake to get answers, review experiment designs before they go live, and troubleshoot anything that comes up as your team builds the habit.
Three structured sessions where your experiment lead or growth team can work through specific challenges, review live results, and get feedback on upcoming test designs. Email access for questions between sessions.
Everything above for $7,997. No hourly billing. No scope creep. Everything stays with your team.
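The scoring model referenced in the list above: a minimal sketch of a standard ICE-style score, where each hypothesis is rated on impact, confidence, and ease. The exact rubric is documented during the engagement; the hypotheses and ratings below are illustrative:

from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    impact: int      # 1-10: expected effect on the driver metric if the test wins
    confidence: int  # 1-10: strength of the evidence behind the hypothesis
    ease: int        # 1-10: how cheap the test is to build and run

    @property
    def score(self) -> int:
        # Multiplying (rather than averaging) means one weak dimension
        # drags the whole score down.
        return self.impact * self.confidence * self.ease

backlog = [
    Hypothesis("Shorter signup form", impact=6, confidence=7, ease=9),
    Hypothesis("Annual plan as default", impact=8, confidence=4, ease=6),
]
backlog.sort(key=lambda h: h.score, reverse=True)  # 378 beats 192: form test first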
THE TIMELINE
Instrumentation audit completed and scored. Metrics hierarchy documented — North Star, driver metrics, and guardrails agreed with leadership sign-off. The pre-work that removes every post-test debate before the first test launches.
First 5 experiment designs fully specced. First test launched and running. Notion OS built and populated with the first experiments, results library, weekly review agenda, and decision record. The institutional home for the program is live and owned by your team.
20+ hypothesis backlog delivered and scored. 6-month experiment calendar built from the backlog. 2-hour team training session completed — walkthrough of the metrics hierarchy, experiment design process, hypothesis scoring, and Notion OS. The team leaves the session able to run the next test independently.
WHO THIS IS FOR
Not sure if your analytics are ready to support a testing program? The instrumentation audit in Week 1 will tell you what gaps need addressing before the first test runs — including whether those gaps are blocking or just advisory.
WHO’S DOING THE WORK
Jake McMahon — ProductQuant
I run this engagement myself. Eight years as a product and growth lead inside B2B SaaS, watching smart teams make the same mistake: good tools, good instincts, no system. Experiment programs that live in one spreadsheet and one person’s head are not programs — they are single points of failure. The Notion OS and the metrics hierarchy are designed specifically to remove that dependency.
The most common place experiment programs break is the metric agreement step. Everyone has a view on the primary metric. Getting leadership sign-off on one number before the test runs — and keeping that number fixed when the results come in — is the constraint the engagement is built around. That problem is pre-hypothesis and pre-infrastructure. Fixing it first is what makes everything downstream work.
Teams Jake has worked with
PRICING
Guarantee: If you don't have a running experiment program with at least one test live at the end of the engagement, you get a full refund. The program exists and is operating — or you don’t pay.
Your team gets the metrics architecture, the infrastructure, and the hypothesis backlog to run experiments systematically — and one test already live to prove it.