EXPERIMENT READINESS AUDIT — $997 · 5 BUSINESS DAYS
You’ve got the analytics tool, the traffic, and the hypothesis. But your tracking events are misfiring, your sample sizes are too small, and your test designs have validity errors you’d never catch from a results dashboard. The Audit tells you exactly what’s broken — and what to fix first.
5 audit areas checked — go/no-go verdict or refund · 5-day delivery
WHAT THE AUDIT COVERS
$997 · fixed price · 5 business days
From read-only access to a clear go/no-go verdict on your experiment setup. No meetings required — async delivery.
Instrumentation, tracking plan, tool config, sample size, and experiment design. The five areas that determine whether your results are valid.
A clear go/no-go on whether your setup can produce valid experiments — or a specific list of what to fix first. Refund if we can’t deliver that.
YOU ALREADY SUSPECT YOUR RESULTS AREN’T TRUSTWORTHY
Ran a 4-week experiment — results flipped when one parameter changed
“We ran an experiment for a full month. The results said the new onboarding was better. We re-ran it with a different traffic split and the results flipped. Now nobody trusts any of our experiment results.”
VP Product — B2B SaaS, $8M ARR
Tool says ‘significant’ — but the team isn’t sure it’s real
“Our experimentation tool keeps telling us results are statistically significant. But when we look at the numbers, something feels off. The effect sizes don’t match what we see in our analytics. We’re making decisions on results we don’t fully trust.”
Head of Growth — Series B
Tracking events don’t match what users actually do
“We discovered our ‘signup completed’ event was firing twice for some users and not at all for others. We’d been running experiments on that event for six months. Every result from those tests is unreliable now.”
Product Manager — B2B SaaS
No idea if you have enough traffic to trust your results
“We ran a test for three weeks and got a winner. Then someone asked if we’d had enough traffic for the result to be valid. Nobody could answer that. We shipped it anyway. Three months later we’re not sure it actually helped.”
CEO — Seed stage
WHAT THIS TYPICALLY UNCOVERS
Most experiment setups have at least one validity threat that makes results unreliable.
In our experience, the majority of setups have at least one configuration issue — misfiring events, incorrect randomization, or sample size shortfalls — that silently invalidates results. The dashboards don’t flag these. They only show you the numbers.
Tracking plans rarely map to the metrics teams actually test.
Teams build tracking plans for product analytics, not for experiments. The events you track for engagement dashboards aren’t always the events you need to measure experiment outcomes. The gap between what you track and what you test is where invalid results come from.
Sample size is the most common blind spot in experiment programs.
Teams often calculate sample size once, at the start of a program, using estimated baseline rates. When actual rates differ — and they usually do — the calculated sample size is wrong. Experiments end too early or run too long, and neither produces valid results.
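The sensitivity to baseline rates is easy to see in a standard two-proportion power calculation. A minimal sketch (normal-approximation formula, alpha = 0.05 two-sided, 80% power; the baseline and lift figures are hypothetical, not from any client):

```python
import math

def sample_size_per_arm(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Users needed per arm to detect a shift from rate p1 to rate p2.
    Defaults correspond to alpha = 0.05 (two-sided) and 80% power."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# The plan assumed a 5% baseline conversion rate lifted to 6%...
planned = sample_size_per_arm(0.05, 0.06)
# ...but the real baseline turns out to be 3% (same 20% relative lift).
actual = sample_size_per_arm(0.03, 0.036)
print(planned, actual)  # required sample per arm grows by roughly 70%
```

Same relative lift, lower real baseline: the experiment now needs far more traffic than the original calculation promised, which is exactly how tests end up stopped too early.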
Tool misconfigurations don’t produce visible errors — they produce wrong results.
An experimentation tool that’s misconfigured doesn’t show a red warning. It shows a result that looks correct but isn’t. The most common issues — duplicate event firing, incorrect assignment units, and filtering that excludes the wrong users — are invisible until someone audits the setup.
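As an illustration of the duplicate-firing check, here is a minimal sketch over a hypothetical event log of (user_id, event_name, timestamp) rows; the 2-second window is an assumed threshold, not a standard:

```python
def find_duplicate_fires(events, window_s=2.0):
    """Flag repeats of the same event by the same user within window_s seconds."""
    last_seen = {}   # (user_id, event_name) -> timestamp of previous fire
    duplicates = []
    for user_id, name, ts in sorted(events, key=lambda e: e[2]):
        key = (user_id, name)
        if key in last_seen and ts - last_seen[key] <= window_s:
            duplicates.append((user_id, name, ts))
        last_seen[key] = ts
    return duplicates

log = [
    ("u1", "signup_completed", 100.0),
    ("u1", "signup_completed", 100.4),  # fired twice within 0.4s
    ("u2", "signup_completed", 130.0),
]
print(find_duplicate_fires(log))  # [('u1', 'signup_completed', 100.4)]
```

A metric built on the raw log counts u1 twice; nothing in a results dashboard would reveal it.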
WHY THIS IS DIFFERENT
Most teams find out their setup is broken after they’ve already shipped a losing variant. We check before you run your next test.
“Run the experiment and see what happens” assumes your setup produces valid results. It might not. This audit checks whether your tracking events fire correctly, whether your tool is configured for valid experiments, and whether you have enough traffic to reach statistical significance — before you spend weeks running a test that produces unreliable data.
You get a clear go/no-go: your setup can support valid experiments, or here’s exactly what to fix first. No interpretation required — each finding is tied to a specific configuration change your team can make.
TIMELINE
Read-only access to your analytics and experimentation tools. Your tracking plan and experiment history reviewed. Current setup mapped against what valid experiments require.
Event instrumentation checked against actual user behaviour. Tool configuration reviewed for validity threats. Sample size calculated from your real traffic data. Existing experiment designs reviewed for common errors.
A clear go/no-go on whether your setup supports valid experiments — or a specific list of what to fix first, ranked by impact. Delivered async. No meeting required.
Day 6: you know your next experiment will produce results you can trust.
WHAT YOU GET
Your events reviewed against actual user behaviour. Misfiring events, duplicate fires, and missing events identified. You learn which events you can trust for experiment decisions — and which ones are producing unreliable data.
Your current tracking plan evaluated against the metrics your experiments need to measure. Gaps between what you track and what you test identified. You get a mapping that shows which events support valid experiments and which ones need updating.
Your tool’s randomization, assignment, and filtering settings reviewed. Common configuration errors that silently invalidate results — incorrect assignment units, traffic splits that break statistical assumptions, and filters that exclude the wrong users — identified and documented.
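One of these checks, whether the traffic split users actually landed in matches the configured split (a sample-ratio mismatch), can be sketched as a chi-square goodness-of-fit test on assignment counts; the counts below are hypothetical:

```python
def srm_check(n_control, n_treatment, expected_split=0.5, crit=3.841):
    """Chi-square goodness-of-fit on assignment counts (1 degree of freedom).
    crit = 3.841 is the 95th-percentile critical value; exceeding it suggests
    the realized split differs from the configured one."""
    total = n_control + n_treatment
    exp_c = total * expected_split
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    return chi2 > crit

# Configured 50/50 split, but one arm is consistently heavier:
print(srm_check(10_000, 10_400))  # True  -> assignment is likely broken
print(srm_check(10_000, 10_050))  # False -> imbalance within normal noise
```

A failed check here means the randomization itself is suspect, so any result from that test is suspect too, regardless of what the significance readout says.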
Sample size calculated from your actual traffic and conversion data — not estimated baseline rates. You learn which experiments your current traffic supports, how long each test needs to run, and which experiments you shouldn’t run yet because you don’t have enough data to reach significance.
Any existing or planned experiment designs reviewed for validity threats — multiple comparisons without correction, biased assignment, metric definitions that don’t measure what you think they measure. You get specific feedback on what to change before you run the test.
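For the first of those threats, the simplest fix is a Bonferroni correction: divide the significance threshold by the number of metrics tested at once. A minimal sketch with made-up p-values:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Test each p-value against alpha / m so the family-wise
    false-positive rate across all m metrics stays at alpha."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Three metrics checked in one test: only p-values below 0.05 / 3 count.
print(bonferroni_significant([0.04, 0.012, 0.30]))  # [False, True, False]
```

The 0.04 result that looked "significant" under a naive 0.05 cutoff no longer qualifies once the comparison count is accounted for.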
On the cost of running invalid experiments: every week your team spends running an experiment with a broken setup is a week of engineering capacity wasted on results you can’t trust. If you run 4 experiments per quarter and even one produces unreliable results, that’s 25% of your experiment capacity lost — plus the cost of shipping a change based on bad data. The audit costs less than one invalid experiment.
FIT CHECK
The situation
You’re running A/B tests — or about to — but nobody has confirmed that your tracking events fire correctly, your tool configuration supports valid randomization, or your traffic can reach statistical significance in a reasonable window. You have an analytics tool and a testing tool. What you don’t have is confidence that the results they produce are reliable enough to act on.
What you leave with
Your next experiment produces results you can trust — because the setup was verified before you ran it.
When this audit doesn’t apply
If you don’t have an analytics tool collecting event data, there’s nothing to audit — the instrumentation doesn’t exist yet. If you need someone to build your experiment program from scratch, this audit tells you what’s missing but doesn’t build it. And if your analytics implementation itself is broken, you need that fixed before checking whether experiments will run correctly on top of it.
Better starting points
The Experiment Readiness Audit checks whether your setup can produce valid experiment results. Your team runs the experiments and builds the program. If you need implementation or ongoing support, that’s a different engagement.
Jake McMahon — ProductQuant
I run this audit myself. The event instrumentation check, the tracking plan review, the tool configuration audit, the sample size calculations, the experiment design critique — all of it. Your experiment setup is not generic. It’s specific to your product, your tracking architecture, and the gap between what your analytics tool reports and what actually happened. Generic checklists tell you to “verify your events” without telling you which events matter for experiment validity.
The audit produces a verdict your team can act on. The instrumentation gaps tell your engineer what to fix. The sample size calculations tell your PM which experiments to run — and which not to. The experiment design critique tells your growth lead what to change before burning a quarter of traffic on an invalid test. No translation required — every finding is formatted for the person who needs to act on it.
Teams Jake has worked with
PRICING
Clear go/no-go verdict — or specific list of what to fix, refund if we can’t deliver.
Book the Audit →
Clear go/no-go verdict on whether your setup supports valid experiments — or a specific list of what to fix. If the data can’t support a verdict, we tell you within 48 hours and refund the audit. You either get a definitive answer or you don’t pay.
Your tracking verified. Your tool configuration checked. Your sample sizes calculated from real data. A go/no-go verdict you can trust — or a specific list of what to fix, delivered in 5 days.