EXPERIMENT READINESS AUDIT — $997 · 5 BUSINESS DAYS
You’ve got the analytics tool, the traffic, and the hypothesis. But your tracking events are misfiring, your sample sizes are too small, and your test designs have validity errors you’d never catch from a results dashboard. The Audit tells you exactly what’s broken — and what to fix first.
5 audit areas checked — go/no-go verdict or refund · 5-day delivery
WHAT THE AUDIT COVERS
$997 · fixed price · 5 business days
From read-only access to a clear go/no-go verdict on your experiment setup. No meetings required — async delivery.
Instrumentation, tracking plan, tool config, sample size, and experiment design. The five areas that determine whether your results are valid.
A clear go/no-go on whether your setup can produce valid experiments — or a specific list of what to fix first. Refund if we can’t deliver that.
YOU ALREADY SUSPECT YOUR RESULTS AREN’T TRUSTWORTHY
Ran a 4-week experiment — results flipped when one parameter changed
“We ran an experiment for a full month. The results said the new onboarding was better. We re-ran it with a different traffic split and the results flipped. Now nobody trusts any of our experiment results.”
VP Product — B2B SaaS, $8M ARR
Tool says ‘significant’ — but the team isn’t sure it’s real
“Our experimentation tool keeps telling us results are statistically significant. But when we look at the numbers, something feels off. The effect sizes don’t match what we see in our analytics. We’re making decisions on results we don’t fully trust.”
Head of Growth — Series B
Tracking events don’t match what users actually do
“We discovered our ‘signup completed’ event was firing twice for some users and not at all for others. We’d been running experiments on that event for six months. Every result from those tests is unreliable now.”
Product Manager — B2B SaaS
No idea if you have enough traffic to trust your results
“We ran a test for three weeks and got a winner. Then someone asked if we’d had enough traffic for the result to be valid. Nobody could answer that. We shipped it anyway. Three months later we’re not sure it actually helped.”
CEO — Seed stage
WHAT THIS TYPICALLY UNCOVERS
Most experiment setups have at least one validity threat that makes results unreliable.
In our experience, the majority of setups have at least one configuration issue — misfiring events, incorrect randomization, or sample size shortfalls — that silently invalidates results. The dashboards don’t flag these. They only show you the numbers.
Tracking plans rarely map to the metrics teams actually test.
Teams build tracking plans for product analytics, not for experiments. The events you track for engagement dashboards aren’t always the events you need to measure experiment outcomes. The gap between what you track and what you test is where invalid results come from.
Sample size is the most common blind spot in experiment programs.
Teams often calculate sample size once, at the start of a program, using estimated baseline rates. When actual rates differ — and they usually do — the calculated sample size is wrong. Experiments end too early or run too long, and neither produces valid results.
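The sensitivity to baseline rates is easy to see in a standard two-proportion power calculation. A minimal sketch (normal-approximation formula, alpha = 0.05 two-sided, 80% power; the baseline and lift figures are hypothetical, not from any client):

```python
import math

def sample_size_per_arm(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Users needed per arm to detect a shift from rate p1 to rate p2.
    Defaults correspond to alpha = 0.05 (two-sided) and 80% power."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# The plan assumed a 5% baseline conversion rate lifted to 6%...
planned = sample_size_per_arm(0.05, 0.06)
# ...but the real baseline turns out to be 3% (same 20% relative lift).
actual = sample_size_per_arm(0.03, 0.036)
print(planned, actual)  # required sample per arm grows by roughly 70%
```

Same relative lift, lower real baseline: the experiment now needs far more traffic than the original calculation promised, which is exactly how tests end up stopped too early.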
Tool misconfigurations don’t produce visible errors — they produce wrong results.
An experimentation tool that’s misconfigured doesn’t show a red warning. It shows a result that looks correct but isn’t. The most common issues — duplicate event firing, incorrect assignment units, and filtering that excludes the wrong users — are invisible until someone audits the setup.
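As an illustration of the duplicate-firing check, here is a minimal sketch over a hypothetical event log of (user_id, event_name, timestamp) rows; the 2-second window is an assumed threshold, not a standard:

```python
def find_duplicate_fires(events, window_s=2.0):
    """Flag repeats of the same event by the same user within window_s seconds."""
    last_seen = {}   # (user_id, event_name) -> timestamp of previous fire
    duplicates = []
    for user_id, name, ts in sorted(events, key=lambda e: e[2]):
        key = (user_id, name)
        if key in last_seen and ts - last_seen[key] <= window_s:
            duplicates.append((user_id, name, ts))
        last_seen[key] = ts
    return duplicates

log = [
    ("u1", "signup_completed", 100.0),
    ("u1", "signup_completed", 100.4),  # fired twice within 0.4s
    ("u2", "signup_completed", 130.0),
]
print(find_duplicate_fires(log))  # [('u1', 'signup_completed', 100.4)]
```

A metric built on the raw log counts u1 twice; nothing in a results dashboard would reveal it.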
WHY THIS IS DIFFERENT
Most teams find out their setup is broken after they’ve already shipped a losing variant. We check before you run your next test.
“Run the experiment and see what happens” assumes your setup produces valid results. It might not. This audit checks whether your tracking events fire correctly, whether your tool is configured for valid experiments, and whether you have enough traffic to reach statistical significance — before you spend weeks running a test that produces unreliable data.
You get a clear go/no-go: your setup can support valid experiments, or here’s exactly what to fix first. No interpretation required — each finding is tied to a specific configuration change your team can make.
TIMELINE
Read-only access to your analytics and experimentation tools. Your tracking plan and experiment history reviewed. Current setup mapped against what valid experiments require.
Event instrumentation checked against actual user behaviour. Tool configuration reviewed for validity threats. Sample size calculated from your real traffic data. Existing experiment designs reviewed for common errors.
A clear go/no-go on whether your setup supports valid experiments — or a specific list of what to fix first, ranked by impact. Delivered async. No meeting required.
Day 6: you know your next experiment will produce results you can trust.
WHAT YOU GET
Your events reviewed against actual user behaviour. Misfiring events, duplicate fires, and missing events identified. You learn which events you can trust for experiment decisions — and which ones are producing unreliable data.
Your current tracking plan evaluated against the metrics your experiments need to measure. Gaps between what you track and what you test identified. You get a mapping that shows which events support valid experiments and which ones need updating.
Your tool’s randomization, assignment, and filtering settings reviewed. Common configuration errors that silently invalidate results — incorrect assignment units, traffic splits that break statistical assumptions, and filters that exclude the wrong users — identified and documented.
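One of these checks, whether the traffic split users actually landed in matches the configured split (a sample-ratio mismatch), can be sketched as a chi-square goodness-of-fit test on assignment counts; the counts below are hypothetical:

```python
def srm_check(n_control, n_treatment, expected_split=0.5, crit=3.841):
    """Chi-square goodness-of-fit on assignment counts (1 degree of freedom).
    crit = 3.841 is the 95th-percentile critical value; exceeding it suggests
    the realized split differs from the configured one."""
    total = n_control + n_treatment
    exp_c = total * expected_split
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    return chi2 > crit

# Configured 50/50 split, but one arm is consistently heavier:
print(srm_check(10_000, 10_400))  # True  -> assignment is likely broken
print(srm_check(10_000, 10_050))  # False -> imbalance within normal noise
```

A failed check here means the randomization itself is suspect, so any result from that test is suspect too, regardless of what the significance readout says.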
Sample size calculated from your actual traffic and conversion data — not estimated baseline rates. You learn which experiments your current traffic supports, how long each test needs to run, and which experiments you shouldn’t run yet because you don’t have enough data to reach significance.
Any existing or planned experiment designs reviewed for validity threats — multiple comparisons without correction, biased assignment, metric definitions that don’t measure what you think they measure. You get specific feedback on what to change before you run the test.
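For the first of those threats, the simplest fix is a Bonferroni correction: divide the significance threshold by the number of metrics tested at once. A minimal sketch with made-up p-values:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Test each p-value against alpha / m so the family-wise
    false-positive rate across all m metrics stays at alpha."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Three metrics checked in one test: only p-values below 0.05 / 3 count.
print(bonferroni_significant([0.04, 0.012, 0.30]))  # [False, True, False]
```

The 0.04 result that looked "significant" under a naive 0.05 cutoff no longer qualifies once the comparison count is accounted for.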
On the cost of running invalid experiments: every week your team spends running an experiment with a broken setup is a week of engineering capacity wasted on results you can’t trust. If you run 4 experiments per quarter and even one produces unreliable results, that’s 25% of your experiment capacity lost — plus the cost of shipping a change based on bad data. The audit costs less than one invalid experiment.
FIT CHECK
The situation
You’re running A/B tests — or about to — but nobody has confirmed that your tracking events fire correctly, your tool configuration supports valid randomization, or your traffic can reach statistical significance in a reasonable window. You have an analytics tool and a testing tool. What you don’t have is confidence that the results they produce are reliable enough to act on.
What you leave with
Your next experiment produces results you can trust — because the setup was verified before you ran it.
When this audit doesn’t apply
If you don’t have an analytics tool collecting event data, there’s nothing to audit — the instrumentation doesn’t exist yet. If you need someone to build your experiment program from scratch, this audit tells you what’s missing but doesn’t build it. And if your analytics implementation itself is broken, you need that fixed before checking whether experiments will run correctly on top of it.
Better starting points
The Experiment Readiness Audit checks whether your setup can produce valid experiment results. Your team runs the experiments and builds the program. If you need implementation or ongoing support, that’s a different engagement.
Jake McMahon — ProductQuant
I run this audit myself. The event instrumentation check, the tracking plan review, the tool configuration audit, the sample size calculations, the experiment design critique — all of it. Your experiment setup is not generic. It’s specific to your product, your tracking architecture, and the gap between what your analytics tool reports and what actually happened. Generic checklists tell you to “verify your events” without telling you which events matter for experiment validity.
The audit produces a verdict your team can act on. The instrumentation gaps tell your engineer what to fix. The sample size calculations tell your PM which experiments to run — and which not to. The experiment design critique tells your growth lead what to change before burning a quarter of traffic on an invalid test. No translation required — every finding is formatted for the person who needs to act on it.
Teams Jake has worked with
PRICING
Clear go/no-go verdict — or specific list of what to fix, refund if we can’t deliver.
Book the Audit →
Clear go/no-go verdict on whether your setup supports valid experiments — or a specific list of what to fix. If the data can’t support a verdict, we tell you within 48 hours and refund the audit. You either get a definitive answer or you don’t pay.
Your tracking verified. Your tool configuration checked. Your sample sizes calculated from real data. A go/no-go verdict you can trust — or a specific list of what to fix, delivered in 5 days.