TL;DR
- PMF validation built from team recall is systematically biased. Memory favours memorable conversations, confident voices, and the narratives the team already believes.
- Systematic analysis of sales call data consistently surfaces more jobs, different frequencies, and a different priority ranking than the team recalls.
- High NPS from early adopters, strong power user engagement, and positive customer interviews are not PMF signals. Each is a self-selection artefact.
- Real PMF evidence is cohort-based: retention curves, expansion rates, feature adoption by segment, and job frequency from systematically coded research.
What team recall actually produces
Every early-stage SaaS team has a working theory of their market. They know who their best customer is. They know what job the product does. They know which features drive retention and which ones are just nice-to-haves. They have heard it from customers directly — in demos, in support tickets, in customer success calls.
The problem is that "heard it from customers" and "measured it systematically" are two different things, and most teams treat them as equivalent.
Memory is not a neutral recording of customer conversations. It is a selective reconstruction that favours the dramatic over the mundane, the recent over the historical, and the voices that were most confident or most frequently encountered. When a founding team sits down to build a JTBD framework or define an ICP, they are drawing on this reconstructed memory — which means they are drawing on a biased, compressed, and partially invented account of what customers actually said.
This is not a critique of any particular team. It is a structural property of how human memory works, and it has predictable consequences for the research outputs that teams build from it.
Built from team recall
- Workshop with the founding team
- Top-of-mind customer quotes
- The loudest or most recent customer voice
- The narrative the team already believes
- Framework ready in a day
Built from systematic evidence
- Coded analysis of recorded sales calls
- Frequency counts for jobs and pains
- Retention curves segmented by behaviour
- Feature adoption mapped to declared ICP
- Framework takes weeks, survives scrutiny
How large is the gap?
A concrete illustration: a product team built a 27-job JTBD framework through a standard recall-based workshop process — collective team memory, customer quote harvesting, and synthesis into a prioritised list. The team was confident in the output. The framework passed internal review.
When the same product's 60 recorded sales calls were systematically coded against the same JTBD categories, the analysis found more than 85 jobs — more than three times the team's recall-based estimate. The recall process had not missed fringe edge cases. It had missed entire categories of jobs that came up repeatedly in the calls but had not registered as "important" in the team's memory.
The priority ranking gap was larger. The team's second-highest-priority feature — the one they most frequently cited as a retention driver — appeared in 43% of calls when systematically counted. The team's recalled estimate was 88%. That is not a rounding error. That is a feature prioritisation decision built on a frequency belief that was twice the actual rate.
[Chart: number of jobs found in systematic sales call analysis versus the recall-based JTBD workshop for the same product.]
A second common pattern: teams often have a confident belief about who their primary user is. When this is validated against actual login and usage data, the segment the team identifies as primary frequently accounts for a much smaller share of active users than believed. In one analysis, a user persona that was described as accounting for 38–52% of users in the team's ICP definition accounted for 17% of actual product logins. The segment the team thought was secondary was primary in the data.
These are not unusual findings. They are the expected output of comparing recall-based research against evidence-based research. The gap varies by team and product, but it is consistently present and consistently in the same direction: the team overestimates the frequency of jobs and features they believe are most important, and underestimates everything that did not make it into the team's dominant narrative.
PMF signals that are not PMF signals
Beyond recall-based research, there is a second category of PMF evidence problem: signals that feel like PMF confirmation but are not measuring what teams think they are measuring.
High NPS from early adopters
Early adopter NPS is a self-selecting signal. The people who respond to NPS surveys in a product's first year are disproportionately the people who already have a strong positive relationship with the product. They found it, adopted it through friction, and stayed — which means they are the least representative sample of the broader ICP you need to capture to scale. A high early-adopter NPS tells you the product works for the most motivated subset of users. It does not tell you whether it works for the median customer you will need to acquire in year two.
Strong engagement from power users
Power user engagement metrics — session length, feature breadth, login frequency for the top decile — do not predict cohort-level retention. The behaviour of the users who engage most intensely with a product is not a reliable guide to what happens when you expand into a broader ICP segment. Power users tolerate more friction, discover more features independently, and self-solve more problems than the median user. Building for power user engagement often means building against median user retention.
Positive customer interviews
Customers in research interviews are polite. They tell you the product is useful, the team is responsive, and they are happy with the value. Then they churn. The interview context creates social pressure toward positive framing that does not reflect the internal calculus the customer is making about whether to renew. Positive qualitative feedback is weakly correlated with retention at the cohort level — and essentially uncorrelated with expansion.
What real PMF evidence looks like
PMF evidence is cohort-based, not aggregate. It connects a specific behaviour at a specific point in the user journey to a measurable outcome downstream. The four most reliable forms:
Cohort retention curves with a behavioural split
A retention curve shows what percentage of users return at 30, 60, and 90 days. A retention curve split by activation behaviour answers a different question: do users who complete activation event X retain at a materially higher rate than those who do not? If yes, you have identified a behaviour that is worth building around. If the retention curves are flat regardless of activation, the activation event you defined is not predicting the outcome you care about.
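A minimal sketch of that split in pandas, assuming a product analytics export with hypothetical `user_id`, `event_name`, and `timestamp` columns plus a signup table; the activation event name is a placeholder, not a real product's event:

```python
import pandas as pd

# Assumed inputs (hypothetical schema):
#   events:  one row per product event -> user_id, event_name, timestamp (datetime)
#   signups: one row per user          -> user_id, signup_date (datetime)
ACTIVATION_EVENT = "created_first_report"  # placeholder for "activation event X"

def retention_split(events: pd.DataFrame, signups: pd.DataFrame,
                    windows=(30, 60, 90)) -> pd.DataFrame:
    """Retention at each window, split by whether the user completed the activation event."""
    signups = signups.copy()
    merged = events.merge(signups, on="user_id")
    merged["days_since_signup"] = (merged["timestamp"] - merged["signup_date"]).dt.days

    # Which users ever completed the activation event?
    activated_ids = merged.loc[merged["event_name"] == ACTIVATION_EVENT, "user_id"].unique()
    signups["activated"] = signups["user_id"].isin(activated_ids)

    columns = []
    for day in windows:
        # A user counts as retained at day N if they logged any event on or after day N.
        retained_ids = merged.loc[merged["days_since_signup"] >= day, "user_id"].unique()
        retained = signups["user_id"].isin(retained_ids)
        columns.append(retained.groupby(signups["activated"]).mean().rename(f"day_{day}"))

    # One row per group (activated True/False), one retention rate per window.
    return pd.concat(columns, axis=1)
```

The read is then a single comparison: the activated row against the non-activated row at day 90. If the two rows are close, the chosen activation event is not predicting the outcome you care about.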
Job frequency from systematically coded sales calls
Recording sales calls and transcribing them is not enough. Systematic coding means defining a set of job and pain categories in advance, then counting how often each category appears unprompted across a call set large enough to produce stable frequency estimates — typically 40 to 60 calls minimum. The output is a frequency ranking that can be compared directly against the team's recalled priorities. The discrepancies are the research finding.
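To make the counting step concrete, here is a minimal sketch that assumes the coding has already been done and each call is represented as the set of categories that appeared in it unprompted; the category names and calls are invented for illustration:

```python
from collections import Counter

# Hypothetical coded call data: one set per call, containing the job/pain
# categories that came up unprompted in that call (counted once per call).
coded_calls = [
    {"consolidate_reporting", "reduce_manual_data_entry"},
    {"reduce_manual_data_entry", "audit_trail_for_compliance"},
    {"consolidate_reporting"},
    # ... 40-60 coded calls in total for stable frequency estimates
]

def job_frequencies(calls):
    """Share of calls in which each coded category appears at least once."""
    counts = Counter(category for call in calls for category in call)
    total = len(calls)
    return sorted(((category, n / total) for category, n in counts.items()),
                  key=lambda pair: pair[1], reverse=True)

for category, share in job_frequencies(coded_calls):
    print(f"{category}: {share:.0%} of calls")
```

The output is the frequency ranking described above; setting it next to the team's recalled priorities is where the discrepancies show up.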
Feature adoption by declared ICP segment
If the team has defined an ICP — a specific segment they believe is the best-fit customer — the question is whether that segment actually uses the features built for it at the expected rate. If the ICP segment's feature adoption is lower than that of non-ICP users, the segment definition and the product build are misaligned. This is a PMF problem, not an onboarding problem.
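A sketch of that comparison, assuming product analytics and CRM data have already been joined into one row per account; the column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical joined dataset: one row per account, with the CRM segment label
# and a flag for whether the account adopted the feature built for the ICP.
accounts = pd.DataFrame({
    "account_id": [1, 2, 3, 4, 5, 6],
    "segment": ["ICP", "ICP", "ICP", "non-ICP", "non-ICP", "non-ICP"],
    "adopted_icp_feature": [True, False, False, True, True, False],
})

# Adoption rate of the ICP-targeted feature, per segment.
adoption_by_segment = accounts.groupby("segment")["adopted_icp_feature"].mean()
print(adoption_by_segment)
# If the ICP rate sits below the non-ICP rate, the segment definition and the
# product build are misaligned, as described above.
```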
Revenue expansion signals
Retained customers who naturally expand — who upgrade, add seats, or increase usage without a sales intervention — are the clearest PMF signal available. Expansion means the product is creating enough value that the customer spontaneously invests more. Flat retained customers have found an equilibrium where the product is worth keeping but not worth growing. Declining retained customers are churning slowly. The distribution across these three groups is a direct read on PMF health at the cohort level.
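A sketch of that three-way read, assuming a billing extract with MRR per retained account at two points in the cohort's life; the column names, threshold, and figures are illustrative only:

```python
import pandas as pd

# Hypothetical billing extract for one signup cohort: MRR at month 3 and month 12.
# Churned accounts are excluded; this is a read on retained customers only.
billing = pd.DataFrame({
    "account_id":   [1, 2, 3, 4, 5],
    "mrr_month_3":  [100, 250, 80, 400, 120],
    "mrr_month_12": [180, 250, 60, 520, 120],
})

def expansion_bucket(row, tolerance=0.05):
    """Classify a retained account as expanding, flat, or declining."""
    change = (row["mrr_month_12"] - row["mrr_month_3"]) / row["mrr_month_3"]
    if change > tolerance:
        return "expanding"
    if change < -tolerance:
        return "declining"
    return "flat"

billing["bucket"] = billing.apply(expansion_bucket, axis=1)
# The distribution across the three buckets is the cohort-level PMF health read.
print(billing["bucket"].value_counts(normalize=True))
```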
| Evidence type | Data source | What it confirms |
|---|---|---|
| Cohort retention curve (behavioural split) | Product analytics event stream | Whether a specific activation behaviour predicts retention |
| Sales call job frequency | Recorded and coded sales calls (40+ calls) | Which jobs and pains actually drive purchase decisions, unprompted |
| Feature adoption by ICP segment | Product analytics + CRM segment data | Whether the segment you built for actually uses what you built |
| Revenue expansion rate | Billing data, cohorted by signup month | Whether retained customers spontaneously grow their investment |
Why this matters more at certain stages
The gap between recalled and evidenced PMF research is a persistent problem, but its consequences change with product stage. At pre-seed, building on recall is reasonable — you do not have enough data to do otherwise, and speed matters more than precision. The JTBD framework and ICP definition from a founding team workshop are good enough to guide the first 12 months of product decisions.
The problem appears at Series A and beyond, when the team is making resource allocation decisions that assume the recall-based research is still accurate. By the time a product has 60+ sales calls recorded and 12 months of retention data, the team has the raw material for evidence-based research. But most teams continue to use the original recall-based framework because updating it is uncomfortable — it means finding out that the priorities the team has been executing against are based on inflated frequency estimates.
The investment case for PMF validation is not that the original research was wrong from the start. It is that the original research was a best-available approximation, and the data now available is substantially better. The teams that act on that data make different roadmap decisions, different hiring decisions, and different messaging decisions — and those decisions compound.
Data-Driven PMF Validation
In the Data-Driven PMF Validation cohort, you validate your hypotheses against your own data — sales calls, retention curves, and feature adoption by segment. You leave with a revised JTBD framework, a corrected ICP definition, and the evidence to back both.
Frequently asked questions
What is the difference between PMF recall and PMF evidence?
PMF recall is the founding team's collective memory of what customers have said — in meetings, demos, and calls. PMF evidence is what the data actually shows: how often a specific job or pain came up in recorded sales calls, what features retained users actually adopted, which segments expand vs. churn. Recall is fast and always available. Evidence requires systematic analysis. The gap between the two is almost always larger than the team expects.
Why is high NPS not a signal of product-market fit?
High NPS from early adopters is a self-selecting signal — the people most likely to respond to an NPS survey are those who already have a strong opinion of the product, positive or negative. Early adopters are also not representative of the broader ICP. They tolerate more friction, find more value in early features, and behave differently from the customers who will make up the majority of revenue at scale. NPS is a useful engagement signal, not a PMF signal.
What does real PMF evidence look like in product data?
Real PMF evidence is cohort-based, not aggregate. It looks like: retained users at 90 days sharing a specific activation behaviour at a statistically meaningful rate; a specific user segment showing natural revenue expansion without sales pressure; sales call analysis showing a specific job or pain coming up unprompted at a high frequency; feature adoption by the ICP segment matching the features built for that segment. Aggregate metrics like total NPS or average session length are not PMF evidence.
How many sales calls do you need to validate a JTBD framework?
There is no universal threshold, but structural patterns in JTBD analysis typically stabilise after 40 to 60 calls if the calls are systematically coded. Below that number, frequency estimates for individual jobs are unreliable. Above 60 calls, new jobs emerge at a lower rate. The more important factor is systematic analysis — coding calls against a consistent framework — rather than raw call volume. A team with 200 unanalysed call recordings has less evidence than a team with 60 systematically coded ones.
Your JTBD framework and ICP definition are only as good as the evidence behind them.
The Data-Driven PMF Validation cohort validates your hypotheses against the data you already have — and surfaces the discrepancies that should change what you build next.