TL;DR

  • Most product teams run one research method at a time. Running five simultaneously — and cross-validating the results — surfaces corrections that no single method would catch alone.
  • In a review of 60 sales calls for a HIPAA-compliant healthcare SaaS, a feature the team ranked as its second-highest priority turned out to apply to only 43% of prospects. The prior research had pegged it at 88%.
  • A persona segment estimated at 38–52% of pipeline was actually 17% — meaning 23 percentage points of GTM spend was pointed at the wrong buyers.
  • Market sizing using the CMS NPPES national database (9.4M provider records) independently confirmed the same high-priority niche that internal product usage data had flagged — a double signal with no common source of error.
  • The research did not end with persona corrections. It cascaded into pricing tier architecture, product roadmap sequencing, and event taxonomy design.

Your product roadmap has assumptions in it. That is not a criticism — it is the nature of building with incomplete information. The problem is that most research methods are designed to confirm those assumptions.

JTBD interviews surface what motivated buyers articulate in structured conversations. Sales calls reveal what actually comes up when money is on the table. Kano surveys separate what users think they need from what they actually pay for. Market databases show who is out there — not just who responds to outreach.

None of these methods alone gives you the full picture.

When you run all five simultaneously and cross-validate the results, you surface the corrections no single method would have caught:

  • The feature your team ranked second turns out to apply to 43% of your market — not 88%.
  • The persona absorbing 20% of your GTM budget represents 17% of pipeline — not 38%.
  • The job-to-be-done appearing in 57% of your sales conversations was not in your original framework at all.

These are not edge cases. They are what happens when research methods run in sequence, each inheriting the prior one's assumptions.

"The goal of running multiple research methods isn't to generate more data. It's to find the places where the methods disagree — because that's where the assumptions are hiding."

"The finding that changes everything is rarely the one you were looking for. It's the one the cross-validation surfaces that your team was confident about — and confident in the wrong direction."

— Jake McMahon, ProductQuant

Why Most Product Research Produces the Wrong Answers

The standard approach to product research is sequential: run interviews, build a JTBD framework, validate it with surveys or sales calls, then act on the output.

The flaw is structural. Each step builds on the prior one:

  1. Initial interviews over-represent the most vocal, tech-forward early adopters. The JTBD framework inherits that sampling bias.
  2. The validation survey is designed around the framework's existing categories — so it confirms them rather than testing them.
  3. You end up with research that is internally consistent and directionally wrong.

The deeper problem is a coding error that runs through almost all qualitative research: teams treat "mentioned" as equivalent to "important," and "mentioned by some" as equivalent to "applicable to all."

A feature that appeared in 20 out of 24 early interviews gets coded as a near-universal priority. When you later analyse 60 sales calls with a statistical lens, you discover that those 20 participants were drawn predominantly from one segment — and that the feature's frequency across the actual market is 43%, not 88%.

That gap is not a data quality problem. It is a method problem. The interview sample was appropriate for generating hypotheses. It was not appropriate for frequency estimation. Cross-validation exists precisely to catch that distinction.
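
To make the distinction concrete, here is a minimal sketch of the reweighting step in Python. The segment names, sample counts, mention rates, and market shares are all hypothetical, chosen only to mirror the shape of the example above; none of them are the client's raw data.

```python
# Minimal sketch: why a raw interview mention rate is not a market
# frequency estimate. All numbers below are hypothetical.

# Interview sample: 24 participants, skewed toward one segment.
sample = {
    # segment: (participants interviewed, participants who mentioned the feature)
    "multi_location":  (20, 19),
    "single_location": (4, 1),
}

# Naive coding: pool everyone and report one frequency.
total_n = sum(n for n, _ in sample.values())
total_mentions = sum(m for _, m in sample.values())
naive = total_mentions / total_n                      # 20/24 ~= 83%

# Corrected coding: estimate per-segment rates, then reweight by the
# segments' actual market shares (from the registry, not the sample).
market_share = {"multi_location": 0.25, "single_location": 0.75}
corrected = sum(
    (mentions / n) * market_share[seg]
    for seg, (n, mentions) in sample.items()
)                                                     # ~= 42%, nowhere near 83%

print(f"naive: {naive:.0%}, segment-weighted: {corrected:.0%}")
```

The naive figure answers "how often did interviewees mention this?"; the reweighted figure answers "how often would the market mention this?" — and only the second one should drive roadmap priority.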

What Is the Compound Research Stack?

The compound research stack is five methods run in parallel, with outputs compared and corrected against each other. Each method surfaces what the others miss.

Method 1: JTBD Interviews

JTBD interviews surface the outcomes buyers are hiring the product to accomplish — the functional, emotional, and social jobs behind the purchase decision. Structured correctly, a JTBD interview does not ask "what do you want this product to do?" It asks what the buyer was trying to accomplish before they found this product, what they tried before, and what would make them consider the job done.

The limitation is selection bias. JTBD interviews are excellent at generating a hypothesis set, but the sample is always skewed toward articulate, motivated participants who agreed to be interviewed. Running synthetic JTBD analysis — applying the framework to a broader dataset of recorded interactions, not just structured interviews — extends the sample without that selection effect.

In practice, this means treating your sales call transcripts, support tickets, and cancellation surveys as JTBD data, not just operational records. The participants did not volunteer for a research conversation. They showed up because they had a real problem and were trying to solve it. That is a different and often more honest signal.

Method 2: Sales Call Analysis

Sales calls are under-utilised as a research source. They are not interviews — buyers in a sales conversation are managing an impression, under time pressure, and filtering what they reveal. What surfaces is different from what surfaces in a structured interview.

Sales calls reveal the hierarchy of purchase barriers, not just the hierarchy of desired features. When a prospect mentions a feature in a sales call, you learn not just that the feature matters but whether it is informational (they are curious), preferential (they would like it), or blocking (they will not proceed without it). That three-way distinction rarely appears in interview transcripts.

In an engagement with a HIPAA-compliant healthcare SaaS, 60 sales call transcripts were analysed statistically to validate a prior JTBD framework. The framework had coded EHR integration as the second most important job, appearing in 88% of prospect conversations. The sales call analysis found it mentioned in 43% of calls — and identified as an absolute deal-breaker in only 13%.

That is not a minor variance. The team was allocating product and GTM resources based on a figure that was more than twice the actual rate — and treating a segment-specific requirement as a near-universal one.
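
As an illustration, the three-way coding can be tallied with something as simple as the sketch below. The per-call labels are hypothetical stand-ins for real transcript codes, scaled to echo the 43% / 13% split described above.

```python
from collections import Counter

# Minimal sketch of the informational / preferential / blocking coding.
# One label per call where the feature came up; None = never mentioned.
# All counts are hypothetical, scaled to echo the split described above.
calls = (
    ["blocking"] * 8          # will not proceed without it
    + ["preferential"] * 12   # would like it
    + ["informational"] * 6   # merely curious
    + [None] * 34             # feature never came up
)

n = len(calls)                                   # 60 calls
counts = Counter(c for c in calls if c is not None)

mention_rate = sum(counts.values()) / n          # 26/60 ~= 43%
blocking_rate = counts["blocking"] / n           # 8/60 ~= 13%
print(f"mentioned in {mention_rate:.0%} of calls, "
      f"deal-breaker in {blocking_rate:.0%}")
```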

Method 3: Kano Survey

Kano analysis categorises features not by how often they are requested, but by how their presence or absence affects buyer satisfaction. The three categories that matter most: must-haves (absence causes dissatisfaction; presence is simply expected), performance features (more is better, linearly), and delighters (absence is neutral; presence causes disproportionate satisfaction).

The distinctions that matter for pricing:

  • Must-haves belong in every tier. Absence causes dissatisfaction; presence is simply expected. Locking a must-have behind a higher plan is a retention liability.
  • Delighters belong in higher tiers. They cause disproportionate satisfaction when present, but no dissatisfaction when absent — which means buyers won't pay a premium for something they didn't know they wanted.
  • Pricing built on survey popularity — not Kano category — puts delighters in the base plan and charges extra for must-haves. That is the structure that drives churn.
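
For reference, the classification itself is mechanical once the survey is fielded. Below is a minimal sketch of the standard Kano evaluation table, assuming each respondent answers a functional question ("How would you feel if the feature were present?") and a dysfunctional one ("...if it were absent?") on the usual five-point scale.

```python
# Minimal sketch of the standard Kano evaluation table.

ANSWERS = ["like", "expect", "neutral", "live_with", "dislike"]

# Rows: functional answer; columns: dysfunctional answer.
# M = must-have, O = performance, A = delighter (attractive),
# I = indifferent, R = reverse, Q = questionable (contradictory answers).
TABLE = {
    "like":      ["Q", "A", "A", "A", "O"],
    "expect":    ["R", "I", "I", "I", "M"],
    "neutral":   ["R", "I", "I", "I", "M"],
    "live_with": ["R", "I", "I", "I", "M"],
    "dislike":   ["R", "R", "R", "R", "Q"],
}

def kano_category(functional: str, dysfunctional: str) -> str:
    return TABLE[functional][ANSWERS.index(dysfunctional)]

# A buyer who merely *expects* the feature but would *dislike* its
# absence marks it as a must-have; pricing it as an upsell risks churn.
print(kano_category("expect", "dislike"))   # -> "M"
print(kano_category("like", "neutral"))     # -> "A" (delighter)
```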

Method 4: TAM/SAM/SOM with a Public Database Cross-Reference

Standard market sizing relies on secondary research: industry reports, analyst estimates, and survey data. The problem is that these sources define and aggregate segments differently, so estimates vary by 3–4× depending on which source you cite.

Using primary administrative data changes this. For healthcare, a publicly available national registry contains records for every registered provider organisation in the United States — millions of entries, updated monthly, maintained by the federal government. Sizing a market segment from this registry means counting directly from the authoritative administrative record, not inferring from a survey about a survey.

In the healthcare SaaS engagement, this approach produced three distinct outcomes:

  1. It resolved a 2–3× discrepancy in prior market estimates. Prior estimates for the primary target segment had ranged from 30,000 to 50,000 organisations. The registry count delivered a precise, defensible figure — and revealed that the discrepancy came from conflating two different organisational structures that looked the same from the outside.
  2. It surfaced a high-opportunity niche the team had not prioritised. One subsegment — referred to here as Segment X to protect the client's strategic advantage in this niche — showed a far more favourable ratio of addressable organisations to estimated ACV than any other segment in the sizing model. The team's prior GTM focus had not been pointed at it.
  3. It made SOM quantifiable, not estimated. With precise subsegment counts from the registry, a realistic serviceable obtainable market calculation became possible from the bottom up: count of addressable organisations × estimated conversion rate × ACV. The figure was smaller than the top-down estimate — and more credible for it.
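
A minimal sketch of that bottom-up arithmetic, with hypothetical segment names, registry counts, conversion rates, and ACVs standing in for the real sizing model:

```python
# Minimal sketch of bottom-up SOM: count x conversion x ACV per segment.
# Every number here is hypothetical, for illustration only.

segments = {
    # segment: (addressable org count from registry, est. conversion, est. ACV)
    "segment_x":      (4_200, 0.02, 18_000),
    "multi_location": (9_500, 0.01, 24_000),
    "solo_practice":  (31_000, 0.005, 6_000),
}

for name, (count, conversion, acv) in segments.items():
    som = count * conversion * acv
    print(f"{name}: {som:,.0f} USD obtainable")

total_som = sum(c * p * a for c, p, a in segments.values())
print(f"bottom-up SOM: {total_som:,.0f} USD")
```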

The methodology generalises to any regulated market with a public provider registry. Healthcare is well-suited because the registry is publicly available and complete by mandate. Similar datasets exist for legal services, financial services, real estate, and other licensed professions.

Method 5: Internal Usage Data as the Confirmation Layer

The final input is the product's own behavioural data. Which segments are already self-selecting into the product? Which activate fastest, retain longest, and expand most?

When market sizing identifies a segment as high-priority and internal product data independently shows that same segment over-indexing in retention and conversion, you have a double signal:

  • The market says this segment is large and addressable.
  • The product says it already works for them — confirmed by actual retention and expansion behaviour, not intention.

No single source of error explains both findings simultaneously — and that convergence is the highest-confidence signal in the research stack. The registry and the product analytics platform share no methodology. When they agree, the conclusion is defensible.

In the healthcare SaaS engagement, this is exactly what happened with Segment X: an administrative database and a product analytics platform independently confirmed the same niche as the highest-priority target — before the team had allocated significant GTM resources toward it.

Method | What it reveals | What it misses alone
JTBD Interviews | Outcome hierarchy, emotional jobs, language of motivation | Selection bias; skews toward vocal adopters
Sales Call Analysis | Purchase barriers, blocking vs. preferential features, segment differences | Doesn't capture users who never reached sales
Kano Survey | Must-have vs. delighter distinction, pricing tier logic | Doesn't capture purchase timing or decision drivers
Public Administrative Registry | Definitive segment counts, bottom-up SOM, niche discovery | Static; doesn't reflect buying intent or budget cycle
Internal Usage Data | Actual product-market fit signal, retention and expansion by segment | Limited to existing customers; misses non-buyers

The compound research stack is the foundation phase of ProductQuant's eight-phase engagement. Every downstream decision — pricing, roadmap, event taxonomy — is built on what the research confirms.

What the Cross-Validation Actually Found

Three corrections were significant enough to change every downstream decision. Here is the before/after at a glance:

Finding | Prior estimate | Cross-validation result
EHR integration priority | 88% frequency (coded as #2 job) | 43% frequency — deal-breaker in only 13%
Multi-location persona share | 38–52% of pipeline | 17% of pipeline
Paper digitisation job | Not in original framework | 57% of sales conversations — new top-3 job
Segment X niche opportunity | Not prioritised in GTM | Confirmed #1 opportunity by market registry and product usage data independently

EHR Integration Was Overestimated by More Than 2×

The JTBD framework had listed EHR integration as the second most important job — estimated at 88% frequency based on prior interview analysis. The sales call analysis found it mentioned in 43% of conversations and flagged as an absolute deal-breaker in only 13%.

The source of the error was definitional. The prior analysis had conflated "mentioned integration" with "requires integration." Buyers who raised integration in passing — curious about roadmap, noting a preference — were coded the same way as buyers who said they would not sign without it. In the sales call data, those two groups behaved completely differently.

The corrected picture: integration is a must-have for a specific segment — multi-location practices with standardised clinical workflows — and largely irrelevant to the majority, for whom a PDF export is entirely sufficient. Treating it as a near-universal requirement had led to over-investment in integration roadmap items and under-investment in the features the other 87% of buyers considered critical.

A Core Persona Was Overrepresented by 2–3×

The persona framework had identified a "scaling physician" segment — multi-location operators — as representing 38–52% of the pipeline. Sales call analysis found this segment at 17%. The prior research had defined the segment as "growth-focused physicians," which was too broad to be useful.

When the definition was tightened — to practices with 3 or more locations or 500+ form submissions per month — the segment contracted to its real size.
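
Expressed as code, the tightened definition is just an explicit predicate that can be run against every account in the pipeline. The field names below are hypothetical; the thresholds are the ones given above.

```python
# Minimal sketch: a persona definition as a testable predicate.
# Account field names are hypothetical.

def is_scaling_physician(account: dict) -> bool:
    """Tightened definition: 3+ locations OR 500+ form submissions/month."""
    return (account["location_count"] >= 3
            or account["monthly_form_submissions"] >= 500)

pipeline = [
    {"name": "A", "location_count": 1, "monthly_form_submissions": 120},
    {"name": "B", "location_count": 4, "monthly_form_submissions": 300},
    {"name": "C", "location_count": 1, "monthly_form_submissions": 650},
]

share = sum(is_scaling_physician(a) for a in pipeline) / len(pipeline)
print(f"segment share of pipeline: {share:.0%}")
```

The point of writing it down this way is that "growth-focused physicians" cannot be evaluated against a CRM; the predicate above can.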

23 percentage points of GTM resources had been allocated toward buyers who were either not in the pipeline at the claimed rate, or who had different purchase drivers than the framework described. The gap was identified by comparing the persona-based budget allocation against the actual distribution of personas in the sales calls.

This is not an indictment of the team. The initial definition was reasonable given the research available at the time. The cross-validation found what the prior method could not. That is the purpose of running multiple methods.

Market Data and Usage Data Independently Confirmed the Same High-Priority Niche

Administrative registry sizing identified a specific subsegment — referred to here as Segment X to protect the client's strategic advantage in this niche — as the highest-opportunity target by revenue per addressable organisation. The registry confirmed the total addressable count precisely, and cross-validation against a third-party benchmark survey corroborated the figure. Internal product data, run independently, showed the same segment over-indexing in retention, deal size, and expansion revenue.

The significance is methodological:

  • The two data sources have no common failure mode. An administrative registry and a product analytics platform make completely different assumptions and have completely different potential errors.
  • Both pointed to the same segment. That is not coincidence — it is the kind of convergence that justifies GTM reallocation.
  • The team had not prioritised this segment before the research. The finding did not confirm an existing hypothesis. It introduced a new one and gave it the strongest evidentiary foundation of anything in the engagement.

This is the outcome the compound research stack is designed to produce: a finding that would not have surfaced from any single method, confirmed by two independent sources with no shared methodology.

"Most teams find one signal and act. The research that changes the trajectory is the second, independent signal that confirms the first — especially when the two sources have nothing in common methodologically."

— Jake McMahon, ProductQuant

How the Research Changed Every Downstream Decision

The research output is not a report that gets filed. It is the input to decisions that would otherwise be made with lower-quality data. Each correction cascaded forward.

Pricing Tier Architecture

Two features confirmed as universal must-haves. One segment-specific feature correctly moved to the enterprise tier. The base plan redesigned to include what buyers expected to be there.

When the cross-validation confirmed that HIPAA compliance (mentioned in 97% of sales calls) and manual data entry elimination (mentioned in 100% of calls) were the two co-equal top jobs-to-be-done, the pricing implication was immediate: both had to be present in the base plan, not positioned as upsells. Any tier structure that locked compliance features behind a higher price point would fail evaluation — not because buyers would not pay, but because they would feel misled.

The Kano analysis also clarified which features belonged in premium tiers. EHR integration — a must-have for 13% of buyers — and a specific third-party integration (mentioned in 8% of calls, but with a 60% deal-breaker rate within that group) are genuine must-haves for a specific, high-ACV segment and largely irrelevant to everyone else. A feature that is table-stakes for one segment and irrelevant to another is exactly what a differentiated enterprise tier should contain.

Product Roadmap Sequencing

Features coded as near-universal requirements were reclassified as segment-specific. A new top-3 job — not in the prior framework — was found in 57% of sales conversations.

Corrected feature priority rankings changed the P0/P1/P2 classification directly. Features allocated high-priority engineering investment on the assumption they were near-universal requirements were reclassified as segment-specific. Investment shifted toward the jobs confirmed across the full 60-call dataset.

The cross-validation also surfaced a new top-3 job the original JTBD framework had missed entirely: digitising paper forms for established practices that had never previously offered a digital intake workflow. This job appeared in 57% of sales conversations and had no owner in the roadmap. The research found it. The prior framework had not.

Event Taxonomy Design

Corrected persona definitions reset what "activated" means for each segment — which resets every downstream analysis that depends on it.

The event taxonomy — the set of user actions tracked in the product analytics platform — should be designed around validated personas and their validated jobs. If the analytics platform is tracking proxy metrics for the wrong jobs, every downstream analysis (activation rate, churn prediction, feature adoption) is measuring the wrong things.

With corrected persona definitions and a validated job hierarchy, the taxonomy design becomes a precise mapping exercise: for each validated persona, what is the moment that signals they have completed the job they hired the product to do? That signal becomes the activation event. Tracking it consistently creates the foundation for cohort analysis, churn prediction, and experiment design — all of which depend on knowing what "activated" actually means for each segment.
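
A minimal sketch of what that mapping looks like in practice. The persona and event names are hypothetical, since the real taxonomy falls out of the validated job hierarchy:

```python
# Minimal sketch: persona -> activation event mapping.
# Persona and event names are hypothetical illustrations.

ACTIVATION_EVENTS = {
    # persona: the event that signals the hired job is done
    "solo_practice":  "first_patient_form_completed",
    "multi_location": "second_location_connected",
    "paper_first":    "paper_form_digitised_and_submitted",
}

def is_activated(persona: str, events: set[str]) -> bool:
    """An account counts as activated only against its own persona's event."""
    return ACTIVATION_EVENTS[persona] in events

print(is_activated("paper_first",
                   {"signup", "paper_form_digitised_and_submitted"}))  # True
```

The design choice worth noting: activation is defined per persona, not globally. A single product-wide activation event silently measures one segment's job and misreads every other segment against it.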

What This Approach Requires

This is not a two-week research sprint. Running five methods simultaneously — and doing the cross-validation rigorously — requires inputs that some teams will not have ready, and time that typically runs 8–12 weeks.

What makes the engagement possible:

  • Recorded sales call transcripts. Fifty to eighty transcripts is a workable minimum for statistical reliability; see the sketch after this list. If your CRM stores call recordings but they are not transcribed, transcription is the first step. The transcripts are the most important single input — they carry frequency data that interview samples cannot reliably provide.
  • An existing customer base for Kano surveying. Survey responses from real buyers who have made a purchase decision carry different signal than responses from trial users or prospects. The Kano analysis is most useful when the respondents have already decided.
  • Internal product data with segment-level visibility. Aggregate activation or retention metrics do not support the cross-validation. You need to filter outcomes by segment — which means either having defined segments in your analytics platform already, or building them from event data.
  • A public administrative database relevant to your vertical. NPPES is healthcare-specific. Similar authoritative registries exist for legal services, financial services, and other licensed professions. B2B companies without a regulated market can use NAICS company-count data or LinkedIn firmographic filtering for segment sizing.
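
On the sample-size point above: a quick normal-approximation confidence interval shows why 50–80 transcripts is the workable floor for frequency claims. The 43% figure is the article's; the comparison at n=24 is a hypothetical interview-sized sample.

```python
# Minimal sketch: 95% confidence interval (normal approximation) for a
# "mentioned in X% of calls" proportion, at two sample sizes.
import math

def ci_95(p: float, n: int) -> tuple[float, float]:
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

for n in (24, 60):
    lo, hi = ci_95(0.43, n)
    print(f"n={n}: 43% mention rate, 95% CI [{lo:.0%}, {hi:.0%}]")
# n=24 gives roughly +/-20 pp; n=60 tightens that to roughly +/-13 pp.
```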

The right time for this approach is when your team has strong assumptions about who your buyer is and what they need — and those assumptions are driving roadmap, pricing, and GTM allocation decisions that are expensive to reverse. The cost of running the research is bounded. The cost of acting on uncorrected assumptions for another 12–18 months is not.

FAQ

What is JTBD analysis and how does it differ from user interviews?

JTBD focuses on the outcome the buyer is hiring the product to accomplish — not what they say they want. A user interview asks "what do you use this for?" — JTBD asks "what were you trying to accomplish before you found this?" and "what would success look like?" The distinction matters because feature requests reflect surface-level preferences; JTBD surfaces the underlying motivation. A team can add features indefinitely and still lose buyers if the product does not deliver the job they hired it to do. The methodology traces to Christensen's research on innovation and customer choice.

What is Kano analysis and what does it tell you?

Kano analysis categorises features by how their presence or absence affects satisfaction — not by how often they are requested. The key distinction: must-haves generate no satisfaction when present, only dissatisfaction when absent. Delighters generate outsized satisfaction when present but no dissatisfaction when absent. Most product teams build as if adding features increases value linearly. Kano shows the relationship is non-linear, and a feature's category — not its popularity — determines where it belongs in your pricing structure. Getting this wrong tends to show up as a base tier that buyers feel underserves them.

How do you use a public administrative registry for market sizing?

The key advantage is counting directly from authoritative records — not inferring from analyst surveys that define the same segment differently. For regulated industries, a national registry typically contains records for every registered provider, updated on a defined schedule. For market sizing, you filter by entity type, location count, and classification to count segments from the primary source rather than from secondary estimates. This produces segment counts that are defensible and precise — and it often reveals discrepancies (sometimes 2–3×) between prior estimates and actual segment size, caused by definitional differences that secondary sources paper over.

How do you cross-validate research from multiple methods?

Convergence gives you high confidence. Divergence is where the actionable findings live. Cross-validation means running methods independently — with separate samples or data sources — then comparing findings. If interviews, sales calls, and usage data all identify the same segment as high-priority, that is a defensible conclusion from three independent directions. If interviews say a feature is an 88% priority and sales call analysis shows 43%, the methods have produced different answers. The next step is identifying why — usually a definitional issue in how the feature was described, or a sampling issue in who was in the interview set. Resolving the divergence is often where the most actionable finding lives.
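
Mechanically, the comparison step can be as simple as the sketch below: per-finding estimates from each method, flagged when the spread exceeds a tolerance. The EHR figures echo this article's example; the interview figure for the compliance row and the 15-point tolerance are arbitrary illustrations.

```python
# Minimal sketch of the cross-validation comparison step.
# Estimates per method; flag findings whose methods disagree.

estimates = {
    # finding: {method: estimated frequency}
    "ehr_integration":  {"interviews": 0.88, "sales_calls": 0.43},
    "hipaa_compliance": {"interviews": 0.95, "sales_calls": 0.97},
}

TOLERANCE = 0.15  # spread beyond 15 pp warrants investigation (illustrative)

for finding, by_method in estimates.items():
    values = list(by_method.values())
    spread = max(values) - min(values)
    status = "DIVERGES - investigate" if spread > TOLERANCE else "converges"
    print(f"{finding}: spread {spread:.0%} -> {status}")
```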

How long does this type of engagement take?

The realistic total is 8–12 weeks — 6–8 weeks for the research phase (JTBD analysis, sales call transcription and analysis, Kano survey design and fielding, TAM sizing from primary data, and internal usage data review), plus 2–4 weeks for the synthesis phase (pricing architecture, roadmap reclassification, event taxonomy design). That length is appropriate for research that informs decisions with multi-year compounding effects. Roadmap sequencing, pricing tier design, and GTM resource allocation do not get revisited every quarter.

Sources

  • CMS National Plan and Provider Enumeration System (NPPES) — administrative NPI records, 9.4M entries, updated monthly
  • AMA 2024 Physician Practice Benchmark Survey — independent validation of healthcare practice segment counts
  • Christensen, C.M., Hall, T., Dillon, K., & Duncan, D.S. (2016). Competing Against Luck. Harper Business — foundational JTBD framework
  • Kano, N. et al. (1984). "Attractive quality and must-be quality." Journal of the Japanese Society for Quality Control — original Kano model
  • ProductQuant's analysis of 60 validated sales calls for a HIPAA-compliant healthcare SaaS platform (January–February 2026). Figures cited: 43% EHR integration frequency (actual vs. 88% prior estimate); 17% multi-location persona frequency (actual vs. 38–52% prior estimate); 23 percentage points of GTM misallocation identified; 57% paper digitisation job frequency (new discovery, not in prior framework); 85+ validated jobs-to-be-done (tripled from original 27-job framework). The niche identified as the highest-opportunity target via administrative registry analysis and confirmed by internal product usage data is referred to as "Segment X" to protect the client's strategic advantage.

About the Author

Jake McMahon is a product growth strategist and founder of ProductQuant, working with B2B SaaS companies at $1M–$50M ARR to build the growth infrastructure that founders and CPOs typically try to construct between their other priorities.

Research is the foundation of ProductQuant's engagement process. The work described in this article — multi-method JTBD validation, Kano analysis, sales call statistical analysis, and market sizing from primary administrative data — is how ProductQuant diagnoses where product decisions are grounded in assumptions versus evidence. In one engagement, this approach produced more than 85 validated jobs-to-be-done from 60 sales call transcripts, corrected two persona estimates each off by 2–3×, and identified a top-3 job-to-be-done the prior framework had missed entirely.

Next Step

Replace assumptions with evidence before the next cycle begins

Most product teams reach the point of running this kind of research after they have already made the expensive decisions — roadmap commitments, pricing launches, GTM investments — on assumptions the research would have corrected. The DISCOVER framework starts with the structured diagnostic phase described here.