TL;DR

  • Market sizing estimates from analyst reports can vary by 3–4× for the same segment, depending on how each source defines it. Primary administrative databases resolve this by letting you count directly.
  • For regulated industries, government registries contain a near-complete record of every licensed provider or organisation. Healthcare uses NPPES (9.4M NPI records). Legal uses state bar directories. Dental uses dental board registries.
  • Counting from a government registry exposed a 2–3× discrepancy in prior estimates for one engagement: the "multi-location practice" segment was estimated at 30,000–50,000 organisations. The registry count was 15,047. The discrepancy came from a definitional difference — not a data quality problem.
  • The highest-value signal is a double confirmation: the registry identifies a niche as the largest opportunity by revenue-per-organisation, and your internal product data independently shows the same niche over-indexing in retention and deal size. No single source of error explains both findings.
  • B2B SaaS teams outside regulated industries can apply the same method using NAICS company-count data or LinkedIn firmographic filtering for bottom-up segment sizing.

When a B2B SaaS team needs to decide which niche to focus on, the standard approach is to find an analyst report, read the TAM, and argue about whether the cited figure is realistic.

The problem is structural. Analyst reports define segments inconsistently. One report says "multi-location medical practice" means any group with two or more physicians. Another says it means two or more physical locations. A third uses revenue thresholds. The same segment — described in the same language — produces estimates that vary by 3–4× depending on which source you cite.

The resulting argument is not a data problem. It is a definitions problem. And you cannot resolve a definitions problem with more analyst reports.

The correct approach for regulated markets is to skip the analyst reports and count directly from the authoritative source. In most regulated industries, a government or professional body maintains a registry of every licensed practitioner or organisation. That registry was not designed for market research — it was designed for administrative compliance. But for sizing a segment, administrative completeness is exactly what you need.

What Public Databases Exist and What They Contain

The relevant registries by industry:

| Industry | Primary database | What you can filter on |
| --- | --- | --- |
| Healthcare | CMS NPPES (National Plan and Provider Enumeration System) — 9.4M NPI records, updated monthly, publicly available at cms.gov | Entity type (individual vs. organisation), speciality taxonomy code, number of registered practice addresses, state, practice type, zip code |
| Legal | State bar association directories (each state maintains its own; some are searchable, some are downloadable) | Bar number, firm size, practice area, county/state, admission year |
| Dental | State dental board licensee registries; NPI also covers dental practices (dentists have their own taxonomy codes) | Licence status, speciality, practice address, entity type |
| Real estate | MLS (Multiple Listing Service) datasets; NAR membership data; state real estate commission registries | Brokerage size, agent count, transaction volume, geographic market |
| General B2B | US Census Bureau NAICS data (Census of Business, BDS, CBP) — firm counts by industry code, employee size, state | NAICS code, firm size (employees), state, establishment count, annual revenue range |

The common property across all of these: they were not designed to be sampled. They were designed to be complete. That is what makes them useful for market sizing. When a registry contains every licensed dentist in the US because every licensed dentist is legally required to register, you are not dealing with survey response bias or sampling error. You are dealing with a census.

Step 1 — Define the Segment Precisely Before You Count

The most important step is also the one most teams skip: writing an explicit, testable definition of the segment before opening the database.

This matters because the segment definition determines the count — and if you reverse-engineer the definition from the count you wanted, you have not learned anything. You have confirmed a prior belief by choosing the filter that produces the number you started with.

A good segment definition for registry counting has three properties:

  1. It uses criteria that exist in the registry. "Multi-location practice" is testable — NPPES has registered practice addresses per NPI. "Growth-oriented practice" is not testable — NPPES has no field for growth orientation.
  2. It reflects a real buyer characteristic. The definition should describe the organisational property that makes this segment a distinct buyer — distinct in their problem, their purchase process, or their expected ACV. Not just a convenient filter.
  3. It is written before running the query. Decide the definition with the team before running the numbers. Document it. If the count comes back surprising, revise the definition — but document the revision and the reason, rather than quietly adjusting until you get the number you wanted.
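The first property — criteria that exist in the registry — can be made concrete by writing the definition as an executable predicate over registry fields. A minimal sketch in Python; the field names are illustrative stand-ins, not actual NPPES column headers:

```python
# A segment definition expressed as a testable predicate.
# Field names are illustrative stand-ins, not real NPPES column headers.
DEFINITION = (
    "Physician organisations (entity type 2) with 2 or more "
    "registered practice locations"
)  # written down and agreed with the team before any query is run

def in_segment(record: dict) -> bool:
    """True when a registry record meets the written definition."""
    return record["entity_type"] == 2 and record["location_count"] >= 2

# "Growth-oriented practice" could not be written this way: no registry
# field encodes growth orientation, so it is not a testable definition.
assert in_segment({"entity_type": 2, "location_count": 3})
assert not in_segment({"entity_type": 2, "location_count": 1})  # single location
assert not in_segment({"entity_type": 1, "location_count": 4})  # individual, not org
```

If the predicate cannot be written because a criterion has no corresponding registry field, that is the signal to revise the definition before querying, not after.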

A real example of why this matters: In a healthcare SaaS market sizing engagement, the prior definition of "multi-location practice" was "any physician group." An NPPES query on that definition produced a count in the 30,000–50,000 range. When the definition was tightened to "physician organisations with 2 or more registered practice locations" — the property that actually defined the segment's operational problem — the count dropped to 15,047. The 2–3× discrepancy was not a data error. It was a definitional error. Fixing the definition fixed the estimate.

Step 2 — Run the Count and Cross-Check It

Once you have a written definition, query the registry. For NPPES specifically:

  • Download the monthly data file from the CMS NPPES download page (approximately 8GB, CSV format)
  • Filter by Entity Type Code 2 for organisations (not individuals)
  • Apply your specialty taxonomy filter if relevant (each specialty has a standardised taxonomy code in NPPES)
  • Count unique NPIs with 2 or more unique practice location addresses in the file
  • Break down by location count to understand the distribution (2-location, 3-location, 4–9 location, etc.)
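The steps above can be sketched in Python using only the standard library. Column names here are simplified stand-ins — the real NPPES export spreads address data across many columns and a companion practice-location file — so treat this as the shape of the query, not a drop-in script:

```python
import csv
from collections import Counter, defaultdict

def count_multi_location_orgs(path, min_locations=2, taxonomy_prefix=None):
    """Stream a registry CSV and count organisation NPIs with at least
    min_locations distinct practice addresses.

    Column names are simplified stand-ins for the real NPPES headers.
    """
    addresses = defaultdict(set)  # NPI -> set of distinct practice addresses
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["entity_type_code"] != "2":  # keep organisations only
                continue
            if taxonomy_prefix and not row["taxonomy_code"].startswith(taxonomy_prefix):
                continue
            addresses[row["npi"]].add(row["practice_address"])
    # Distribution by location count: {2: n orgs, 3: n orgs, ...}
    dist = Counter(len(a) for a in addresses.values() if len(a) >= min_locations)
    return sum(dist.values()), dict(dist)
```

Streaming row by row keeps memory bounded even on the ~8GB monthly file, since only the NPI-to-address map is held in memory — not the full file.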

After running the count, cross-check it against at least one independent source. For healthcare, the AMA Physician Practice Benchmark Survey provides periodic estimates of group practice counts by size. NPPES counts should fall within or near the range the AMA produces — if they diverge significantly, investigate the definition before proceeding.

In the multi-specialty group analysis for a healthcare SaaS engagement:

  • NPPES count: 28,042–32,392 practice addresses with 4+ physicians and 3+ distinct specialty taxonomy groups
  • AMA estimate for the same segment: 25,000–35,000
  • NPPES count falls squarely within the AMA range — high confidence, definition confirmed

That confirmation is not incidental. When two independent sources — one a government administrative registry, one a professional association survey — produce results that agree, the segment definition is likely correct and the count is likely sound.

Step 3 — Rank Segments by Revenue Opportunity, Not Just Size

A large segment count does not mean a large revenue opportunity. A segment of 150,000 small practices at $1,500 ACV represents a different opportunity than a segment of 30,000 multi-specialty groups at $12,000 ACV — even though the small practice segment is 5× larger by count.

The ranking exercise that produces actionable prioritisation:

| Segment | Registry count | Estimated ACV | SAM × ACV | Confidence |
| --- | --- | --- | --- | --- |
| Multi-Specialty Groups (4+ physicians, 3+ specialties) | 28,042–32,392 | $12,000 | $216M | High — NPPES + AMA cross-validated |
| Multi-Location Groups (2–9 locations) | 15,047 | $9,000 | $74.5M | High — NPPES definitive count |
| Small Practices (solo to 5 physicians) | ~150,000 | $1,500 | $138M | Medium — AMA estimate, ARPU from sales data |

The multi-specialty group segment ranks first by revenue opportunity even though it is smaller by count than the small practices segment. That ranking is only possible when you have precise counts and validated ARPU — both of which require primary data sources, not analyst estimates.
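The ranking itself is simple arithmetic once counts and ACVs are in hand. A minimal sketch with illustrative figures (the table above multiplies a serviceable subset of each registry count by ACV, so its dollar figures are lower than the raw count × ACV shown here — the point is the ordering, not the absolute numbers):

```python
# Rank segments by revenue opportunity rather than raw count.
# Counts and ACVs are illustrative, not the article's SAM-adjusted figures.
segments = [
    ("Multi-Specialty Groups", 30_000, 12_000),  # (name, count, ACV in $)
    ("Multi-Location Groups", 15_047, 9_000),
    ("Small Practices", 150_000, 1_500),
]

ranked = sorted(segments, key=lambda s: s[1] * s[2], reverse=True)
for name, count, acv in ranked:
    print(f"{name}: {count:,} orgs x ${acv:,} ACV = ${count * acv / 1e6:.0f}M")
# Multi-Specialty Groups ranks first even though Small Practices
# is 5x larger by count.
```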

Step 4 — Cross-Reference With Your Internal Product Data

Market sizing tells you which segments are large and addressable. It does not tell you which ones the product already works for. That signal comes from your internal data — and it is the one that converts a market sizing exercise into a strategic niche decision.

The question to answer from your internal product data:

  • Which segments over-index in activation rate? Which reach first value faster?
  • Which segments show the highest 90-day and 12-month retention?
  • Which segments expand — move to higher tiers or add seats — at the highest rate?
  • Which segments have the highest average deal size among your current customer base?

If you can tag your current customers with the segment definition you used in the registry analysis — even approximately, using firmographic enrichment or the data already in your CRM — you can calculate these metrics by segment and compare them to the market sizing output.
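A sketch of that per-segment comparison, assuming customers have already been tagged with the segment labels from the registry analysis — the field names and figures below are hypothetical, not from the engagement:

```python
from statistics import mean

def segment_metrics(customers):
    """Retention and deal-size metrics per segment from tagged customer records."""
    by_segment = {}
    for c in customers:
        by_segment.setdefault(c["segment"], []).append(c)
    return {
        seg: {
            "customers": len(rows),
            "retention_12m": mean(1.0 if r["retained_12m"] else 0.0 for r in rows),
            "avg_deal": mean(r["acv"] for r in rows),
        }
        for seg, rows in by_segment.items()
    }

# Hypothetical CRM export tagged with the registry segment definition
customers = [
    {"segment": "multi_specialty", "retained_12m": True,  "acv": 14_000},
    {"segment": "multi_specialty", "retained_12m": True,  "acv": 11_000},
    {"segment": "small_practice",  "retained_12m": False, "acv": 1_800},
    {"segment": "small_practice",  "retained_12m": True,  "acv": 1_400},
]
metrics = segment_metrics(customers)
# In this toy sample, multi_specialty over-indexes on both retention and deal size.
```

Even a manually tagged subset of accounts run through a function like this is enough to see whether a segment over-indexes relative to its share of the market.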

"The segment ranking that matters is not the one with the biggest TAM. It is the one where the market says the opportunity is large and the product says it already works — independently, from sources that share no methodology."

— Jake McMahon, ProductQuant

In the healthcare SaaS engagement described in this article, this cross-reference produced a double-confirmation signal on one specific niche: the registry analysis ranked it as the highest opportunity by revenue-per-addressable-organisation, and internal product data showed the same segment over-indexing in retention, deal size, and expansion revenue. Those two signals came from a government database and a product analytics platform — completely different sources, completely different methodologies, completely different potential failure modes.

When they agree, the conclusion is defensible in a way that neither source alone could produce. And that defensibility matters — for board presentations, for investor conversations, and for the internal alignment that allows a team to commit resources to a niche without relitigating the decision every quarter.

The Niche Validation Framework

The five questions that structure the full analysis:

Niche Validation — Five Questions

  1. What administrative database exists for your vertical? Healthcare: NPPES. Legal: state bar directories. Dental: dental board registries. General B2B: Census NAICS data. If no government registry exists, what is the most authoritative industry association dataset?
  2. How many organisations meet your segment definition — using criteria that exist in the registry? Write the definition before running the query. Document it. Cross-check the count against one independent source.
  3. What is the revenue-per-organisation for each segment? Use your own closed deal data where possible. Use comparable vendor pricing pages as a cross-check. Apply the "minimum" and "stretch" framing — not a single number, a range with a stated basis for each end.
  4. What percentage of your current customers come from each segment? Tag your existing customers with segment criteria. Even approximate tagging from CRM data produces a useful signal. Look for segments that are over-represented in your customer base relative to their share of the total market.
  5. Where do the market data and internal data agree? A segment that ranks high on revenue opportunity in the registry analysis and over-indexes in your retention or expansion data is your niche signal. One source alone is a hypothesis. Both sources agreeing is a defensible conclusion.

Related Article

The compound research stack that produced this methodology

The niche identification method described here is Step 4 in a five-method research stack that also includes JTBD interviews, sales call statistical analysis, and Kano survey. The full article covers all five methods and the cross-validation that connects them.

FAQ

Does this method work for B2B SaaS outside regulated industries?

Yes, with a different data source. Regulated industries have government registries because licensure requires registration. B2B SaaS outside those industries can use Census Bureau NAICS data (firm counts by industry, employee size, and state), LinkedIn firmographic filtering, or industry association membership datasets. The methodology is the same — write a testable segment definition, count directly from the primary source, cross-check against one independent estimate. The confidence level is typically lower than a government registry count (because these sources are not complete-by-mandate), but it is still significantly better than an analyst report estimate that cannot be interrogated for its definitions.

What is NPPES and who maintains it?

NPPES (National Plan and Provider Enumeration System) is a federal database maintained by the Centers for Medicare and Medicaid Services (CMS). It assigns a National Provider Identifier (NPI) to every healthcare provider and organisation that bills Medicare or Medicaid, which in practice covers virtually all healthcare providers in the US. As of the February 2026 dataset, it contains 9.4 million NPI records. The full dataset is publicly available at no cost on the CMS website and is updated monthly. For healthcare SaaS market research, it is the most authoritative source available for segment counting by organisation type, speciality, and location count.

How do I tag my existing customers with segment criteria to run the internal cross-reference?

Start with what you already have. Your CRM likely contains company name, company size (employees or revenue), and industry. For healthcare SaaS, company size maps roughly to practice size; if you have NPI numbers for some customers, you can cross-reference directly against NPPES. For most B2B SaaS teams without systematic firmographic tagging, a practical starting point is manually tagging your top 50–100 accounts using the segment definition, then running retention, expansion, and deal size comparisons on that tagged subset. Imperfect tagging of a sample still produces directional signal — and that is enough to determine whether the double-confirmation hypothesis is worth pursuing with full firmographic enrichment.

What if the registry count is much smaller than we expected?

That is the finding, not a problem to solve. A smaller-than-expected count usually means one of three things: the segment definition was wider than the real buyer population (definitional issue), the market is genuinely smaller than prior estimates suggested (market reality), or the registry is incomplete for this segment (methodology issue — worth checking against a second source before concluding). In the healthcare SaaS engagement described in this article, the multi-location segment count came in at 15,047 versus a prior estimate of 30,000–50,000. That was not a failure of the analysis — it was the analysis working correctly. A smaller addressable market with a higher win rate and higher ACV can be a better business than a larger market with diluted focus. The registry count lets you make that comparison on real numbers.


About the Author

Jake McMahon is a product growth strategist and founder of ProductQuant, working with B2B SaaS companies to build the research and analytics infrastructure that makes strategic decisions defensible.

The market sizing methodology described in this article is part of ProductQuant's compound research stack — a five-method approach that cross-validates JTBD interviews, sales call analysis, Kano survey, primary market sizing, and internal usage data to produce research findings that hold up under scrutiny. See the full methodology in the Compound Research Stack article.

Next Step

Replace gut instinct with a defensible niche decision

The niche identification methodology described here is the market sizing phase of ProductQuant's DISCOVER framework — one of five research methods run simultaneously to produce cross-validated findings that hold up under board, investor, and team scrutiny.