AI Feature Strategy

Why Most AI Features Fail (And What to Do Before You Build)

6% of users tried the new AI assistant. 1.2% use it weekly. The team spent four months building it. This is the pattern — and the framework to break it.

Jake McMahon · 24 min read · Published March 29, 2026

TL;DR

  • The Pattern: Competitor launches AI feature, sales panics, CEO mandates a pivot, engineering ships, nobody uses it.
  • The Math: An AI feature costs $50K+ to build. Customer research to validate it first costs ~$5K. The ROI on skipping validation is negative almost every time.
  • Three Categories: Defensive features (need a basic version to stop losing deals), Value features (drive retention, worth deep investment), Noise features (competitors have it, nobody uses it).
  • The Fix: Run a 4-week Validation Sprint before committing engineering time. 20–30 user interviews + a lightweight prototype test is enough to know.
  • Post-Launch Truth: Weekly active users among those who tried it — not launch-day signups — is the only metric that tells you if an AI feature is working.

1. The Story You've Already Lived

It starts in Slack.

#sales-general, 9:14 AM

"Heads up — just lost a call because the prospect asked if we have AI and I had to say no. Competitor demoed their 'AI Assistant' and it looked really good. This is the third time this month."

#sales-general, 9:31 AM

"Same. Lost 2 deals this week. Prospect said 'we want to go with a company that's investing in AI.' We need this."

#exec-team, 10:02 AM

CEO: "I'm seeing this thread. All hands Friday. We need an AI strategy."

Friday's all-hands produces a decision: pause the activation optimization work, shift engineering to AI. The PM raises a hesitant objection about timing and is overruled. The CEO has heard this objection before and doesn't find it persuasive when the competitive threat feels this immediate.

The team moves fast. They scope an AI assistant — something that helps users write, summarise, or automate the most common workflow in the product. It sounds reasonable. It maps loosely to what the competitor launched. Engineering estimates four months.

Four months later, the launch goes well. Product Hunt bump. LinkedIn post with a blue tick from the CEO: "Today we're launching [Product] AI — the first AI-native [category] platform." Existing users get an email. The announcement gets clicks.

Three months after that, the CEO asks for an update in the quarterly business review.

The PM opens their notes. Reaches for measured language.

"Adoption has been… modest. About 6% of users have tried it. Weekly active usage is around 1.2%."

The room is quiet for a moment.

"So nineteen out of twenty customers are not using it at all."

"Correct."

This is not a made-up scenario. Some version of this plays out constantly across Series A and B SaaS companies. The names change. The category changes. The outcome does not.

2. This Isn't One Company. It's Most Companies.

The AI hype cycle has created an unusual dynamic in B2B SaaS product teams. There is enormous pressure — from boards, from sales, from competitive intel — to ship AI features. And there is very little internal pressure to validate whether those features will actually be used.

The data on this is not encouraging. According to NTT DATA's 2024 enterprise AI research, between 70% and 85% of generative AI deployment efforts are failing to meet their desired ROI outcomes. S&P Global Market Intelligence's 2025 survey of over 1,000 enterprises found that 42% of companies abandoned most of their AI initiatives — up from 17% the year before. The average organisation scrapped 46% of AI proof-of-concepts before they reached production.

Even the AI features that do reach production face an adoption wall. Microsoft 365 Copilot — arguably the most distributed AI product in enterprise history, embedded into tools hundreds of millions of people use every day — has consistently struggled with meaningful utilisation. Recon Analytics tracked Copilot's accuracy Net Promoter Score deteriorating sharply through 2025. Gartner's research suggests 30% of generative AI projects will be abandoned after proof of concept by the end of 2025.

This is not an argument against AI. It is an argument against building AI features without understanding why users would change their behaviour to use them.

The question is never "should we have AI?" The question is "which specific workflow, for which specific users, does AI materially improve — and how do we know?"

Most teams skip that second question entirely. They jump from "competitor has AI" to "we need AI" without stopping to define what problem the AI is actually solving, whether that problem is real, and whether users would adopt a different interaction model to solve it.

The result is the pattern. Expensive build. Low adoption. Awkward QBR. Repeat.

3. The Math That Should End the Debate

Let's put real numbers on this.

A mid-complexity AI feature — something involving an LLM integration, a new UI surface, testing, and a phased rollout — costs a minimum of $50,000 to ship. That estimate is conservative. If you're redirecting senior engineers away from other roadmap work, the true opportunity cost is higher. SaaS development consultancies place the figure for a meaningful new feature module at $50,000 to $150,000 depending on complexity and team rates.

Four months of an engineering team's time at a Series B company, including product and design, is often $150,000 to $250,000 when you account for fully loaded costs. Plus the features you didn't build. The activation optimisation that got paused. The retention loop you were testing. The work that compounds.

Now consider what it costs to answer the question "will users actually use this?" before building it.

A structured validation sprint — 20 to 40 user interviews, a prototype or mockup test, competitive usage analysis — runs $3,000 to $8,000 in research time and tooling. If you hire a research partner or a fractional PM to run it, add another $3,000 to $5,000. Call it $5,000 to $10,000 all in.

10:1

The minimum cost ratio of building an AI feature to validating one. In practice, for teams redirecting senior engineering time, the ratio is closer to 20:1 or 30:1 — before you account for the opportunity cost of the work that didn't get done.
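
The arithmetic is simple enough to sanity-check with your own numbers. Below is a minimal back-of-envelope sketch using the figures from this section; the dollar amounts are the assumptions discussed above, so substitute your own estimates.

```python
# Back-of-envelope: cost of shipping an AI feature vs. validating it first.
# Dollar figures are the assumptions from this section -- substitute your own.
build_cost = 50_000          # conservative build cost for a mid-complexity AI feature
loaded_build_cost = 150_000  # fully loaded cost when senior engineers are redirected for ~4 months
validation_cost = 5_000      # 4-week validation sprint, research time and tooling, all in

print(f"Minimum ratio:      {build_cost / validation_cost:.0f}:1")        # 10:1
print(f"Fully loaded ratio: {loaded_build_cost / validation_cost:.0f}:1") # 30:1
```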

The counterargument is always speed. "We don't have time to run research. The competitor is already shipping." This argument proves too much. If the competitor has shipped an AI feature with 4% weekly active users, you haven't actually lost anything by not shipping the same feature faster. You've lost a deal or two to a demo. That is a different problem — one with a cheaper solution than a four-month engineering sprint.

Speed matters when you know you're building the right thing. Validation is what tells you whether you are.

4. The Framework: Defensive vs. Value vs. Noise

Not all AI features are equal. The mistake is treating them as if they are. There are three categories, and the decision of how much to invest — and when — is entirely different for each.

Defensive · Table Stakes
Exists to stop losing deals. A basic version is enough. Deep investment is wasted here — users won't use it more because it's better.

Value · Retention Driver
Directly improves the core job users hired your product for. Sticky. Measurable. Worth significant investment and iteration.

Noise · Build Nothing
Competitor has it, but nobody actually uses it. Building it consumes resources for zero retention or revenue impact.

Defensive Features

A defensive feature is one you need to exist — not to drive growth, but to stop losing deals to a competitor who has it. The signal is specific: sales is losing qualified opportunities where the decision criteria explicitly includes this capability. Not "prospects asked about AI generally." Not "we heard someone mention it." Lost deals where the capability was a stated requirement.

The right investment in a defensive feature is the minimum viable version that removes the objection. Not the best AI assistant in the market. Not a full NLP pipeline with custom model fine-tuning. A feature that gets a sales rep to a "yes" on the capability question in a demo.

Where teams go wrong with defensive features is over-engineering them. They treat "we need to have this" as "we need to have the best version of this." The first is a sales problem. The second is a product strategy problem. They are not the same.

If your competitor feature matrix shows a competitor has AI summarisation and you're losing deals because of it, build AI summarisation at V1 quality, ship it, and move on. Don't spend three months making it exceptional. You will not win on that feature.

Value Features

A value feature is different in kind. It directly accelerates the core job-to-be-done that your users hired your product for. The signal here comes from users, not sales — unprompted requests in support tickets, research interviews where users describe doing the job manually and wishing the product handled it, behavioural data showing users repeatedly reaching for a workflow your product doesn't support.

This is where deep AI investment pays off. When AI collapses a task that users do manually inside your product — from fifteen steps to one, or from twenty minutes to two — they adopt it. Not because it's AI. Because it saves them real time on a real task they were already doing.

The best AI features in the market right now share a characteristic: they fit inside an existing workflow rather than requiring users to learn a new one. GitHub Copilot autocomplete works because it lives in the editor where developers already are, suggesting the next line of code at the moment they need it. It doesn't ask users to go somewhere new, learn a new prompt format, or change their process. It makes the existing process faster.

That is the design question for value AI features. Not "what could AI do in our product?" but "where in the workflow users already live does AI eliminate a step they currently do manually?"

Understanding this requires the kind of jobs-to-be-done research that maps what users are actually doing inside your product — not what you think they should be doing, but what they are doing, step by step, before and after your product's core value moment.

Noise Features

This is the hardest category for leadership teams to accept. Some AI features that competitors have built, and that look impressive in demos, are not being used by anyone. Including the competitor's customers.

The signal for noise is in the data. If a competitor has had a feature live for 12 months and it never appears in their product marketing as a retention driver, never shows up in case studies about customer outcomes, never gets mentioned in their expansion sales motion — that feature is almost certainly not driving retention. It might be winning a few deals in competitive situations. But it is not a moat.

Building in response to noise is purely defensive spend at best, and wasted roadmap capacity at worst. The problem-AI fit test exists for exactly this scenario — to force an honest answer to the question "does the problem this AI solves actually exist for our users, at a frequency and severity that would change their behaviour?"

If the answer is no, the right decision is to build nothing and redirect the capacity to work that compounds.

Signal | Category | Right Investment
Sales losing deals on this specific capability | Defensive | Minimum viable version. Ship fast. Stop iterating.
Users requesting it unprompted in support + research | Value | Deep investment. Design for workflow fit. Measure retention impact.
Competitor has it but you can't find users who care | Noise | Build nothing. Redirect capacity to activation or retention.
"Everyone's talking about AI" with no specific use case | Noise | Run validation first before any scope commitment.
Users doing this task manually inside your product today | Value | High confidence. Prototype, test, build.

5. The Validation Sprint: How to Answer the Question for $5K

Before any AI feature goes on a roadmap as a committed build, it should go through a validation sprint. This is not a committee. It is not a survey. It is four weeks of structured research designed to answer one question: will users change their behaviour to use this?

Week 1: Define the Hypothesis

Write the feature down in terms of the job it does, not the technology it uses. Not "AI assistant" — that is a solution. Write: "Users currently spend X minutes doing Y manually in our product. AI would eliminate that step. Users would use it if it appeared at Z moment in their workflow."

If you cannot fill in X, Y, and Z with specifics pulled from actual user data or research, you are not ready to run a validation sprint. You are ready to run exploratory interviews to find out what X, Y, and Z are.

This hypothesis becomes your research brief.

Week 2: Recruit and Interview

Twenty to thirty interviews. Target your active, retained users — not churned users, not trial users, not people who signed up and never came back. You want users who are doing the job your product is built for.

The interview structure has three parts:

  • Current workflow walkthrough: "Walk me through the last time you did [task]. Show me what you actually did, step by step."
  • Pain mapping: "What part of that process takes the most time? Where do you wish the product just did it for you?"
  • Concept reaction: Show a prototype or mockup of the AI feature. "If this existed, how would you use it? Where in your process would it fit?"

Listen for unprompted enthusiasm. Users who tell you the problem exists before you describe the solution are high-signal. Users who respond politely when you pitch the feature and say it sounds useful are low-signal — that is social courtesy, not purchase intent.

Week 3: Prototype and Test

Build a non-functional prototype — a Figma mockup or a Wizard of Oz test where a human simulates the AI response — and put it in front of ten users in a moderated session. Do not ask "do you like this?" Ask "try to use this." Watch where they hesitate. Watch what they expect the feature to do that it doesn't. Watch whether they reach for it naturally or ignore it.

A prototype test reveals workflow fit problems that no interview can surface. Users can tell you they want something in an interview and then fail to use it in a prototype test because the interaction model is wrong. That failure is valuable. It costs you a few days of design time, not four months of engineering.

Week 4: Categorise and Decide

At the end of four weeks you have enough signal to categorise the feature. You can answer:

  • Is this a real problem that real users have, at a frequency that would drive behaviour change?
  • Does the AI solution fit naturally into the workflow users already live in?
  • Is this a defensive, value, or noise feature based on the signals we found?
  • If value: what is the right V1 scope to test the retention hypothesis?
  • If defensive: what is the minimum version that removes the sales objection?
  • If noise: what should we build instead?

4 weeks

The time a validation sprint takes. Four weeks of research versus four months of engineering. The sprint doesn't guarantee a successful feature — but it dramatically increases the probability of one, and it surfaces the reasons a feature will fail before you've sunk the cost of building it.

The biggest resistance to running validation sprints is the belief that research takes too long. In reality, a properly resourced sprint with a dedicated researcher or a fractional PM who knows how to run it takes four weeks. Engineering for an AI feature takes four months minimum. Research is not the bottleneck. It is the insurance.

6. When AI Is Worth Building

This post is not anti-AI. It is anti-building-AI-because-a-competitor-launched-something-and-sales-panicked. Those are not the same thing.

There are genuine, high-ROI AI applications in B2B SaaS products. The ones that work share three characteristics.

They eliminate a task users currently do manually inside the product

The best AI features are not new capabilities. They are automated versions of existing manual steps. If your users are copying data between fields, writing the same type of content repeatedly, or running the same analysis workflow every week, AI that automates those tasks will get adopted. The user already knows the job needs to be done. You are making it faster. There is no behaviour change required — just a time saving.

The hardest AI features to get adopted are ones that require users to invent a new workflow. Blank-slate AI chatbots embedded into B2B products almost universally underperform. Users don't know what to ask. The feature sits in the UI, getting ignored, because there is no moment in the existing workflow where it naturally appears.

The trigger is in the workflow, not on a separate page

AI features that require users to navigate to a dedicated "AI" section of the product have structurally low ceilings. Users need to remember the feature exists, decide it's relevant, navigate to it, and learn how to use it. That is four steps between intent and value.

AI that surfaces at the moment users need it — a suggested completion as they type, an automated analysis that appears when they open a report, a proactive alert triggered by a data pattern — requires zero navigation. The user encounters the value without seeking it.

If your proposed AI feature lives behind a menu item or a dedicated section, that is a design problem before it is an engineering problem. Fix the placement before you build the model.

You can measure a meaningful outcome that is not "used the feature"

AI features succeed when they drive a downstream outcome you can measure. Not "activated the AI widget." Something like: users who use AI summarisation produce 40% more reports per session. Users who accept AI suggestions complete the workflow in 8 minutes instead of 22. Teams using the AI-assisted onboarding flow reach the first value milestone 3 days faster.

If you cannot articulate what downstream outcome the AI feature should drive — and instrument your analytics to measure it — you do not have a success criterion. You have a launch event. Launch events feel like wins for about three months, which is roughly when the QBR happens and someone asks about adoption.

This is the analytics-to-action pipeline problem. Shipping without defining what success looks like in the data means you will not know you have failed until it is too late to fix cheaply.
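
To make that concrete, here is a minimal sketch of the kind of before/after comparison that turns "used the feature" into a real success criterion. It assumes a hypothetical export named workflow_completions.csv with one row per completed workflow and illustrative columns (user_id, used_ai, duration_minutes); your schema and your outcome metric will differ.

```python
import pandas as pd

# Hypothetical export: one row per completed workflow. Columns are illustrative:
# user_id, used_ai (0/1), duration_minutes.
events = pd.read_csv("workflow_completions.csv")
events["used_ai"] = events["used_ai"].astype(bool)

# Downstream outcome: how long the workflow takes with vs. without the AI feature.
summary = (
    events.groupby("used_ai")["duration_minutes"]
          .agg(["median", "count"])
          .rename(index={True: "with AI", False: "without AI"})
)
print(summary)

# A success criterion written before launch might read:
# "AI-assisted completions finish the workflow in half the median time."
baseline = events.loc[~events["used_ai"], "duration_minutes"].median()
assisted = events.loc[events["used_ai"], "duration_minutes"].median()
print(f"Median speed-up with AI: {baseline / assisted:.1f}x")
```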

"The signal for a value AI feature is not 'the competitor has this.' It's 'users are doing this manually right now, inside our product, and they've told us they wish it was automatic.' That's the only signal worth building for."

— Jake McMahon, ProductQuant

7. Post-Launch: Measuring Whether an AI Feature Is Actually Working

Most teams measure AI feature launch with the wrong metrics. Launch-day signups. Total activations. A percentage of the user base that clicked into the feature once. These numbers feel positive and mean almost nothing.

The metric that tells you whether an AI feature is working is weekly active users among those who have tried it — the retention rate of the feature itself. Not "how many people tried it." How many kept coming back.

If 500 users try your AI assistant in the first month and 12 of them are using it weekly by month three, that is a feature with a 2.4% retention rate. That is not an adoption problem. That is a signal that the feature does not deliver enough value for users to form a habit around it.

Here is the measurement framework that tells the real story:

Tier 1: Reach

What percentage of eligible users have activated the feature at least once? Eligible means users who have the feature available and have been active in the product during the measurement period. If reach is below 20% after 90 days, the feature is not visible enough, or users are not encountering a moment where it feels relevant.

Tier 2: Frequency

Among users who tried the feature at least once, what percentage used it again in the following two weeks? And the two weeks after that? A feature with 30% reach but 60% two-week retention among tryers is performing well. A feature with 30% reach and 8% two-week retention among tryers has a product-market fit problem at the feature level.

Tier 3: Impact

Does frequent AI feature usage correlate with the downstream outcomes you care about? Retention. NRR. Time-to-value. Expansion revenue. If there is no statistically meaningful correlation between AI feature usage and any of the metrics that matter for your business, the feature is not doing what you hoped it would do. It might be pleasant. It might get mentioned in NPS comments. It is not a moat.
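
Pulled together, the three tiers amount to a few lines of analysis on top of event data you almost certainly already collect. The sketch below assumes two hypothetical exports (a feature-usage log and an eligible-user list with a precomputed 90-day retention flag); the file names, column names, and the two-week window are illustrative, not a prescribed schema.

```python
import pandas as pd

# Hypothetical inputs -- adapt to whatever your analytics stack already exports.
# ai_feature_events.csv: user_id, event_date (one row per AI feature use)
# eligible_users.csv:    user_id, retained_90d (active users who have the feature available)
usage = pd.read_csv("ai_feature_events.csv", parse_dates=["event_date"])
users = pd.read_csv("eligible_users.csv")

# Tier 1: Reach -- share of eligible users who activated the feature at least once.
users["tried_feature"] = users["user_id"].isin(usage["user_id"])
reach = users["tried_feature"].mean()

# Tier 2: Frequency -- among tryers, who came back within two weeks of first use?
first_use = usage.groupby("user_id")["event_date"].min().rename("first_use").reset_index()
merged = usage.merge(first_use, on="user_id")
came_back = merged[
    (merged["event_date"] > merged["first_use"])
    & (merged["event_date"] <= merged["first_use"] + pd.Timedelta(days=14))
]["user_id"].nunique()
frequency = came_back / len(first_use)

# Tier 3: Impact -- do tryers retain at 90 days better than eligible users who never tried it?
impact = users.groupby("tried_feature")["retained_90d"].mean()

print(f"Reach: {reach:.1%}   Two-week frequency among tryers: {frequency:.1%}")
print(impact.rename({False: "retained (never tried)", True: "retained (tried)"}))
```

Note that the Tier 3 comparison is correlational, not causal; as above, you are looking for a statistically meaningful relationship with the metrics that matter, not proof that the feature caused them.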

Reach → Frequency → Impact

The three-tier AI feature health check. A feature that fails at Reach has a discoverability or relevance problem. A feature that passes Reach but fails Frequency has a value delivery problem. A feature that passes both but fails Impact is not connected to your core business model. Each failure mode has a different fix — and the wrong diagnosis leads to the wrong intervention.

What to do when adoption is low

Before concluding the feature is the wrong idea, run the diagnostic. Low reach is usually a placement or trigger problem — the feature exists but users never encounter a moment where it feels relevant. This is fixable without rebuilding.

Low frequency among tryers is harder. It usually means the value delivery on first use wasn't strong enough to create a return visit. This is a quality and workflow fit problem. The AI output wasn't good enough, or the interaction model required too much effort, or the task the AI assists with isn't actually frequent enough to build a habit around.

If you have diagnosed placement, improved quality, and run a targeted re-engagement campaign — and frequency still does not move — you are probably dealing with a noise feature that survived the build process. The right response is to stop investing in it, document what you learned, and apply that learning to the next validation sprint.

8. FAQ

Why do AI features have such low adoption rates?

Most AI features are built in response to competitive pressure rather than validated user need. The feature solves a problem the team assumed users have, not a problem users have articulated. When AI is bolted onto an existing workflow rather than embedded within it, users default back to their existing process. Low adoption is the product telling you it wasn't needed in the form you built it.

What is the difference between a defensive AI feature and a value AI feature?

A defensive feature exists to remove a sales objection. A basic version is all you need — and over-engineering it is wasted investment. A value feature directly improves the core job users hired your product for. It is sticky, measurable in downstream outcomes, and worth significant iteration. The decision of how much to invest is entirely different for each category, which is why conflating them is expensive.

How much does it cost to validate an AI feature before building?

A structured validation sprint — 20 to 40 user interviews, prototype testing, competitive analysis — typically costs $3,000 to $8,000 in research time and tooling. The average AI feature costs $50,000 or more to build. Running validation at 6% to 16% of the build cost is almost always worthwhile. The more significant ROI is avoiding the redirection of three to four months of engineering time away from work that was already compounding on your retention metrics.

What metrics should you track to measure AI feature success?

Three tiers: Reach (what percentage of eligible users activated the feature at least once), Frequency (what percentage of those who tried it use it weekly), and Impact (does feature usage correlate with improved retention, NRR, time-to-value, or expansion revenue). A feature that clears all three tiers is working. A feature that fails at any tier has a specific, diagnosable problem — discoverability, value delivery, or product-model fit.

When is AI actually worth building into a SaaS product?

When it eliminates a task users currently do manually inside your product, when the trigger surfaces in the existing workflow rather than behind a navigation item, and when you can define a measurable downstream outcome it should drive. The clearest signal: users requesting it unprompted in support tickets and research interviews, describing doing the job manually and wishing the product handled it. That is real demand. Competitive announcements and sales team pressure are not the same thing.

Review your AI strategy

We've audited AI adoption in 50+ products. Let's assess yours.

Talk to Jake

About the Author

Jake McMahon is a product growth consultant specialising in Series A and B SaaS companies. He has led product research, roadmap validation, and PLG implementations for B2B platforms in healthcare, HR, and productivity software. His work focuses on connecting product decisions to revenue outcomes.

PLG readiness

Score whether your product is structurally ready to make AI features land.

AI features fail when the activation path, self-serve mechanics, and expansion logic aren't in place. The PLG Scorecard diagnoses exactly which layers of your growth system are holding new features back from delivering value.

Get the PLG Scorecard →