Shipping an LLM product: pricing, margins, and the anti-moat trap
LLM-powered products face harder economics than traditional SaaS: variable costs that scale with usage, margins squeezed by inference, commoditization risk, and competitors with the same foundation models. How to build a product that's actually defensible, and the patterns that lead to LLM
In 2023 and 2024, a wave of LLM-powered startups launched. They wrapped GPT-4 or Claude in a UI, served a specific use case, charged subscriptions. Many got traction. Many died.
The deaths weren't because the products were bad. They were because the products had no moat. Anyone could build the same thing with the same underlying model. Customers churned to whoever charged less. Founders watched margins shrink as the foundation model providers cut their own prices or launched competing features.
By 2026, the lessons are clearer. Building an LLM-powered product is not the same as building a traditional SaaS product. The economics are different. The competitive dynamics are different. The moats are different. Teams that don't understand the differences end up as case studies in "AI startups that disappeared."
This article covers what shipping an LLM product actually looks like in 2026 — the pricing, the margins, the strategic patterns, and the moats that hold up versus the ones that don't.
What's different about LLM products
A few characteristics that distinguish LLM-powered products from traditional SaaS:
Variable costs scale with usage. Each user interaction has a per-call cost (inference). Unlike software where serving the 10,000th user is essentially free, the 10,000th LLM call costs the same as the first.
Margins are tighter. Inference costs are 10-40% of revenue in many cases. After hosting, compute, and other infrastructure, gross margins for AI products are typically 50-75%, not the 80-90% of pure SaaS.
Foundation models are commodities. GPT, Claude, Gemini are available to everyone. Your competitor uses the same models. Whatever capability comes from the model is not differentiated.
Foundation models change. New versions, deprecations, price changes happen quarterly. Your product depends on infrastructure you don't control.
Quality is hard to defend. A clever prompt is easily reverse-engineered. A unique fine-tune can be replicated. A workflow innovation can be copied.
Customer expectations shift fast. What's impressive today is table stakes in 6 months. The bar moves quickly.
Pricing and platform pressure. Foundation model providers launch products that compete with their own customers, and partnerships such as Salesforce + OpenAI or Microsoft + OpenAI bring in competitors with massive distribution.
These aren't theoretical. They're the daily reality of building in 2026.
The cost structure
A typical LLM product's cost structure:
Variable costs (per user / per usage):
- Inference (LLM API or hosting).
- Embedding (for RAG).
- Vector DB or storage.
- Other infrastructure that scales with usage.
Fixed costs:
- Team salaries.
- Office/operational.
- Software licenses (CRM, tools).
- Hosting baseline.
- Marketing/sales.
The variable cost is what's different. If your variable cost is €0.05/active-user/day and you charge €20/month, your contribution is €20 - days_active × €0.05, or €18.50 for a user active all 30 days. That looks comfortable, but the per-day cost rises with usage, and heavy users can erode it significantly.
A real example. A company offering an AI writing tool at €20/user/month:
- Average user: 5 sessions/day × 20 days/month = 100 sessions/month.
- Average session: 5 LLM calls × 2K tokens average.
- Total tokens: 1M tokens/month/user (input + output).
- Cost at Claude Sonnet pricing (€3/M input, €15/M output): ~€7-12/user/month, depending on the input/output mix.
- Gross margin: ~40-65%.
For their heavy users (10-20x more sessions):
- 10-20M tokens/month.
- Inference cost: €70-240/month.
- Loss per heavy user: roughly €50-220/month at the €20 price.
Without usage-based controls, heavy users can be unprofitable. This is a problem traditional SaaS doesn't have at the same magnitude.
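The arithmetic in this example can be sketched in a few lines of Python. This is a rough model, not a billing system: the 50/50 input/output split, the 15x heavy-user multiplier, and the names `UsageProfile`, `monthly_inference_cost`, and `inference_margin` are assumptions made for illustration, and the margin counts inference only.

```python
from dataclasses import dataclass


@dataclass
class UsageProfile:
    sessions_per_month: int
    calls_per_session: int
    tokens_per_call: int
    output_share: float  # assumed fraction of tokens billed as output


def monthly_inference_cost(
    profile: UsageProfile,
    input_price_per_m: float = 3.0,    # EUR per million input tokens (figure from the text)
    output_price_per_m: float = 15.0,  # EUR per million output tokens (figure from the text)
) -> float:
    """Inference cost in EUR for one user over one month."""
    total_m = (
        profile.sessions_per_month
        * profile.calls_per_session
        * profile.tokens_per_call
    ) / 1_000_000
    output_m = total_m * profile.output_share
    input_m = total_m - output_m
    return input_m * input_price_per_m + output_m * output_price_per_m


def inference_margin(price: float, cost: float) -> float:
    """Margin after inference only; hosting and other infrastructure are ignored."""
    return (price - cost) / price


PRICE = 20.0  # EUR per user per month

# Assumed profiles: the average user from the example, and a heavy user at 15x sessions.
average = UsageProfile(sessions_per_month=100, calls_per_session=5,
                       tokens_per_call=2_000, output_share=0.5)
heavy = UsageProfile(sessions_per_month=1_500, calls_per_session=5,
                     tokens_per_call=2_000, output_share=0.5)

for label, profile in [("average", average), ("heavy", heavy)]:
    cost = monthly_inference_cost(profile)
    print(f"{label}: cost EUR {cost:.2f}, margin {inference_margin(PRICE, cost):.0%}")
```

With these assumptions the average user costs about €9/month, inside the €7-12 range above, while the heavy user costs about €135/month against €20 of revenue. Changing `output_share` moves the result across that range, which is why the input/output mix is worth measuring rather than guessing.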
Pricing models
The pricing question is harder than for traditional SaaS. Options:
Flat per-user. Simple. The default. But heavy users can be unprofitable; light users subsidize them.
Per-seat with usage caps. Each seat includes N actions/calls/tokens per month. Heavy users pay overage or hit limits.
Pure usage-based. Pay per call, per token, per action. Aligns cost and revenue. Predictable margins. But uncertain bills can hurt adoption.
Tiered. Free tier, pro tier (with limits), enterprise (custom). The standard SaaS pattern, adapted for usage limits.
Hybrid. Per-seat baseline plus usage-based for power features. Captures both predictability and alignment.
Outcome-based. Pay per outcome (closed deal, completed ticket, generated content piece). High-value, hard to operationalize. Becoming more common as AI does more measurable work.
Each has trade-offs. The "right" model depends on:
- Predictability needs (yours and the customer's).
- Variance in usage.
- Margin structure.
- Competitive landscape.
Most successful LLM products use tiered or hybrid. Heavy users hit limits or pay overage; light users get reasonable value.
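As a minimal sketch of the "per-seat with usage caps" option, the function below bills a seat fee plus overage on actions beyond a pooled allowance. The prices, the allowance, and the decision to pool the allowance across seats are hypothetical choices, not a recommendation.

```python
def monthly_bill(
    seats: int,
    actions_used: int,
    seat_price: float = 20.0,      # EUR per seat per month (hypothetical)
    included_per_seat: int = 500,  # actions included with each seat (hypothetical)
    overage_price: float = 0.05,   # EUR per action beyond the allowance (hypothetical)
) -> float:
    """Bill for one account: seat fees plus overage on a pooled allowance."""
    allowance = seats * included_per_seat
    overage_actions = max(0, actions_used - allowance)
    return round(seats * seat_price + overage_actions * overage_price, 2)


# A 10-seat account inside its allowance, then the same account 2,000 actions over.
within_allowance = monthly_bill(seats=10, actions_used=4_000)
over_allowance = monthly_bill(seats=10, actions_used=7_000)
```

Under these numbers the account pays €200 inside its allowance and €300 when it runs 2,000 actions over. A hard cap would replace the overage term with a refusal to serve further actions; which is better depends on how much bill uncertainty the customer tolerates.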
The unit economics question
A useful frame: what's the unit you charge by, and what does it cost?
For a chat assistant:
- Unit: a "conversation."
- Cost: €0.10-1.00 per conversation depending on length.
- Revenue: ?
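If you charge €20/user/month and users have 100 conversations/month, average revenue per conversation is €0.20. That is a thin margin at the low end of the cost range and a loss on any conversation that costs more than €0.20.
If you charge €30/user/month with a limit of 200 conversations and most users have 50, average revenue per conversation is €0.60: a comfortable margin as long as typical conversations sit toward the lower end of the cost range.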
The exercise: estimate cost per unit, set revenue per unit, ensure margin makes sense across distribution of users.
Heavy users are usually the loss-makers. Decide explicitly: subsidize them? Force them to enterprise? Cap usage?
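One way to run this exercise is to compute contribution per user for each usage segment and then blend by segment share. The segment shares, conversation counts, and the €0.20 cost per conversation below are illustrative assumptions, and `contribution_by_segment` is a hypothetical helper.

```python
# (segment name, share of user base, conversations per month): illustrative numbers
SEGMENTS = [("light", 0.60, 20), ("typical", 0.30, 100), ("heavy", 0.10, 600)]


def contribution_by_segment(price, cost_per_conversation, cap=None, segments=SEGMENTS):
    """Return ({segment: contribution per user}, blended contribution per user)."""
    per_segment = {}
    blended = 0.0
    for name, share, conversations in segments:
        # With a cap, usage beyond the cap is simply not served.
        served = conversations if cap is None else min(conversations, cap)
        contribution = price - served * cost_per_conversation
        per_segment[name] = contribution
        blended += share * contribution
    return per_segment, blended


uncapped, blended_uncapped = contribution_by_segment(price=20.0, cost_per_conversation=0.20)
capped, blended_capped = contribution_by_segment(price=20.0, cost_per_conversation=0.20, cap=200)
```

With these inputs the uncapped plan loses about €0.40 per user on a blended basis, because the 10% of heavy users each lose about €100. A cap of 200 conversations brings the blended figure to roughly €7.60. The sketch assumes capped users stay and stop at the cap; in practice some will upgrade and some will churn.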
Margin protection
A few tactics for protecting margins:
1. Tier the model by feature
Cheap-to-run features available on basic tier. Expensive features (reasoning models, long context, large outputs) on premium.
A free-tier user using Claude Haiku-equivalent costs maybe €0.10/month in inference. A premium-tier user using Claude Opus-equivalent might cost €5-20/month. Tier accordingly.
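Feature tiering can be expressed as a small policy table that maps each plan to a model class and a set of allowed features. The tier names, model identifiers, feature names, and context limits below are placeholders, not real product or model names.

```python
# Hypothetical policy table: every name and number here is a placeholder.
TIER_POLICY = {
    "free": {
        "model": "small-fast-model",
        "max_context_tokens": 8_000,
        "features": {"chat"},
    },
    "pro": {
        "model": "mid-size-model",
        "max_context_tokens": 64_000,
        "features": {"chat", "long_context"},
    },
    "premium": {
        "model": "large-reasoning-model",
        "max_context_tokens": 200_000,
        "features": {"chat", "long_context", "reasoning"},
    },
}


def resolve_model(tier: str, feature: str) -> str:
    """Return the model to use, or raise if the plan does not include the feature."""
    policy = TIER_POLICY[tier]
    if feature not in policy["features"]:
        raise PermissionError(f"'{feature}' is not included in the '{tier}' tier")
    return policy["model"]
```

Keeping the mapping in one table means a provider price change or model deprecation is a one-line edit rather than a change scattered through request handlers.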
2. Cost optimization (covered in another article)
Caching, routing, output control. Production-grade cost optimization typically saves 60-90%. This is the difference between profitable and unprofitable for many products.
3. Usage transparency
Show users their usage. Implicit guidance for them to optimize their own behavior.
This isn't punitive; it's mutually informative. Users like seeing what they're using. Heavy users self-throttle or upgrade.
4. Smart caching
User-specific or organization-specific caching dramatically reduces costs for power users. Because repeated requests are served from cache, they can keep using the product heavily without running into usage limits as quickly.
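A minimal sketch of organization-scoped caching follows. It uses exact-match lookup on a whitespace-normalized prompt, keeps entries in memory, and never expires them; all three are simplifying assumptions. `OrgScopedCache` and `call_model` are hypothetical names, and `call_model` stands in for whatever function actually calls the model.

```python
import hashlib
from typing import Callable


class OrgScopedCache:
    """In-memory response cache keyed by organization and normalized prompt."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(org_id: str, prompt: str) -> str:
        # Collapse whitespace so trivially different prompts share an entry.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{org_id}\x00{normalized}".encode("utf-8")).hexdigest()

    def complete(self, org_id: str, prompt: str) -> str:
        key = self._key(org_id, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(prompt)  # the only place inference cost is incurred
        self._store[key] = result
        return result


# Usage with a fake model function, so the sketch runs without any API.
cache = OrgScopedCache(call_model=lambda prompt: f"answer to: {prompt}")
first = cache.complete("org-1", "What is our refund policy?")
second = cache.complete("org-1", "What is  our refund policy?")  # extra space, same key
other_org = cache.complete("org-2", "What is our refund policy?")  # separate entry
```

Scoping the key by organization keeps one customer's answers from being served to another, at the price of a lower hit rate than a global cache would have.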
5. Hybrid hosted/self-hosted
For large enterprise customers, offering self-hosted or BYO-cloud variants shifts inference cost to them. They get privacy/control; you get cost relief.
6. Outcome-based for high-value
Some workflows produce measurable outcomes (deals closed, tickets resolved). Pricing by outcome captures value differently than per-token. Margins can be much better.
The moat question
The harder question. What makes your product defensible?
Some moats are real; others only look like moats:
Real moats
Proprietary data. You have data competitors can't easily obtain. Customer-specific data is the canonical example. The more your product learns from each customer's data, the harder to switch.
Distribution. You're embedded in workflows or platforms competitors aren't. Microsoft Copilot has a massive distribution moat through Office.
Trust. Customers in regulated industries, sensitive verticals trust you. Switching is expensive (security review, compliance review, training).
Integration. You're integrated with many of the customer's tools, replacing or augmenting them. Each integration is a switching cost.
Workflow specialization. You've built specifically for a workflow that takes deep domain knowledge. Generic AI tools can't replicate without similar investment.
Network effects. Users benefit from other users (community, shared content, comparative benchmarks). Rare for individual-productivity AI but real for some categories.
Brand and switching costs. For enterprise sales, the buyer's career is tied to the choice. Switching is risky for them; you have a defensible position once chosen.
Vertical integration. You own meaningful infrastructure (own models, own inference, own data pipelines). Hard to replicate.
Pseudo-moats
A specific prompt or workflow. Easily reverse-engineered. Customers can build it themselves.
A specific model choice. Available to competitors.
A clever UI. Copyable in a quarter.
Speed of iteration. Real for a window; competitors catch up. Not durable alone.
Marketing or branding. Disposable in the AI era; reputations made and unmade quickly.
The pseudo-moats are what most AI wrappers rely on. They explain the high mortality rate.
The strategic patterns
A few patterns that produce defensible LLM products:
Pattern 1: Workflow + AI, not "AI tool"
Rather than "AI that does X," build a workflow that incorporates AI as part of a larger system.
Example: not "AI summarizer for legal contracts," but "contract management platform with built-in AI."
The platform value (storage, organization, collaboration, history, compliance) is moat; the AI is a feature.
Pure AI tools commoditize. Platforms with AI features hold up.
Pattern 2: Customer-data flywheel
Each customer's usage produces data that improves their experience (and possibly others'). Switching means losing the accumulated personalization.
Example: an AI sales assistant that learns each user's voice, accounts, and patterns. After 6 months, switching to a competitor means starting over.
Build this flywheel from day one. Make the product better the longer you use it.
Pattern 3: Deep vertical specialization
Pick a vertical. Build for it deeply. Healthcare, legal, finance, real estate.
The vertical knowledge is your moat. Generic AI tools can't compete on domain specificity. Specialized competitors can — but each vertical is its own market with room for a few specialists.
Pattern 4: Embed in existing workflow
Don't ask users to come to your product. Embed in where they already work — Slack, Microsoft 365, Google Workspace, their CRM.
Distribution through embedding is a real moat. Once you're in their daily tools, ripping you out is expensive.
Pattern 5: AI-native operations
Some businesses are AI-native end-to-end — they don't sell AI to humans; they use AI to deliver a service. AI tutoring, AI customer service as outcome, AI content as commodity.
The "product" is the outcome; the AI is operational. Competitors with similar AI but worse operations lose.
Pattern 6: Layered moats
A product builds moats on several dimensions at once: vertical + data + integration + workflow. No single dimension may be decisive on its own, but combined they're hard to replicate.
This is what most successful AI products look like. Not one big moat; several reinforcing each other.
Pricing case studies
A few real (anonymized) cases:
Case 1: The "race to the bottom" wrapper
A startup launched in 2023 with a chat interface for content writing. Free tier; €20/month pro tier.
Year 1: revenue grew. Year 2: 5 competitors launched similar products. Pricing pressure: €10/month, then €5/month. Margins squeezed; growth stalled.
Year 3: ChatGPT included similar features in its consumer subscription. The competitive position became untenable, and the company shut down.
Lesson: A wrapper without other moats has no defense.
Case 2: The vertical specialist
A company built specifically for AI in pharma research. Deep integration with research databases. Specialized prompts for medical terminology. Compliance for FDA workflows.
Pricing: €1,500/user/month. Margins healthy. Customer churn low. Direct competitors struggle to match the depth.
Lesson: Vertical depth + compliance + integration is a real moat.
Case 3: The embedded platform
An AI feature added to an existing project management platform. Users were already there for project management; AI made it more valuable.
Pricing: bundled into existing tiers. No separate AI pricing. Net effect: retention up, ARPU up.
Lesson: Existing distribution + AI as additive feature is robust.
Case 4: The data flywheel
A sales tool that learns each customer's accounts, voice, and patterns over time. After 12 months of use, the model knows their specific business well.
Pricing: per-seat with custom enterprise. High retention; switching is real cost (losing 12 months of personalization).
Lesson: Customer-specific data is a real moat.
What goes wrong
Common failure modes:
Failure 1: Margin compression. Started with healthy margins; competitors and price pressure squeezed them. Now revenue grows but profit doesn't.
Failure 2: Foundation model launch. OpenAI / Anthropic launched a feature competing with your product. Your differentiation evaporated.
Failure 3: Heavy user economics. A few users disproportionately drive cost. They're loss-makers. Either you charge them appropriately or eat the loss.
Failure 4: Quality regression. Foundation model update changed behavior. Your tuned prompts broke. Customer trust dropped. Recovery is slow.
Failure 5: Customer churn to free alternatives. ChatGPT or Claude could do most of what your product does, for cheaper. Users moved on.
Failure 6: Sales cycle expectations. Enterprise expects long-term value. Your product might be replaced by Microsoft Copilot in 18 months. Trust hard to build.
Failure 7: Scaling without margin discipline. Growth funded growth; margins ignored. Eventually money runs out, no path to profitability.
Failure 8: Foundational dependency risk. API price increase, deprecation, or outage from your provider. Your business is disrupted by something you don't control.
Specific pricing patterns that work
A few patterns from products that have made the economics work:
Per-action with caps. Charge per "completed task" or "successful outcome." Cap usage at the tier. Heavy users pay overage. Clear value alignment.
Bring your own API key. Sophisticated users can plug in their own OpenAI / Anthropic key, paying inference costs directly. You charge for the product around it. Removes inference cost from your books.
Enterprise custom pricing. Large customers negotiate. You can build cost into the price; smaller customers fit standard tiers.
Free tier with strong upgrade path. Generous free tier for adoption; clear value in paid tiers. Free-tier users self-serve, and a share of them convert to paying customers.
Annual contracts with discounts. Commits revenue; smooths usage prediction; reduces churn. Standard SaaS pattern, still works.
A framework for the pricing decision
Steps:
- Model your cost structure. What does it actually cost to serve a typical user, a heavy user, a light user?
- Choose your unit. What are you charging for? Seats, actions, outcomes, tokens?
- Find the right price. Where do customer value, your cost, and competitive pressure align?
- Tier intelligently. Free / pro / enterprise with clear value progression.
- Set limits. Where do heavy users start hurting margins? Enforce caps or charge overage.
- Plan for change. Foundation model prices will drop. Your price may need to follow. Or you can keep prices and grow margin. Decide which.
- Measure constantly. Customer lifetime value, contribution margin, churn, expansion. Adjust as you learn.
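The "set limits" and "plan for change" steps can be made concrete with a small calculation: given a price, a cost per unit, and a target margin, how many units can a plan include? The sketch below uses integer cents and whole-percent margins to avoid floating-point rounding. The function name and the example figures are assumptions for illustration.

```python
def max_included_units(price_cents: int, cost_per_unit_cents: int, target_margin_pct: int) -> int:
    """Largest monthly allowance that still leaves the target margin after inference."""
    if cost_per_unit_cents <= 0:
        raise ValueError("cost_per_unit_cents must be positive")
    if not 0 <= target_margin_pct < 100:
        raise ValueError("target_margin_pct must be in the range 0-99")
    # Portion of the price that may be spent on inference.
    cost_budget_cents = price_cents * (100 - target_margin_pct) // 100
    return cost_budget_cents // cost_per_unit_cents


# EUR 30/month plan, EUR 0.20 per conversation, 60% target margin after inference.
cap_today = max_included_units(price_cents=3_000, cost_per_unit_cents=20, target_margin_pct=60)

# Same plan after a hypothetical 50% drop in model prices.
cap_after_price_drop = max_included_units(price_cents=3_000, cost_per_unit_cents=10, target_margin_pct=60)
```

Here the allowance is 60 conversations today and 120 after the price drop. That is the choice the "plan for change" step describes: pass the saving on as a larger allowance or lower price, or hold both steady and take the margin.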
A note on AI startups in 2026
The honest landscape:
Wrapper businesses usually struggle. A simple wrapper around a foundation model rarely survives on its own. The exceptions have specific distribution, brand, or integration moats — the wrapper is the smallest part of what they actually sell.
Vertical specialists are thriving. Domain-specific AI products with deep customer knowledge are doing well.
Platform plays are dominating. Microsoft, Google, Salesforce, large vendors with distribution own much of the enterprise AI market through bundles.
AI-native operations work. Companies using AI to deliver services (not sell AI) often have better economics than AI-as-software companies.
Open models are reshaping the market. Capable open-source models change the competitive dynamics for some segments, and foundation-model lock-in is eroding.
Foundation model providers are platforms. OpenAI, Anthropic, Google are building product layers on their models. Customers face the question: build on OpenAI or be replaced by OpenAI?
The successful new companies in 2026 navigate these dynamics intentionally. The ones that don't, mostly disappear.
What to optimize for
For founders, a hierarchy:
- Build something that does real work. Not "AI for X." A measurable outcome customers value.
- Establish moats deliberately. Customer data flywheel, vertical depth, integration, workflow embedding. Pick one or more; build them in.
- Get unit economics right. Margins that work at scale. Pricing that reflects value and protects cost.
- Don't depend on a moving foundation. Diversify model providers. Build to swap. Be prepared for provider changes.
- Scale operational excellence. Observability, evals, security, quality. The boring stuff that distinguishes lasting from disappearing.
- Defend the relationships. Trust, integrations, switching costs. Customers who can't easily leave are the moat.
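"Build to swap" usually means putting a thin interface between product code and any one provider's SDK. The sketch below shows one possible shape: a provider protocol plus a router that falls back in order when a provider is unavailable. `CompletionProvider`, `FallbackRouter`, `ProviderUnavailable`, and `StubProvider` are hypothetical names, and the stubs stand in for real SDK adapters.

```python
from typing import Protocol


class ProviderUnavailable(Exception):
    """Raised by a provider adapter when its upstream API cannot serve the call."""


class CompletionProvider(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


class FallbackRouter:
    """Tries providers in priority order and returns the first successful answer."""

    def __init__(self, providers: list[CompletionProvider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = providers

    def complete(self, prompt: str) -> tuple[str, str]:
        failures = []
        for provider in self._providers:
            try:
                return provider.name, provider.complete(prompt)
            except ProviderUnavailable as exc:
                failures.append(f"{provider.name}: {exc}")
        raise ProviderUnavailable("; ".join(failures))


class StubProvider:
    """Stand-in for a real SDK adapter, used only to exercise the router."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def complete(self, prompt: str) -> str:
        if not self.healthy:
            raise ProviderUnavailable("simulated outage")
        return f"[{self.name}] {prompt}"


router = FallbackRouter([StubProvider("primary", healthy=False), StubProvider("secondary")])
used_provider, text = router.complete("Summarize this contract.")
```

The interface is the easy part. Prompts and evals tuned for one model rarely transfer unchanged, so a swap is only real if the fallback path is exercised and evaluated regularly.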
This isn't different from successful traditional businesses — but the urgency is higher because the underlying tech is shifting faster.
The takeaway
Shipping an LLM product in 2026 is harder than it was in 2023. The wrapper era is largely over; foundation model providers and platforms compete directly. Margins are tighter than traditional SaaS; competition is broader.
The products that succeed have moats that aren't "AI." They're proprietary data, vertical depth, distribution, integration, trust, workflow specialization. Built deliberately, layered together, defended actively.
The economics need rigor. Variable costs that scale with usage. Margin protection through caching, routing, tiering, smart pricing. Heavy users either charged appropriately or capped.
The teams that get this right build real businesses on AI. The ones that don't become the case studies: interesting demos, brief traction, then forgotten.
It's a harder game than the early hype suggested. The opportunity is real but specific. Build with moat in mind from day one. Manage margin like it matters. Treat foundation models as commodity infrastructure, not differentiation.
That's how you ship an LLM product that's still here in three years.