Production AI failure modes: what breaks after the demo
AI systems usually fail in predictable ways: hallucination, stale context, sycophancy, prompt injection, unsafe tool use, schema drift, weak fallbacks, and observability gaps. This is a production failure-mode register for teams shipping real workflows.
Outcome: Build a production AI failure-mode register with controls for hallucination, stale context, prompt injection, unsafe tool use, and weak fallbacks.
Most AI demos are too polite. The sample input is clean. The data is current. The tool works. The user asks a normal question. The model gives a good answer. Everyone nods.
Production is less polite. Users paste messy inputs. Source documents are stale. APIs time out. Prompts drift. The model follows the wrong instruction. A customer asks a question just outside the corpus. A tool call succeeds but updates the wrong record. The workflow produces something fluent enough that nobody notices the error until later.
This article is a failure-mode register for production AI systems. Use it before launch, not after the first incident.
A production AI review should ask "how does this fail?" before "how impressive is the happy path?" Every failure mode needs a control, a test, an owner, and a stop condition.
Failure mode 1: plausible false output
The system generates an answer that sounds right but is unsupported or wrong.
Common triggers:
- Specific facts without source grounding.
- Legal, medical, financial, or policy questions.
- Recent events.
- Low-quality retrieval.
- Summaries of long documents where the relevant evidence is buried.
Controls:
- Require citations or source snippets for factual claims.
- Refuse to answer questions that fall outside the available source set.
- Add eval cases for known false-answer patterns.
- Route high-impact outputs to human review.
- Log source IDs used in the answer.
Do not control this with wording like "be accurate." Control it with sources, tests, and review gates.
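As a rough illustration of a review gate, the sketch below refuses any draft that cites nothing or cites a source outside the known set, and routes high-impact drafts to a human. The names `DraftAnswer`, `gate_answer`, and the `high_impact` flag are hypothetical, not part of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class DraftAnswer:
    """A model answer plus the source IDs it claims to rely on (hypothetical shape)."""
    text: str
    source_ids: list = field(default_factory=list)
    high_impact: bool = False  # assumed flag for legal, medical, financial, or policy topics


def gate_answer(draft: DraftAnswer, known_source_ids: set) -> str:
    """Return 'refuse', 'human_review', or 'send' for a drafted answer."""
    # No citations at all: the answer is ungrounded, so refuse.
    if not draft.source_ids:
        return "refuse"
    # Any citation that is not in the available source set also means refuse.
    if any(source_id not in known_source_ids for source_id in draft.source_ids):
        return "refuse"
    # Grounded but high impact: a person reviews it before it goes out.
    if draft.high_impact:
        return "human_review"
    return "send"
```

The gate only checks that cited IDs exist. Whether the cited text actually supports the claim still needs eval cases or human review.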
Failure mode 2: stale context
The answer is grounded, but grounded in old information.
Examples:
- Old pricing page.
- Superseded policy.
- Previous contract version.
- Outdated product documentation.
- Cached customer status.
Controls:
- Store source date, version, owner, and freshness rule.
- Prefer authoritative sources over summaries.
- Mark stale sources in retrieval output.
- Add freshness tests.
- Notify an owner when key sources are older than their review window.
RAG systems can confidently answer from stale documents. The retrieval layer must know what "current" means.
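One way to make "current" explicit is to store a review window with each source and mark stale sources before they reach the model. This is a minimal sketch; the `Source` fields and the `annotate_for_retrieval` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Source:
    """Metadata kept next to each document (hypothetical field names)."""
    source_id: str
    owner: str
    last_reviewed: date
    review_window_days: int  # the freshness rule for this source


def is_stale(source: Source, today: date) -> bool:
    """A source is stale once its review window has fully elapsed."""
    return today - source.last_reviewed > timedelta(days=source.review_window_days)


def annotate_for_retrieval(sources, today: date):
    """Attach a stale flag and the owner to each retrieved source."""
    return [
        {"source_id": s.source_id, "owner": s.owner, "stale": is_stale(s, today)}
        for s in sources
    ]
```

The same `stale` flag can drive the owner notification and the freshness tests, so the retrieval layer and the alerts agree on one rule.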
Failure mode 3: sycophancy and over-agreement
The model mirrors the user's assumption instead of challenging it.
This matters in strategy, analysis, planning, and decision support. A user asks, "This launch plan seems solid, right?" and gets agreement instead of risk analysis.
Controls:
- Prompt for counterarguments and uncertainty.
- Use decision rubrics instead of open-ended approval.
- Require an answer to "What would make this wrong?"
- Separate idea generation from review.
- In evals, include examples where the user's premise is flawed.
The system should help the user think better, not merely make their current view sound polished.
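A decision rubric can be enforced mechanically: reject any review that leaves the critical fields empty. The field names below are hypothetical, and the check only proves the fields were filled in, not that the content is good.

```python
# Hypothetical rubric fields a review must fill in before it is shown to the user.
REQUIRED_FIELDS = (
    "strongest_counterargument",
    "what_would_make_this_wrong",
    "confidence",
)


def missing_rubric_fields(review: dict) -> list:
    """Return the rubric fields that are absent or blank in a structured review."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = review.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing
```

A review that only says "looks solid" fails this check, which forces the workflow to ask again or escalate instead of returning bare agreement.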
Failure mode 4: prompt injection
The model treats untrusted content as instruction.
Examples:
- A web page says "ignore previous instructions."
- A support email includes malicious instructions.
- A document in a RAG corpus tells the assistant to reveal hidden data.
- A tool result contains text that tries to change the workflow.
Controls:
- Clearly label untrusted content.
- Never put retrieved content at the same authority level as system/developer instructions.
- Restrict tool permissions.
- Add allowlists for outbound actions.
- Test injection examples in evals.
- Keep secrets out of prompt context.
Prompt injection is not solved by one clever system prompt. It is reduced by architecture: data boundaries, tool permissions, and output validation.
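An outbound allowlist is one of the simpler architectural controls to sketch. The example below assumes the workflow only ever needs to reach a fixed set of HTTPS hosts; the host names are placeholders.

```python
from urllib.parse import urlparse

# Placeholder hosts; a real deployment would load its own list from config.
ALLOWED_HOSTS = frozenset({"api.example.com", "hooks.example.com"})


def outbound_allowed(url: str) -> bool:
    """Allow an outbound request only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # Compare the parsed hostname exactly; substring checks are easy to bypass.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Because the check runs outside the model, injected text that asks the assistant to post data to an attacker's URL fails at the boundary regardless of what the model decided.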
Failure mode 5: unsafe tool use
The model calls the wrong tool, calls the right tool with the wrong arguments, or takes an action before it has enough context.
Examples:
- Updates the wrong CRM contact.
- Sends an email to the wrong recipient.
- Creates duplicate records.
- Books an appointment without confirming timezone.
- Deletes or overwrites data.
Controls:
- Start read-only.
- Use narrow tools with explicit schemas.
- Validate tool arguments outside the model.
- Require confirmation for writes.
- Add idempotency keys.
- Log tool calls and results.
- Add a kill switch.
Tool use should be constrained by the workflow, not trusted to the model's judgment.
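The sketch below combines three of these controls for a single write tool: argument validation outside the model, explicit confirmation, and an idempotency key. `WriteGate`, `approve_contact_update`, and the `contact_id` argument are hypothetical names, and the in-memory key set stands in for whatever durable store a real system would use.

```python
import hashlib
import json


class ToolCallRejected(Exception):
    """Raised when a proposed tool call fails a check outside the model."""


class WriteGate:
    """Checks a proposed CRM contact update before any write happens."""

    def __init__(self, known_contact_ids):
        self.known_contact_ids = set(known_contact_ids)
        self.seen_keys = set()  # assumption: in-memory; production needs durable storage

    @staticmethod
    def idempotency_key(tool: str, args: dict) -> str:
        # Identical tool name and arguments always hash to the same key.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def approve_contact_update(self, args: dict, confirmed: bool) -> str:
        if args.get("contact_id") not in self.known_contact_ids:
            raise ToolCallRejected("unknown contact_id")
        if not confirmed:
            raise ToolCallRejected("write not confirmed")
        key = self.idempotency_key("update_contact", args)
        if key in self.seen_keys:
            raise ToolCallRejected("duplicate call")
        self.seen_keys.add(key)
        return key
```

The caller performs the write only after `approve_contact_update` returns a key, and logs the key with the tool result so duplicates can be traced later.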
Failure mode 6: schema and contract drift
The model output format changes, or the downstream API changes, and the workflow quietly breaks.
Controls:
- Use structured outputs where possible.
- Validate every model output before use.
- Treat malformed output as a recoverable failure.
- Version prompts and schemas together.
- Add contract tests for downstream APIs.
- Monitor parsing failures.
If a downstream node assumes valid JSON, the workflow must prove it has valid JSON.
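A minimal validation step might look like the following. The two required fields are invented for illustration; a real workflow would validate against its own versioned schema.

```python
import json

# Hypothetical contract for one workflow step: field name -> expected type.
REQUIRED_FIELDS = {"intent": str, "ticket_id": str}


def parse_model_output(raw: str):
    """Return (data, None) when the output is valid, or (None, error) when it is not."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            return None, f"missing field: {name}"
        if not isinstance(data[name], expected_type):
            return None, f"wrong type for field: {name}"
    return data, None
```

Returning the error instead of raising keeps malformed output on the recoverable path: the caller can retry, fall back, or escalate, and the error string feeds the parsing-failure metric.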
Failure mode 7: weak fallback
The system notices a problem but does not recover safely.
Bad fallbacks:
- Empty answer.
- Silent failure.
- Generic apology with no action.
- Repeated retry loop.
- Human escalation with no context.
Good fallbacks:
- Clear user message.
- Human queue with input, source, error, and attempted action.
- Retry with backoff only where retry is safe.
- Manual path for urgent cases.
- Stop condition for repeated failures.
Fallback is part of the product. If it is not designed, the failure experience will be improvised.
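The good-fallback list can be sketched as a single wrapper: retry with backoff only for errors assumed to be safe to retry, stop after a fixed number of attempts, and hand a person the full context. Treating `TimeoutError` as the only retry-safe error is an assumption of this example, as are the function and field names.

```python
import time


def run_with_fallback(action, payload, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run action(payload); on failure, return an escalation record with context."""
    errors = []
    for attempt in range(max_attempts):
        try:
            return {"status": "ok", "result": action(payload)}
        except TimeoutError as exc:
            # Assumed retry-safe: wait with exponential backoff, then try again.
            errors.append(f"timeout: {exc}")
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
        except Exception as exc:
            # Anything else is not retried, since repeating it may not be safe.
            errors.append(f"error: {exc}")
            break
    # Stop condition reached: escalate with everything a human needs.
    return {
        "status": "escalated",
        "input": payload,
        "errors": errors,
        "attempted_action": getattr(action, "__name__", repr(action)),
        "user_message": "We could not complete this request. A person will follow up.",
    }
```

The `sleep` parameter is injectable so tests can run without waiting. The escalation record carries the input, the errors, and the attempted action, which is the context a human queue needs.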
Failure mode 8: observability gap
Something goes wrong and nobody can reconstruct why.
Controls:
- Log prompt template and version.
- Log model and settings.
- Log source IDs, not only answer text.
- Log tool calls, arguments, and results with redaction.
- Log validation errors.
- Track latency, cost, and fallback rate.
- Keep retention short unless compliance requires longer.
Do not store private chain-of-thought. Store decision summaries, source references, tool inputs/results, and validation outcomes.
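As an illustration, a trace record covering these fields could be built like this. The email pattern is a deliberately simple stand-in for real redaction rules, and `build_trace` with its field names is hypothetical.

```python
import json
import re

# Simplified pattern for illustration only; real redaction needs broader rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(value):
    """Recursively replace email-like strings inside strings, dicts, and lists."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def build_trace(prompt_version, model, source_ids, tool_calls, validation_errors):
    """Serialize one request's reconstruction data as a JSON log line."""
    record = {
        "prompt_version": prompt_version,
        "model": model,
        "source_ids": source_ids,
        "tool_calls": redact(tool_calls),
        "validation_errors": validation_errors,
    }
    return json.dumps(record, sort_keys=True)
```

The record holds references and outcomes rather than raw reasoning, so an incident can be reconstructed from source IDs, tool inputs and results, and validation errors.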
A production failure-mode register
Create one row per failure mode:
| Failure mode | Example | Control | Test | Metric | Owner | Stop condition |
| --- | --- | --- | --- | --- | --- | --- |
| Stale source | Old pricing returned | Source date check | Query old/new pricing | Stale-source answer rate | Docs owner | Any customer-visible stale price |
| Unsafe tool use | Wrong CRM update | Argument validation + confirmation | Duplicate/wrong contact case | Wrong-action rate | RevOps | One wrong write |
The companion register linked from this article gives you the template.
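If the register lives in code or config rather than a spreadsheet, a small completeness check can block launch while any row is missing a column. This is a sketch under that assumption; `RegisterRow` and `incomplete_fields` are hypothetical names, and the field set mirrors the register's columns.

```python
from dataclasses import dataclass, fields


@dataclass
class RegisterRow:
    """One failure mode, with the same columns as the register."""
    failure_mode: str
    example: str
    control: str
    test: str
    metric: str
    owner: str
    stop_condition: str


def incomplete_fields(row: RegisterRow) -> list:
    """Return the names of columns left blank in a register row."""
    return [f.name for f in fields(row) if not getattr(row, f.name).strip()]


stale_source = RegisterRow(
    failure_mode="Stale source",
    example="Old pricing returned",
    control="Source date check",
    test="Query old/new pricing",
    metric="Stale-source answer rate",
    owner="Docs owner",
    stop_condition="Any customer-visible stale price",
)
```

Running `incomplete_fields` over every row in CI makes a missing owner or stop condition a build failure rather than a post-incident finding.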
Do not do this yet
Do not launch customer-facing AI without a failure-mode register.
Do not let write-capable tools bypass validation.
Do not rely only on manual spot checks after launch.
Do not measure only average quality. Rare failures can be the whole risk.
Do not accept "we can roll back" unless someone can actually name the rollback path.
The takeaway
Production AI systems fail in repeatable ways. Hallucination, stale context, sycophancy, prompt injection, unsafe tool use, schema drift, weak fallback, and observability gaps are not edge cases. They are the normal work of shipping AI.
The mature move is to name the failure modes, add controls, test them, monitor them, and assign ownership. A demo shows what works once. A failure-mode register shows whether the system can survive real use.