# LLM Security Review Checklist

Use this before shipping an LLM feature that reads untrusted content, retrieves private data, or calls tools.

## 1. Workflow Boundary

- [ ] The workflow owner is named.
- [ ] The workflow's allowed actions are documented.
- [ ] External, destructive, financial, legal, HR, or customer-visible actions are identified.
- [ ] There is a documented way to disable the workflow quickly.
- [ ] The workflow has a rollback or manual fallback path.

## 2. Untrusted Input Inventory

- [ ] Direct user input is treated as untrusted.
- [ ] Retrieved documents are treated as untrusted data, not instructions.
- [ ] Tool outputs are treated as untrusted unless generated inside the trusted boundary.
- [ ] Uploaded PDFs, images, audio, spreadsheets, and transcripts are treated as untrusted.
- [ ] Web pages and browser-agent observations are treated as untrusted.
- [ ] Admin-editable prompts, workflow templates, CMS content, and knowledge sources are reviewed before production use.

## 3. Data And Retrieval Boundary

- [ ] Secrets, credentials, raw tokens, and private URLs are never sent to the model.
- [ ] Retrieval filters by tenant, user, role, and source permissions before ranking.
- [ ] Retrieved chunks preserve source ID, tenant ID, visibility, owner, version, and review timestamp.
- [ ] The answer path can cite or log which source IDs were used.
- [ ] Stale, unreviewed, or low-trust sources are excluded or flagged.
- [ ] Logs redact personal data and credentials.

## 4. Prompt And Context Contract

- [ ] System/developer instructions are versioned and reviewed like code.
- [ ] Untrusted content is wrapped and labeled as data.
- [ ] The model is told what to do when untrusted content conflicts with task instructions.
- [ ] The model receives only the minimum context required for the task.
- [ ] High-risk workflows use a narrow extraction step before the main agent sees content.
- [ ] Prompt canary phrases are used only for detection, not as the primary defense.

## 5. Tool And Action Design

- [ ] Tools are narrow and task-specific.
- [ ] Tools derive user, tenant, and role from server-side auth context, not model arguments.
- [ ] Every tool validates arguments with a schema and business rules.
- [ ] Every tool enforces authorization independently of the model.
- [ ] Write tools are idempotent where possible.
- [ ] External side effects require human approval or deterministic policy checks.
- [ ] Rate limits, quotas, and cost limits are configured.
- [ ] Tool calls are logged with user, tenant, workflow, prompt version, model, arguments summary, and result.

## 6. Output Validation

- [ ] Model output is parsed with a strict schema before use.
- [ ] Unknown fields are rejected when the contract should be closed.
- [ ] URLs, Markdown, HTML, filenames, and code blocks are sanitized where relevant.
- [ ] Outputs cannot include data classes outside the task's allowed scope.
- [ ] Factual answers over private sources require source IDs or source snippets.
- [ ] Malformed output fails closed instead of falling back to free text.

## 7. Regression Tests

- [ ] Direct injection test: user input asks the model to ignore instructions.
- [ ] Indirect injection test: retrieved content asks the model to reveal data or call a tool.
- [ ] Cross-tenant test: a request tries to access another tenant's data.
- [ ] Unsafe tool-call test: the model requests an unavailable or disallowed action.
- [ ] Malformed output test: extra fields or invalid enum values are rejected.
- [ ] Exfiltration test: output tries to include secrets, private data, or prompt text.
- [ ] Persistence test: stored content contains malicious instructions that are later retrieved.
- [ ] Multi-modal test, if relevant: OCR-visible or image-visible instructions are treated as untrusted.

## 8. Monitoring And Incident Response

- [ ] Prompt-extraction attempts are detectable.
- [ ] Repeated validator failures are alertable.
- [ ] Unusual retrieval breadth, tool-call volume, outbound recipients, cost, or rate spikes are alertable.
- [ ] Prompt-injection attempts can be linked to user, tenant, workflow, source IDs, and tool calls.
- [ ] The team knows how to disable the workflow or individual tools.
- [ ] The team knows how to revoke provider keys and rotate affected credentials.
- [ ] There is an owner for customer, legal, security, and regulator notification decisions.

## Launch Gate

Do not mark the workflow production-ready until:

- [ ] All high-consequence actions have a gate.
- [ ] At least one adversarial document test has passed.
- [ ] At least one unauthorized-access attempt has failed safely.
- [ ] At least one unsafe tool-call attempt has failed safely.
- [ ] At least one malformed-output test has failed closed.
- [ ] At least one workflow-disable path has been tested.
