# Secure RAG Document Ingestion Checklist

Use this checklist before publishing documents into a retrieval-augmented generation (RAG) system.

## Source Registration

- [ ] Source owner named.
- [ ] Data owner named.
- [ ] System of record documented.
- [ ] Allowed users, roles, or groups documented.
- [ ] Sensitivity level assigned.
- [ ] Update frequency documented.
- [ ] Retention and deletion behavior documented.
- [ ] Review cadence assigned.
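
One way to make these items enforceable is to keep a registration record per source and refuse ingestion while any field is blank. The sketch below is illustrative only: the `SourceRegistration` class, its field names, and the example values are assumptions that mirror the checklist, not an existing schema.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SourceRegistration:
    """Hypothetical registration record; field names mirror the checklist."""
    source_owner: str
    data_owner: str
    system_of_record: str
    allowed_principals: List[str]  # users, roles, or groups
    sensitivity: str               # e.g. "public", "internal", "restricted"
    update_frequency: str
    retention_policy: str
    review_cadence: str


def missing_fields(reg: SourceRegistration) -> List[str]:
    """Return the names of fields that are empty or contain only whitespace."""
    missing = []
    for f in fields(reg):
        value = getattr(reg, f.name)
        if isinstance(value, str):
            value = value.strip()
        if not value:
            missing.append(f.name)
    return missing


# Example values are made up for illustration.
reg = SourceRegistration(
    source_owner="docs-team",
    data_owner="",
    system_of_record="wiki",
    allowed_principals=["group:engineering"],
    sensitivity="internal",
    update_frequency="weekly",
    retention_policy="delete 30 days after source removal",
    review_cadence="",
)
print(missing_fields(reg))  # ['data_owner', 'review_cadence']
```

A non-empty result would block the source from being indexed until the gaps are filled.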

## File Safety

- [ ] File type allowlist enforced.
- [ ] File size limit enforced.
- [ ] Malware scanning or content disarm and reconstruction (CDR) requirement decided.
- [ ] Encrypted or unsupported files rejected or routed to review.
- [ ] Original file hash stored.
- [ ] Uploader or source event recorded.
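
The allowlist, size limit, hash, and uploader items can be combined into a single intake check. This is a minimal sketch under stated assumptions: `ALLOWED_SUFFIXES`, `MAX_BYTES`, and `check_upload` are hypothetical names and values, and checking the file extension alone is a weak signal that a real pipeline would pair with content inspection and the malware scanning decision above.

```python
import hashlib
from pathlib import PurePath

ALLOWED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}  # assumed allowlist
MAX_BYTES = 25 * 1024 * 1024                          # assumed 25 MiB limit


def check_upload(filename: str, data: bytes, uploader: str) -> dict:
    """Enforce the allowlist and size limit, then record hash and uploader."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        return {"accepted": False, "reason": f"file type {suffix!r} not allowed"}
    if len(data) > MAX_BYTES:
        return {"accepted": False, "reason": f"{len(data)} bytes exceeds limit"}
    return {
        "accepted": True,
        "filename": filename,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # hash of the original bytes
        "uploader": uploader,
    }


print(check_upload("handbook.pdf", b"hello", "alice")["accepted"])  # True
print(check_upload("setup.exe", b"hello", "alice")["reason"])
```

Storing the hash of the original bytes lets later stages detect whether a re-uploaded file actually changed.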

## Extraction And OCR

- [ ] Extraction method recorded.
- [ ] OCR confidence recorded where relevant.
- [ ] Page count and extracted character count recorded.
- [ ] Empty or failed pages flagged.
- [ ] Tables reviewed when exact values matter.
- [ ] Source language detected.
- [ ] Low-quality extraction excluded or routed to review.
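
The recording and flagging items above could be implemented as a per-document quality report. In this sketch the thresholds `MIN_OCR_CONFIDENCE` and `MIN_CHARS_PER_PAGE`, the `PageExtraction` type, and the flag labels are all assumptions chosen for illustration; suitable values depend on the OCR engine and the documents involved.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_OCR_CONFIDENCE = 0.80  # assumed threshold
MIN_CHARS_PER_PAGE = 20    # assumed cutoff for an "empty or failed" page


@dataclass
class PageExtraction:
    page_number: int
    text: str
    ocr_confidence: Optional[float] = None  # None when text was not OCR'd


def review_extraction(pages: List[PageExtraction], method: str) -> dict:
    """Record extraction stats and flag pages that need human review."""
    flagged = []
    for p in pages:
        if len(p.text.strip()) < MIN_CHARS_PER_PAGE:
            flagged.append((p.page_number, "empty_or_failed"))
        elif p.ocr_confidence is not None and p.ocr_confidence < MIN_OCR_CONFIDENCE:
            flagged.append((p.page_number, "low_ocr_confidence"))
    return {
        "method": method,
        "page_count": len(pages),
        "char_count": sum(len(p.text) for p in pages),
        "flagged_pages": flagged,
        "needs_review": bool(flagged),
    }


pages = [
    PageExtraction(1, "Quarterly revenue summary for the finance team."),
    PageExtraction(2, "", 0.0),
    PageExtraction(3, "Tab1e of va1ues with noisy OCR output", 0.55),
]
report = review_extraction(pages, method="ocr")
print(report["flagged_pages"])
# [(2, 'empty_or_failed'), (3, 'low_ocr_confidence')]
```

Documents with `needs_review` set would be held back from indexing or routed to a reviewer rather than silently published.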

## Metadata And Permissions

- [ ] Every chunk has source ID and document ID.
- [ ] Every chunk has tenant/user/role visibility metadata.
- [ ] Every chunk has source owner and sensitivity metadata.
- [ ] Every chunk has version or content hash.
- [ ] Every chunk has page or section reference where possible.
- [ ] Retrieval applies permission filters before ranking.
- [ ] Unauthorized users retrieve zero restricted chunks in tests.
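
The ordering of filtering and ranking matters: if ranking runs first, restricted chunks can influence scores or leak through truncation. The sketch below shows the filter-then-rank order with a toy keyword scorer standing in for vector similarity. The `Chunk` fields, `retrieve`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_id: str
    document_id: str
    tenant_id: str
    allowed_roles: FrozenSet[str]
    sensitivity: str
    content_hash: str
    page: int
    text: str


def keyword_score(chunk: Chunk, query_terms: Iterable[str]) -> int:
    """Toy relevance score: count of query-term occurrences in the chunk."""
    words = chunk.text.lower().split()
    return sum(words.count(t.lower()) for t in query_terms)


def retrieve(query_terms, chunks, tenant_id, roles, top_k=3) -> List[Chunk]:
    # Step 1: drop everything the caller is not allowed to see.
    visible = [
        c for c in chunks
        if c.tenant_id == tenant_id and c.allowed_roles & set(roles)
    ]
    # Step 2: rank only the visible chunks.
    scored = [(keyword_score(c, query_terms), c) for c in visible]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


chunks = [
    Chunk("c1", "src-hr", "doc-1", "acme", frozenset({"hr"}),
          "restricted", "hash1", 2, "salary bands for 2024"),
    Chunk("c2", "src-hr", "doc-2", "acme", frozenset({"hr", "employee"}),
          "internal", "hash2", 1, "holiday policy and leave"),
]

# An employee without the "hr" role must get zero restricted chunks.
print(retrieve(["salary"], chunks, "acme", {"employee"}))  # []
```

The final call doubles as the kind of negative test the last checklist item asks for.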

## Retention And Deletion

- [ ] Original file cache deletion path exists.
- [ ] Extracted text deletion path exists.
- [ ] Chunk deletion path exists.
- [ ] Embedding/vector deletion path exists.
- [ ] Summary/cache deletion path exists.
- [ ] Logs and backups follow documented retention rules.
- [ ] Deletion verification is recorded.
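
A deletion path is easier to audit when one routine walks every derived store and records what remains afterwards. The following sketch uses a hypothetical `InMemoryStore` as a stand-in for the file cache, extracted text, chunk store, vector index, and summary cache; real stores would expose their own delete and lookup calls. Logs and backups are not covered here because they follow their own retention rules.

```python
from datetime import datetime, timezone


class InMemoryStore:
    """Stand-in for a real store (file cache, vector index, and so on)."""

    def __init__(self, name):
        self.name = name
        self.items = {}  # item key -> document_id

    def put(self, key, document_id):
        self.items[key] = document_id

    def delete_document(self, document_id):
        doomed = [k for k, d in self.items.items() if d == document_id]
        for k in doomed:
            del self.items[k]
        return len(doomed)

    def count_document(self, document_id):
        return sum(1 for d in self.items.values() if d == document_id)


def delete_everywhere(document_id, stores):
    """Delete a document from every store and record the verification."""
    record = {"document_id": document_id, "stores": {}}
    for store in stores:
        removed = store.delete_document(document_id)
        remaining = store.count_document(document_id)  # verify, do not assume
        record["stores"][store.name] = {"removed": removed, "remaining": remaining}
    record["verified"] = all(s["remaining"] == 0 for s in record["stores"].values())
    record["verified_at"] = datetime.now(timezone.utc).isoformat()
    return record


stores = [InMemoryStore(n) for n in
          ("file_cache", "extracted_text", "chunks", "vectors", "summaries")]
for s in stores:
    s.put("doc-1/a", "doc-1")
    s.put("doc-2/a", "doc-2")

record = delete_everywhere("doc-1", stores)
print(record["verified"])  # True
```

The returned record is what would be persisted to satisfy the verification item.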

## Launch Gate

- [ ] Sample questions retrieve expected sources.
- [ ] Stale sources are excluded or flagged.
- [ ] Citations point to valid source locations.
- [ ] Prompt-injection text inside documents is treated as content, not instruction.
- [ ] Source owner approved production indexing.
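
The first three gate items lend themselves to an automated check that runs sample questions through retrieval and collects failures. This is a rough sketch: `run_launch_gate`, the hit dictionary shape (`source_id`, `page`, `last_updated`), and the fake retriever are assumptions for illustration. Prompt-injection handling and owner approval need separate checks that are not shown.

```python
from datetime import date


def run_launch_gate(cases, retrieve, page_counts, stale_before):
    """Return a list of (question, problem) tuples; empty means the gate passes."""
    failures = []
    for question, expected_source in cases:
        hits = retrieve(question)
        if expected_source not in {h["source_id"] for h in hits}:
            failures.append((question, "expected source not retrieved"))
        for h in hits:
            pages = page_counts.get(h["source_id"], 0)
            if not 1 <= h["page"] <= pages:
                failures.append(
                    (question, f"invalid citation {h['source_id']} p.{h['page']}"))
            if h["last_updated"] < stale_before:
                failures.append((question, f"stale source {h['source_id']}"))
    return failures


# Fake retriever with canned results, standing in for the real pipeline.
FAKE_INDEX = {
    "What is the leave policy?": [
        {"source_id": "hr-handbook", "page": 4, "last_updated": date(2024, 5, 1)},
    ],
    "How do I rotate keys?": [
        {"source_id": "old-runbook", "page": 12, "last_updated": date(2021, 1, 10)},
    ],
}


def fake_retrieve(question):
    return FAKE_INDEX.get(question, [])


cases = [
    ("What is the leave policy?", "hr-handbook"),
    ("How do I rotate keys?", "security-guide"),
]
failures = run_launch_gate(
    cases,
    fake_retrieve,
    page_counts={"hr-handbook": 30, "old-runbook": 10},
    stale_before=date(2023, 1, 1),
)
for question, problem in failures:
    print(question, "->", problem)
```

In this example the first question passes, while the second fails on all three counts: wrong source, a page number past the end of the document, and a stale source.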
