Building memory for long-running agents
Agents need memory beyond the context window. Long-term memory architecture — what to store, when to retrieve, how to forget — determines whether agents feel like they 'know' you or start fresh every conversation. The patterns and the production trade-offs.
An AI agent that doesn't remember you is fundamentally limited. Every conversation starts from scratch. You have to re-introduce yourself, re-explain your preferences, re-clarify ongoing work. The friction adds up; trust drops.
This is the memory problem. Context windows handle the current conversation. Long-term memory — across sessions, across days, across months — needs its own architecture. And it's harder than it looks.
This article covers what real agent memory looks like in production. The layers, the storage choices, the retrieval patterns, and the trade-offs that distinguish memory that helps from memory that hallucinates.
What "memory" means
A simplistic view: memory = "the model remembers things between conversations." Reality is more nuanced. Cognitive science distinguishes types of memory, and AI agent memory benefits from a similar distinction:
Working memory. The current conversation. Held in the context window. Lost when the conversation ends (unless persisted).
Episodic memory. Specific past events. "Last Tuesday, we discussed X." "Three months ago, you decided Y."
Semantic memory. General facts. "Your name is Alice." "You prefer concise responses." "Your company is in Tallinn."
Procedural memory. How to do things. "When the user asks for a meeting, use this template." "When the customer is in tier X, follow process Y."
Different memory types serve different functions. A complete agent memory system covers all of them.
What memory should accomplish
Before architecture, the goals:
Continuity. The agent picks up where it left off. No re-introducing yourself every session.
Personalization. The agent applies your preferences without being asked. Writes in your voice, uses your tools, references your team.
Context preservation. Decisions from past conversations inform current ones. "We decided X last month" should be remembered.
Skill accumulation. The agent learns your patterns and applies them. After 10 conversations about coding in Python, it should default to Python.
Privacy and forgetting. What's remembered, what's not, what gets deleted. Both for user trust and legal compliance.
These goals sometimes conflict. Continuity wants to remember everything; privacy wants to remember nothing. The architecture navigates the trade-offs.
The architecture
A typical layered architecture:
┌─────────────────────────────────────┐
│ Working memory (in-context) │ Current conversation
├─────────────────────────────────────┤
│ Session memory (recent) │ Last N conversations
├─────────────────────────────────────┤
│ Episodic memory (long-term) │ Specific past events
├─────────────────────────────────────┤
│ Semantic memory (facts) │ Stable user facts
├─────────────────────────────────────┤
│ Procedural memory (preferences) │ How to behave for this user
└─────────────────────────────────────┘
Each layer has its own storage, retrieval, and decay logic.
We'll go through each.
Layer 1: Working memory
Already covered in the context engineering article: the current conversation lives in context. For multi-turn conversations, use tiered context, with recent turns kept verbatim and older turns summarized.
The hand-off to long-term memory happens at session end. The conversation's key information is extracted and stored.
Layer 2: Session memory
Recent sessions — say, the last 10 conversations — kept in light detail. Available to the agent on the next session.
Implementation: a summary per session, stored with timestamp and topic. When the user returns, the agent has a quick reference for what's been happening recently.
{
"session_id": "abc-123",
"user_id": "alice",
"started": "2026-05-14T10:30:00Z",
"ended": "2026-05-14T10:45:00Z",
"topic": "Drafting proposal for Acme Corp",
"summary": "Drafted v1 of the Acme proposal. Decided to lead with the cost-savings angle. Alice will review and send Friday.",
"facts_learned": ["Acme is a current customer", "Alice's deadline is Friday"],
"open_items": ["Alice to review v1 by Thursday"]
}
On a new session, the most recent 3-5 session summaries can be auto-loaded. The agent has context for what's been happening.
This is the most accessible form of cross-session memory. Easy to implement, immediately useful.
Layer 3: Episodic memory
Specific past events worth remembering longer-term. Decisions, milestones, important conversations.
These are extracted from sessions when they're notable. Stored with rich metadata.
{
"event_id": "ev-456",
"user_id": "alice",
"date": "2026-04-22",
"type": "decision",
"description": "Alice decided to migrate from Postgres to ClickHouse for the analytics workload, citing query performance.",
"context_summary": "After 3 weeks of evaluation including performance tests and cost analysis.",
"related_topics": ["infrastructure", "analytics", "database"],
"importance": "high"
}
Retrieval: when relevant to the current conversation, the agent fetches related episodes via semantic search (embed the current query, find matching episodes), topic matching, or temporal queries ("what happened last month?").
The challenge: deciding what qualifies as an "episode worth remembering." Not every conversation is. A pattern: at session end, an LLM extracts notable events from the conversation. Decisions, commitments, milestones get stored; small talk doesn't.
Layer 4: Semantic memory
Stable facts about the user that should always be available. "Alice is the CEO of Acme. Her preferred communication style is concise. She works in the Europe/Tallinn time zone."
These are smaller in volume than episodes but more frequently retrieved. They form the agent's "model of the user."
Implementation: a structured profile.
{
"user_id": "alice",
"profile": {
"name": "Alice Tamm",
"role": "CEO at Acme Corp",
"location": "Tallinn, Estonia",
"timezone": "Europe/Tallinn",
"preferred_language": "English",
"communication_style": "concise, direct, no preamble",
"expertise_areas": ["product strategy", "go-to-market"],
"tools_used": ["Notion", "Slack", "Linear"]
}
}
Updates happen when the agent learns new facts. After a session, an LLM identifies new stable facts and proposes them; they are then either auto-merged or queued for review.
Importantly: semantic facts should be confident and stable. An off-hand comment in one conversation ("I might try Python") shouldn't become a semantic fact ("Alice prefers Python"). The bar is higher.
A confidence-based approach:
- Heard once: candidate fact, not yet stored.
- Heard twice, or stated explicitly once: stored with medium confidence.
- Explicitly confirmed or referenced frequently: stored with high confidence.
This prevents the agent from "learning" wrong facts from offhand comments.
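The tiers above can be sketched as a small scoring function. This is a minimal illustration, not a definitive implementation: `FactCandidate`, `confidence`, `should_store`, and the `FREQUENT_MENTIONS` threshold of 4 are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed threshold for "referenced frequently"; tune for your product.
FREQUENT_MENTIONS = 4


@dataclass
class FactCandidate:
    """A possible semantic fact plus the evidence seen for it so far."""
    text: str
    mentions: int = 0        # number of sessions in which it came up
    explicit: bool = False   # the user stated it directly ("I prefer X")
    confirmed: bool = False  # the user confirmed it when asked


def confidence(c: FactCandidate) -> str:
    """Map evidence to 'candidate', 'medium', or 'high'."""
    if c.confirmed or c.mentions >= FREQUENT_MENTIONS:
        return "high"
    if c.explicit or c.mentions >= 2:
        return "medium"
    return "candidate"  # heard once: not stored in the profile yet


def should_store(c: FactCandidate) -> bool:
    """Only medium- and high-confidence facts reach the profile."""
    return confidence(c) != "candidate"
```

With this shape, "I might try Python" said once stays a candidate, while the same statement confirmed by the user is stored with high confidence.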
Layer 5: Procedural memory
How the agent should behave for this user. Workflows, templates, preferences for specific actions.
Examples:
{
"user_id": "alice",
"procedural": {
"email_signature": "...",
"meeting_preferences": "always offer 3 time slots, never schedule before 9am",
"code_style": "Python, type hints required, dataclasses over dicts",
"tone_for_clients": "warm, direct, with explicit next steps",
"approval_process": "all customer-facing communications need Alice's review before sending"
}
}
These are patterns the agent follows when relevant tasks come up.
Updates happen explicitly ("Alice, please always do X this way") or through pattern recognition (after 5 similar requests handled the same way, the pattern is added).
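Both update paths can be sketched in a few lines. The example below is a rough sketch under stated assumptions: `PatternTracker` and its methods are hypothetical names, and "similar request" is reduced to an exact match on a task label and a handling string, which a real system would replace with something fuzzier.

```python
from collections import Counter

PATTERN_THRESHOLD = 5  # "after 5 similar requests handled the same way"


class PatternTracker:
    """Counts how each task type was handled and promotes stable patterns."""

    def __init__(self) -> None:
        self.counts = Counter()  # (task, handling) -> times observed
        self.procedures = {}     # task -> handling the agent should follow

    def set_explicit(self, task: str, handling: str) -> None:
        """An explicit instruction from the user takes effect immediately."""
        self.procedures[task] = handling

    def observe(self, task: str, handling: str) -> None:
        """Record one handled request; promote it after enough repeats."""
        self.counts[(task, handling)] += 1
        already_set = task in self.procedures
        if not already_set and self.counts[(task, handling)] >= PATTERN_THRESHOLD:
            self.procedures[task] = handling
```

Explicit instructions are never overwritten by observed patterns here, which matches the idea that user-stated preferences are the stronger signal.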
Storage choices
Where does memory live?
SQL database. Reliable, queryable, well-understood. Each memory type a table. Joins for retrieval. Good for structured access patterns.
Vector database. For semantic retrieval of episodes ("find memories related to this topic"). Episodes are embedded; retrieval by similarity.
Combination. Often the best: SQL for structured queries, vector for semantic. Memory items live in both, with consistent IDs.
Specialized memory tools. Mem0, Letta (formerly MemGPT), Zep. These are purpose-built memory layers for agents. Worth considering if you want a higher-level abstraction.
For most teams: a simple SQL + vector approach is fine. Specialized tools are useful but add a dependency.
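To make the "consistent IDs" point concrete, here is a minimal sketch of the combined approach. It assumes SQLite as the SQL store and uses a plain dictionary as a stand-in for a vector database; `MemoryStore` and its methods are hypothetical names, and the embedding is passed in by the caller.

```python
import sqlite3
import uuid


class MemoryStore:
    """Structured rows in SQL, embeddings in a separate index, one shared ID."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE episodes (id TEXT PRIMARY KEY, user_id TEXT, "
            "date TEXT, type TEXT, description TEXT)"
        )
        # Stand-in for a vector database: memory ID -> embedding.
        self.vectors = {}

    def add_episode(self, user_id: str, date: str, type_: str,
                    description: str, embedding: list) -> str:
        memory_id = str(uuid.uuid4())
        self.db.execute(
            "INSERT INTO episodes VALUES (?, ?, ?, ?, ?)",
            (memory_id, user_id, date, type_, description),
        )
        self.vectors[memory_id] = embedding  # same ID in both stores
        return memory_id

    def delete_episode(self, memory_id: str) -> None:
        """Remove from both stores so they cannot drift apart."""
        self.db.execute("DELETE FROM episodes WHERE id = ?", (memory_id,))
        self.vectors.pop(memory_id, None)

    def get(self, memory_id: str):
        """Return (user_id, date, type, description) or None."""
        return self.db.execute(
            "SELECT user_id, date, type, description FROM episodes WHERE id = ?",
            (memory_id,),
        ).fetchone()
```

Routing every write and delete through one class is the design choice that matters: a deletion that reaches only one of the two stores is a common source of "forgotten" memories resurfacing through search.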
Retrieval patterns
How does the agent get memory into context?
Pattern 1: Auto-load on session start
When a new session begins, automatically pull:
- The user's semantic profile.
- The most recent N session summaries.
- Any open commitments or follow-ups.
This is the baseline context the agent has when the user shows up.
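A sketch of how that baseline might be assembled into a text block for the prompt. The function name `build_session_context`, the output layout, and the default of three sessions are assumptions; the `ended` and `summary` fields follow the session record shown earlier.

```python
def build_session_context(profile: dict, sessions: list, open_items: list,
                          max_sessions: int = 3) -> str:
    """Render the baseline memory block loaded at the start of a session."""
    # Newest first; ISO-8601 UTC timestamps sort correctly as strings.
    recent = sorted(sessions, key=lambda s: s["ended"], reverse=True)[:max_sessions]

    lines = ["User profile:"]
    lines += [f"- {key}: {value}" for key, value in profile.items()]

    lines.append("Recent sessions:")
    lines += [f"- {s['ended'][:10]}: {s['summary']}" for s in recent]

    if open_items:
        lines.append("Open items:")
        lines += [f"- {item}" for item in open_items]

    return "\n".join(lines)
```

Capping the number of sessions keeps this block at a predictable size, so the memory overhead per session stays bounded no matter how long the user's history is.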
Pattern 2: Query-driven retrieval
When the user's message hints at past topics, retrieve relevant episodes.
Example: user asks "what was the conclusion of our database discussion?" The agent searches episodes for "database" and retrieves the relevant one.
Implementation: embed the user's message, find similar episodes, include them in context.
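The embed-and-compare loop can be sketched as follows. Note the big assumption: `embed` here is a toy word-count vector standing in for a real embedding model, so it only matches shared words, not meaning. `retrieve_episodes`, `k`, and `min_score` are hypothetical names and defaults.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_episodes(query: str, episodes: list, k: int = 3,
                      min_score: float = 0.15) -> list:
    """Return up to k episodes most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(e["description"])), e) for e in episodes]
    scored = [(score, e) for score, e in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]
```

The `min_score` cutoff matters as much as the ranking: returning the "top 3" when nothing is actually relevant puts unrelated memories into context and invites the agent to reference them.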
Pattern 3: Explicit memory tools
The agent has tools to query memory:
- search_episodes(query): find specific past events.
- get_user_profile(): pull the semantic profile.
- list_open_items(): pending commitments.
The agent decides when to call these based on the conversation.
Pattern 4: Background memory enrichment
A background process periodically reviews memory and:
- Consolidates related episodes into themes.
- Updates confidence on facts.
- Decays old memories that haven't been accessed.
This is "memory maintenance" — keeping the memory store useful over time.
Writing memory
When does memory get written?
End-of-session extraction
The reliable pattern. When a session ends:
- An LLM analyzes the conversation.
- Extracts:
  - Session summary.
  - Notable events (for episodic memory).
  - New facts (for semantic memory).
  - Preference signals (for procedural memory).
- Updates and stores.
This batch processing keeps the in-session experience fast (no memory writes during conversation).
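Before anything from the extraction step is stored, its output can be checked for shape. A minimal sketch, assuming the extractor returns a JSON object with the five fields named in this section's extraction prompt; `parse_extraction` and `EXPECTED_KEYS` are hypothetical names.

```python
import json

# Field names follow the extraction prompt; types are what storage expects.
EXPECTED_KEYS = {
    "summary": str,
    "notable_events": list,
    "new_facts": list,
    "preference_signals": list,
    "open_items": list,
}


def parse_extraction(raw: str) -> dict:
    """Parse extractor output; raise ValueError rather than store bad data."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"extractor did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("extractor output must be a JSON object")
    for key, expected_type in EXPECTED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"missing or malformed field: {key}")
    # Drop unexpected keys so nothing unreviewed reaches storage.
    return {key: data[key] for key in EXPECTED_KEYS}
```

Failing closed is deliberate here: a session whose extraction is rejected loses one summary, which is cheaper than writing malformed data into long-term memory.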
Prompt for extraction:
Analyze this conversation. Output JSON with:
1. summary: 2-3 sentence summary of what happened.
2. notable_events: array of significant events worth remembering (decisions made, milestones, important context).
3. new_facts: array of stable facts learned about the user (only include if you have high confidence).
4. preference_signals: array of preferences observed (only if expressed clearly or repeated).
5. open_items: array of unresolved items the user might want to revisit.
Be conservative. Only include items with high confidence. Better to miss something than to hallucinate.
Real-time updates for high-value facts
For some facts, waiting until session end is wrong. If a user says "actually, my name is Alex, not Alice" — the correction should be applied immediately.
A pattern: have the agent detect explicit corrections or important new facts in real-time, and update memory inline.
This needs careful design — the LLM might "learn" wrong facts. Some teams require user confirmation before applying real-time updates.
User-initiated updates
The user can explicitly tell the agent things to remember:
- "Please remember that I prefer X."
- "Forget what I said about Y."
- "Always do Z."
These should be first-class. Honor them immediately. They're the highest-confidence signals.
A specific tool the agent can offer:
remember(content: string, type: "fact" | "preference" | "procedure")
forget(content: string)
list_what_you_remember()
Giving users this control builds trust.
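A rough sketch of what could sit behind those three tools, for a single user. `UserMemory` is a hypothetical class; the parameter is called `kind` rather than `type` to avoid shadowing the Python built-in, and `forget` uses a simple substring match where a real system would likely use semantic matching plus a confirmation step.

```python
from dataclasses import dataclass, field

MEMORY_TYPES = {"fact", "preference", "procedure"}


@dataclass
class UserMemory:
    """User-controlled memory for one user; items are (kind, content) pairs."""
    items: list = field(default_factory=list)

    def remember(self, content: str, kind: str) -> None:
        if kind not in MEMORY_TYPES:
            raise ValueError(f"unknown memory type: {kind}")
        if (kind, content) not in self.items:  # avoid exact duplicates
            self.items.append((kind, content))

    def forget(self, content: str) -> int:
        """Delete items whose text mentions `content`; return how many."""
        needle = content.lower()
        kept = [(k, c) for k, c in self.items if needle not in c.lower()]
        removed = len(self.items) - len(kept)
        self.items = kept
        return removed

    def list_what_you_remember(self) -> list:
        return [f"[{k}] {c}" for k, c in self.items]
```

Returning the number of removed items from `forget` lets the agent tell the user exactly what happened ("I've deleted 2 items") instead of a vague acknowledgement.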
Forgetting and decay
Memory that grows forever becomes noise. Decay is essential.
Time-based decay
Older memories are less likely to be retrieved. Implementation:
- Score retrieval by relevance * recency_decay.
- Old memories effectively disappear unless explicitly referenced.
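One common way to define the decay term is an exponential half-life. The sketch below assumes that choice, and a 90-day half-life picked only for illustration; the function names are hypothetical.

```python
from datetime import datetime

HALF_LIFE_DAYS = 90.0  # assumed: a memory's weight halves every 90 days


def recency_decay(last_accessed: datetime, now: datetime,
                  half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay in (0, 1]; 1.0 means accessed just now."""
    age_days = max((now - last_accessed).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def retrieval_score(relevance: float, last_accessed: datetime,
                    now: datetime) -> float:
    """The relevance * recency_decay score described above."""
    return relevance * recency_decay(last_accessed, now)
```

Keying the decay on last access rather than creation time means a memory that keeps getting used stays fresh, while one that is never retrieved fades on its own.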
Importance-based retention
Important episodes are retained longer; trivial ones decay faster.
- Tag episodes with importance at write time.
- Critical events: indefinite retention.
- Routine events: decay over months.
User-initiated forgetting
The user can request specific memories be deleted.
- Specific facts.
- Specific time periods.
- Specific topics.
Implementation: a delete operation that removes (or marks as deleted) the relevant items.
Compliance-driven deletion
Legal requirements (GDPR's right to erasure, often called the "right to be forgotten," and data retention laws) sometimes mandate deletion.
- User account deletion → all memories deleted.
- Per-request data deletion → specific memories deleted.
- Retention limits → automatic deletion after N months.
These must be built in from the start. Retrofitting is painful.
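Two of these rules, account deletion and retention limits, reduce to simple filters. A minimal sketch over an in-memory list, assuming each memory record carries `user_id` and a `created` datetime; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def purge_expired(memories: list, now: datetime, retention_days: int) -> list:
    """Retention limit: keep only memories newer than the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [m for m in memories if m["created"] >= cutoff]


def delete_user(memories: list, user_id: str) -> list:
    """Account deletion: drop every memory belonging to user_id."""
    return [m for m in memories if m["user_id"] != user_id]
```

In a real deployment the same deletion has to reach every copy of the data, such as the vector index, caches, and derived summaries, which is one reason retrofitting is painful.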
Privacy considerations
Memory is sensitive. The agent knows a lot about the user. Considerations:
Encryption at rest
Memory data encrypted. Standard practice.
Access controls
Who can see a user's memory? Just them, just the system, support staff under certain conditions? Define clearly. Audit access.
PII handling
Personally identifiable information (real names, addresses, financial info) should be tagged and treated carefully. Special access controls, special deletion procedures.
User visibility
Users should be able to see what the agent remembers about them. This is both ethically right and good UX. Provide a "memory dashboard."
What the agent remembers about you:
Profile:
- Name: Alice Tamm
- Role: CEO at Acme Corp
- Communication style: concise, direct
Recent sessions:
- 2026-05-14: Drafted proposal for Acme
- 2026-05-12: Reviewed Q1 results
- ...
Preferences:
- Prefers concise responses
- Uses Notion, Slack, Linear
[Edit] [Delete specific items] [Delete all]
This transparency builds trust. Hidden memory is creepy.
Sharing across contexts
If the user has multiple "modes" (work agent, personal agent), they may want memories separated. Don't auto-share across modes unless asked.
Common failure modes
A few patterns:
Failure 1: Hallucinated memories
The agent claims to remember things that didn't happen. "Last week we agreed on X" — but X was never discussed.
Cause: LLM "filling in" plausible-sounding memories during extraction or retrieval.
Fix: ground memory operations in real conversation data. The LLM extracts, and each extracted item is verified against the actual transcript. Items that cannot be verified should be flagged rather than stored.
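One simple way to do that verification is to require a supporting quote for every extracted item and check that the quote really occurs in the transcript. This is a sketch under that assumption: the `quote` field is not part of the extraction format shown earlier, and `split_grounded` is a hypothetical name.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_grounded(items: list, transcript: str) -> tuple:
    """Separate items whose supporting quote appears in the transcript.

    Returns (grounded, flagged). Items without a quote are flagged.
    """
    haystack = _normalize(transcript)
    grounded, flagged = [], []
    for item in items:
        quote = _normalize(item.get("quote", ""))
        if quote and quote in haystack:
            grounded.append(item)
        else:
            flagged.append(item)
    return grounded, flagged
```

A matching quote only proves the words were said, not that the extracted fact interprets them correctly, so this check complements the confidence thresholds rather than replacing them.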
Failure 2: Wrong facts learned
The agent confidently states wrong facts. "You said you prefer Python" when you actually said you were forced to use Python at work.
Cause: misinterpretation during extraction.
Fix: confidence thresholds. Only learn from explicit, repeated, or confirmed statements, and let the user correct anything that was learned wrongly.
Failure 3: Privacy leaks
Memory from one user surfaces in another's conversation. Catastrophic.
Cause: bugs in user-scoping logic.
Fix: enforce user scoping at the storage and retrieval layers. Audit. Never trust the LLM to filter.
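One way to enforce scoping in code is to hand the agent a memory object that is already bound to a single user, so there is no parameter through which another user's data could be requested. A minimal sketch with a dictionary as the backing store; `ScopedMemory` is a hypothetical name.

```python
class ScopedMemory:
    """A view of the store bound to one user; other users' data is unreachable."""

    def __init__(self, store: dict, user_id: str) -> None:
        if not user_id:
            raise ValueError("user_id is required")
        self._store = store      # shared backing store: user_id -> memories
        self._user_id = user_id  # fixed at construction, never taken from the LLM

    def add(self, content: str) -> None:
        self._store.setdefault(self._user_id, []).append(content)

    def search(self, term: str) -> list:
        # The user filter runs in code, before anything reaches the LLM.
        mine = self._store.get(self._user_id, [])
        return [m for m in mine if term.lower() in m.lower()]
```

The `user_id` should come from the authenticated session, not from anything the model produces; with a database backend the same idea becomes a mandatory `WHERE user_id = ?` clause on every query.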
Failure 4: Memory bloat
After a year, memory is megabytes per user. Retrieval slows. Costs grow.
Cause: no decay or pruning.
Fix: aggressive decay. Most memory becomes effectively inaccessible (low retrieval priority) after months. Run compaction periodically.
Failure 5: Stale facts
The user changed roles 6 months ago. The agent still references the old role.
Cause: facts not updated when superseded.
Fix: detect contradictions. When a new fact conflicts with an old one, the new fact wins (with confirmation if uncertain).
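For facts stored under a known key (role, location, and so on), "the new fact wins" can be sketched as a supersede operation that keeps the old value as history. `Fact`, `update_fact`, and `current` are hypothetical names, and the confirmation step for uncertain cases is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    key: str                             # e.g. "role"
    value: str
    learned: str                         # ISO date the fact was learned
    superseded_on: Optional[str] = None  # set when a newer value replaces it


def update_fact(facts: list, key: str, value: str, learned: str) -> Fact:
    """Store a fact; if it contradicts the current value, the new one wins."""
    for fact in facts:
        if fact.key == key and fact.superseded_on is None:
            if fact.value == value:
                return fact               # same fact seen again, nothing to change
            fact.superseded_on = learned  # keep the old value as history
    new_fact = Fact(key, value, learned)
    facts.append(new_fact)
    return new_fact


def current(facts: list, key: str) -> Optional[str]:
    """Return the active value for key, ignoring superseded facts."""
    for fact in facts:
        if fact.key == key and fact.superseded_on is None:
            return fact.value
    return None
```

Marking old facts as superseded instead of deleting them lets the agent still answer "what was my previous role?" while never presenting the old role as current.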
Failure 6: Disorienting consolidation
Background memory consolidation occasionally rewrites memories in ways that lose information.
Cause: aggressive summarization without preserving key facts.
Fix: consolidation must preserve facts explicitly. Test consolidation on real memory transcripts.
A worked example: personal assistant with memory
A real-world case: a personal AI assistant for individual users.
Memory layers:
- Working: current conversation.
- Session: last 7 sessions in summary form.
- Episodic: 100 most recent notable events, with semantic search.
- Semantic: user profile (name, role, preferences, tools).
- Procedural: explicit workflows the user has set up.
Storage:
- SQL (Postgres): structured profile, sessions, episodes, procedures.
- Vector search (pgvector, a Postgres extension running in the same database): episodic semantic search.
Operations:
- Session start: auto-load semantic profile + last 3 sessions + open items.
- Mid-session: episodic retrieval triggered by topic relevance.
- Session end: LLM-based extraction; user can review what was learned.
- Background: weekly consolidation (combine related episodes, decay stale ones).
User controls:
- Memory dashboard showing what's remembered.
- Edit/delete individual items.
- "Forget the last hour" button.
- Full account deletion (wipes everything).
Outcomes:
- Continuity: users report the agent "feels persistent" across sessions.
- Personalization: response style matches user preferences without re-prompting.
- Privacy: explicit controls give users confidence.
- Cost: memory is ~5-15% of per-session token usage. Worth it.
Failure modes addressed:
- Hallucinated memories caught by verification during extraction.
- Wrong facts caught by confidence thresholds.
- Privacy enforced at every storage and retrieval point.
- Bloat managed by decay and consolidation.
This is a production-grade memory system. Not trivial; well within reach for a focused team.
Specialized tools
A note on memory-as-a-service options:
Mem0. Open-source, well-designed. Handles many of the patterns above. Worth considering if you don't want to build from scratch.
Letta (MemGPT). Different paradigm — the LLM itself manages memory through tool calls. Powerful but more complex.
Zep. Hosted memory layer. Easy integration.
Cognee. Newer; knowledge-graph-based memory.
These tools save building time. They also add a dependency and constrain customization. For mature production systems, building memory often makes sense; for prototypes or smaller teams, using a tool is reasonable.
The takeaway
Long-term memory is what makes agents feel intelligent and continuous rather than amnesiac. It's also one of the harder things to get right.
The architecture is layered:
- Working memory (in-context).
- Session memory (recent sessions).
- Episodic memory (specific events).
- Semantic memory (stable facts).
- Procedural memory (preferences and workflows).
Each layer has its own storage, retrieval, and decay logic. Each contributes to making the agent useful across time.
The patterns that matter:
- Conservative extraction (don't hallucinate facts).
- Confidence-based learning (don't learn from offhand comments).
- Active forgetting (decay and pruning).
- User control (transparency and edit ability).
- Privacy enforcement (at every layer).
Done well, memory turns AI from "a fresh stranger every conversation" into "a continuous, useful partner." That's the difference between AI as a tool and AI as a colleague.
For agents you intend to live with — across days, weeks, months — memory isn't optional. It's foundational. Build it deliberately, with the layers and discipline it needs.