Context Engine: Memory Architecture

April 20, 2026

Persistent memory across conversations powered by Zep's graph memory engine. 27 ingestion points feed a temporal knowledge graph with bi-temporal fact tracking, entity resolution, and community detection.

  • Ingestion Points: 27 sources
  • Graph Engine: Zep · Graphiti · Neo4j
  • Search Latency: ~400ms warm

01 — What the Memory Graph Does

The agent doesn't start every conversation from scratch. It remembers who the user works with, what channels they're in, which repos they own, what they asked last week, and what they prefer. This context lives in a knowledge graph (Zep) that accumulates over time.

Zep Graph Memory

Entities:

  • People
  • Channels
  • Repositories
  • Projects
  • Workspaces

Relationships:

  • member_of
  • works_on
  • communicates_with
  • belongs_to

Facts:

  • User preferences
  • Decisions made
  • Communication style

We control what goes in — the episodes, their shape, their timing. Zep handles the graph construction: entity extraction, relationship inference, deduplication, and contradiction resolution. We then search the graph at agent runtime via the MEMORY_SEARCH tool.

The graph is per-user. Every episode is tagged with a user_id. The agent for user A never sees user B's memory. 27 ingestion points feed this graph continuously.


02 — Ingestion Architecture

Every ingestion point boils down to one of two calls: zep.graph.add() (single episode) or zep.graph.add_batch() (batch of up to 10 episodes). The EpisodeData shape is always the same. What varies is what we put in and when.
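The shared episode shape and the batching rule can be sketched in a few lines. EpisodeData mirrors the fields shown in the diagram below; the dataclass and chunk_for_batch_add helper are illustrative, not the actual SDK types:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class EpisodeData:
    data: str                                     # JSON string or plain text
    type: Literal["json", "message", "text"]
    source_description: str = ""                  # context for the extractor

def chunk_for_batch_add(episodes: list[EpisodeData], batch_size: int = 10):
    """Split episodes into add_batch-sized groups (max 10 per call)."""
    return [episodes[i:i + batch_size]
            for i in range(0, len(episodes), batch_size)]
```

A sync worker would then loop over the chunks, calling zep.graph.add_batch() once per group and zep.graph.add() for one-off episodes.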

Four Source Categories

Conversation Layer:

  • All human messages
  • AI responses (post-processing)
  • Workflow trigger data

Scheduled Intelligence:

  • Morning briefing summary
  • Evening briefing recap
  • Action items with full context

Integration Sync:

  • Slack (workspace, channels, users)
  • Gmail (emails, VIP contacts)
  • GitHub, Linear, Airtable, Vercel

User Initiated:

  • Explicit memory (ADD_TO_MEMORY)
  • Onboarding user research
  • Skill learning summaries

  All sources
      ↓
  EpisodeData
  ├── data: JSON or text
  ├── type: "json" | "message" | "text"
  └── source_description: context
      ↓
  Zep Graph Engine
  1. Entity extraction
  2. Relationship inference
  3. Fact extraction
  4. Dedup & contradiction resolution
  5. Graph update

03 — Conversation Memory

Every meaningful exchange between the user and the agent feeds the graph. Three types of conversational data get ingested.

01 — Human Messages — all messages ingested

Every human message gets ingested into the graph. The 2,500-character threshold determines the ingestion pathway — shorter messages go through thread-based ingestion, longer ones route directly to the graph API — but nothing is dropped.

EpisodeData(data="User message from {name}: {content}", type="message")
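The routing rule is simple to sketch. A minimal version, assuming the two pathway names are internal labels (the real handlers are not shown here):

```python
THREAD_INGESTION_LIMIT = 2_500  # characters

def route_human_message(name: str, content: str) -> tuple[str, str]:
    """Every message is captured; only the ingestion pathway differs."""
    episode = f"User message from {name}: {content}"
    if len(content) > THREAD_INGESTION_LIMIT:
        return ("graph_api", episode)   # long: direct zep.graph.add
    return ("thread", episode)          # short: thread-based ingestion
```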

02 — AI Responses — async via NATS

After every agent run, the post-processing consumer ingests the AI's response asynchronously — it doesn't block the user. The agent's output often contains synthesized decisions, summaries, and commitments that form important context for future conversations.

EpisodeData(data="Agent response: {content}", type="message", source="Agent response")

03 — Workflow Trigger Data — >2,500 characters

When a workflow fires (scheduled task, webhook-triggered automation), the trigger context is ingested if it exceeds 2,500 characters. This ensures the agent remembers what triggered automated actions — "the workflow ran because John's PR got approved" becomes a fact in the graph.

Every human message is ingested — the 2,500-character threshold only determines the ingestion method, not whether the message is captured.


04 — Scheduled Intelligence

Three recurring systems feed the graph with structured summaries. Each uses JSON episodes so Zep can extract entities and facts cleanly.

01 — Morning Briefing

Generated daily. The briefing summary (capped at 5,000 chars) is ingested as structured JSON. The agent remembers what it told the user this morning — "what did you brief me on?" has an answer.

{ "morning_briefing_summary": "heading + body", "briefing_date": "2025-04-15" }
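Building that episode is mostly a truncate-and-serialize step. A sketch, with briefing_episode as a hypothetical helper name:

```python
import json
from datetime import date

BRIEFING_SUMMARY_CAP = 5_000  # characters

def briefing_episode(summary: str, day: date) -> dict:
    """Structured JSON episode so Zep can extract entities and facts cleanly."""
    return {
        "data": json.dumps({
            "morning_briefing_summary": summary[:BRIEFING_SUMMARY_CAP],
            "briefing_date": day.isoformat(),
        }),
        "type": "json",
        "source_description": "Morning briefing",
    }
```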

02 — Evening Briefing

Same pattern, end of day. Captures what was accomplished, what got pushed, and what the agent surfaced throughout the day.

03 — Action Items

Every action item suggested to the user — from Gmail, Linear, GitHub, or Calendar — is ingested with full context. Builds a timeline of what the agent has surfaced. "What did you flag for me this week?" has an answer.

{ "action_item_suggested": "Review PR #482", "context": "Sarah flagged type safety issues", "integration": "github", "priority": "high" }

These scheduled ingestions create temporal awareness. The agent can reconstruct what happened on any given day by searching the graph for that date's briefings and action items.


05 — Integration Spatial Awareness

When a user connects an integration, the sync consumer builds a map of their workspace in the graph. The agent knows where things are before the user asks.

Slack

Workspace metadata, every channel (name, public/private, member count, topic, purpose), every user (name, email, channel memberships), and incremental membership changes. The agent knows "#eng-backend has 12 members and its topic is Q2 migration."

Gmail

Individual email episodes for relevant messages (from/to, subject, date, direction, snippet). VIP contacts (top 30 by volume) get LLM-powered relationship analysis — tone, key topics, response patterns. The graph ends up knowing "John is the CFO, prefers bullet points, responds within 2 hours."
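Selecting the VIP set is a frequency count over recent senders before the expensive LLM pass. A sketch of that selection step (pick_vip_contacts is an illustrative name):

```python
from collections import Counter

def pick_vip_contacts(message_senders: list[str], top_n: int = 30) -> list[str]:
    """Top-N contacts by email volume get the deeper LLM relationship pass."""
    return [addr for addr, _ in Counter(message_senders).most_common(top_n)]
```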

GitHub

Installation metadata, top 100 repos (name, language, visibility, stars), teams with member lists and repo access, incremental updates via webhooks. The agent knows "auth-service is private, TypeScript, and Sarah and Mike have push access."

Linear

Org, teams, projects, users, memberships. Incremental updates via webhooks.

Airtable

Base schemas, table structures, field types. Agent knows your data model.

Vercel

Project metadata, framework, region, domains, team members.


06 — User-Initiated Memory

Three mechanisms let the user and the system explicitly write to the graph — beyond what's captured from conversations and integrations.

01 — Explicit Memory — ADD_TO_MEMORY tool

The user tells the agent to remember something: "Remember that I prefer all reports in bullet point format." The agent calls ADD_TO_MEMORY with the fact as plain text. Zep extracts the entity and fact, adds it as a node property. Future conversations reference this automatically.

EpisodeData(data="User prefers all reports in bullet point format", type="text")

02 — Onboarding User Research

During onboarding, a dedicated sub-agent researches the user on the web — LinkedIn, company pages, public profiles. The profile is chunked into 1,500-character segments and ingested. The agent knows the user's professional background from conversation one without being told.

{ "user_research_from_web": "chunk...", "chunk_info": "Part 2 of 5" }
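The chunking step can be sketched directly from the episode shape above (research_episodes is an illustrative name):

```python
import json

CHUNK_SIZE = 1_500  # characters per research chunk

def research_episodes(profile_text: str) -> list[str]:
    """Split a researched profile into 1,500-char JSON episodes with part labels."""
    chunks = [profile_text[i:i + CHUNK_SIZE]
              for i in range(0, len(profile_text), CHUNK_SIZE)]
    total = len(chunks)
    return [json.dumps({
        "user_research_from_web": chunk,
        "chunk_info": f"Part {i + 1} of {total}",
    }) for i, chunk in enumerate(chunks)]
```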

03 — Skill Learning Summaries

When the user approves a learned skill (a reusable workflow pattern the agent discovered), the execution summary is ingested. The graph accumulates knowledge about how the user prefers to work — not just what they said, but what patterns the agent learned and the user validated.

{ "event_type": "skill_execution_summary", "skill_name": "Weekly standup report", "learning_summary": "User wants standup from Linear, formatted as bullets, posted to #eng-standup" }

These three mechanisms close the loop: conversations capture implicit knowledge, integrations capture workspace structure, and user-initiated memory captures everything else — explicit preferences, web-sourced context, and validated behavioral patterns.


07 — How Zep Builds the Graph

Zep is the memory layer service that handles entity extraction, relationship inference, deduplication, and contradiction resolution. Under the hood, it's built on Graphiti — an open-source temporal knowledge graph engine backed by Neo4j. We control what gets ingested; Zep processes every episode through a four-stage extraction pipeline.

01 Extract  02 Resolve  03 Relate  04 Temporalize

Entity Extraction

LLM receives episode + last 4 messages. Wikipedia test: "Could this have its own article?" — if no, skip. Uses most specific form ("road cycling" not "cycling"). Never extracts pronouns or abstract concepts. Each entity gets a name, summary, and 1024-dim embedding.

Entity Resolution

Two-stage dedup. First: cosine similarity + BM25 full-text against existing Neo4j nodes. Second: LLM verifies using name + summary. Confirmed duplicates merge into canonical entity. A uuid_map tracks merges so downstream edges stay consistent.
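The uuid_map's job is to keep edges pointing at canonical entities after merges. A minimal sketch of that remapping (the function names are illustrative; the similarity and LLM stages are omitted):

```python
def canonical(uuid: str, uuid_map: dict[str, str]) -> str:
    """Follow merge links until the canonical entity uuid is reached."""
    while uuid in uuid_map:
        uuid = uuid_map[uuid]
    return uuid

def merge_duplicate(dup_uuid: str, canon_uuid: str,
                    uuid_map: dict[str, str]) -> None:
    """Record that dup_uuid was merged into canon_uuid."""
    uuid_map[dup_uuid] = canon_uuid

def remap_edge(edge: tuple[str, str, str], uuid_map: dict[str, str]):
    """Rewrite an edge so downstream triples reference canonical entities."""
    src, rel, tgt = edge
    return (canonical(src, uuid_map), rel, canonical(tgt, uuid_map))
```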

Fact Extraction

Triples: (source) —[RELATION]→ (target) + fact text. Relations in SCREAMING_SNAKE_CASE. Both entities must exist in extracted list. Self-referential triples rejected. Each edge gets a fact field + embedding for semantic search.
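Those validation rules condense into one predicate. A sketch, assuming the extracted-entity check and naming convention are the only gates (the real pipeline runs inside the LLM extraction loop):

```python
def valid_triple(source: str, relation: str, target: str,
                 extracted: set[str]) -> bool:
    """Reject self-referential triples, entities outside the extracted list,
    and relations that aren't SCREAMING_SNAKE_CASE."""
    if source == target:                                  # self-referential
        return False
    if source not in extracted or target not in extracted:
        return False
    return relation.isupper()                             # SCREAMING_SNAKE_CASE
```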

Temporal Resolution

ISO 8601 valid_at/invalid_at from episode context. Relative dates resolved against reference_time. Contradictions: old edge invalidated, never deleted. Four timestamps per edge: t_valid/t_invalid (real-world) + t_created/t_expired (audit). New info always wins.


08 — How the Agent Searches Memory

The agent calls MEMORY_SEARCH with 1–3 focused queries. Each fires two parallel Zep searches (edges + nodes) with cross-encoder reranking. 3 queries = 6 concurrent search calls, all with a 15-second timeout.

  MEMORY_SEARCH(queries: ["Sarah's team", "auth service owner"])
      ↓
  Stage 1: Multi-Method Search
  ├── Cosine Similarity (Embedding match)
  ├── BM25 Full-text (Keyword match)
  └── BFS Traversal (Graph neighbors)
      ↓
  Stage 2: Reranking
  Cross-encoder (0–1 sigmoid) · Threshold: 0.3 · Dedup by UUID
      ↓
  Stage 3: Output
  Top 10 edges + Top 6 nodes → memory_context
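The Stage 2 filter reduces to a few lines. A sketch, assuming the cross-encoder returns raw logits that get squashed through a sigmoid (rerank and its input shape are illustrative, not the Zep SDK):

```python
import math

def rerank(hits: list[tuple[str, float]],
           threshold: float = 0.3, top_k: int = 10):
    """Sigmoid the cross-encoder logits, drop scores below the threshold,
    dedup by uuid, and keep the top_k results."""
    scored = [(uuid, 1 / (1 + math.exp(-logit))) for uuid, logit in hits]
    seen, results = set(), []
    for uuid, score in sorted(scored, key=lambda x: x[1], reverse=True):
        if score < threshold or uuid in seen:
            continue
        seen.add(uuid)
        results.append((uuid, score))
    return results[:top_k]
```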

When the Agent Searches

  • "Email John" → Which John? What tone? Memory knows.
  • "Update the PRD" → Which PRD? Which repo? Memory has spatial context.
  • "The usual format" → Memory captured the preference from a past conversation.
  • "What did we decide?" → Memory has the decision from last week's thread.

Cache Warming

When a user opens the app, zep.user.warm() preloads their graph into cache. Latency drops from ~3s (cold) to ~400ms (warm). Fire-and-forget during session init.


This architecture powers dimension.dev's memory system — 27 ingestion points feeding a temporal knowledge graph searched in real time across every conversation.