Context Engine: Memory Architecture

April 20, 2026

Persistent memory across conversations powered by Zep's graph memory engine. 27 ingestion points feed a temporal knowledge graph with bi-temporal fact tracking, entity resolution, and community detection.

  • Ingestion Points: 27 sources
  • Graph Engine: Zep · Graphiti · Neo4j
  • Search Latency: ~400ms warm

01 — What the Memory Graph Does

The agent doesn't start every conversation from scratch. It remembers who the user works with, what channels they're in, which repos they own, what they asked last week, and what they prefer. This context lives in a knowledge graph (Zep) that accumulates over time.

Zep Graph Memory

Entities:

  • People
  • Channels
  • Repositories
  • Projects
  • Workspaces

Relationships:

  • member_of
  • works_on
  • communicates_with
  • belongs_to

Facts:

  • User preferences
  • Decisions made
  • Communication style

We control what goes in — the episodes, their shape, their timing. Zep handles the graph construction: entity extraction, relationship inference, deduplication, and contradiction resolution. We then search the graph at agent runtime via the MEMORY_SEARCH tool.

The graph is per-user. Every episode is tagged with a user_id. The agent for user A never sees user B's memory. 27 ingestion points feed this graph continuously.


02 — Ingestion Architecture

Every ingestion point boils down to one of two calls: zep.graph.add() (single episode) or zep.graph.add_batch() (batch of up to 10 episodes). The EpisodeData shape is always the same. What varies is what we put in and when.
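The shared episode shape and the batching rule can be sketched in a few lines. EpisodeData mirrors the fields shown in the diagram below; the dataclass and chunk_for_batch_add helper are illustrative, not the actual SDK types:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class EpisodeData:
    data: str                                     # JSON string or plain text
    type: Literal["json", "message", "text"]
    source_description: str = ""                  # context for the extractor

def chunk_for_batch_add(episodes: list[EpisodeData], batch_size: int = 10):
    """Split episodes into add_batch-sized groups (max 10 per call)."""
    return [episodes[i:i + batch_size]
            for i in range(0, len(episodes), batch_size)]
```

A sync worker would then loop over the chunks, calling zep.graph.add_batch() once per group and zep.graph.add() for one-off episodes.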

Four Source Categories

Conversation Layer:

  • All human messages
  • AI responses (post-processing)
  • Workflow trigger data

Scheduled Intelligence:

  • Morning briefing summary
  • Evening briefing recap
  • Action items with full context

Integration Sync:

  • Slack (workspace, channels, users)
  • Gmail (emails, VIP contacts)
  • GitHub, Linear, Airtable, Vercel

User Initiated:

  • Explicit memory (ADD_TO_MEMORY)
  • Onboarding user research
  • Skill learning summaries

  All sources
      ↓
  EpisodeData
  ├── data: JSON or text
  ├── type: "json" | "message" | "text"
  └── source_description: context
      ↓
  Zep Graph Engine
  1. Entity extraction
  2. Relationship inference
  3. Fact extraction
  4. Dedup & contradiction resolution
  5. Graph update

03 — Conversation Memory

Every meaningful exchange between the user and the agent feeds the graph. Three types of conversational data get ingested.

01 — Human Messages — all messages ingested

Every human message gets ingested into the graph. The 2,500-character threshold determines the ingestion pathway — shorter messages go through thread-based ingestion, longer ones route directly to the graph API — but nothing is dropped.

EpisodeData(data="User message from {name}: {content}", type="message")
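The routing rule is simple to sketch. A minimal version, assuming the two pathway names are internal labels (the real handlers are not shown here):

```python
THREAD_INGESTION_LIMIT = 2_500  # characters

def route_human_message(name: str, content: str) -> tuple[str, str]:
    """Every message is captured; only the ingestion pathway differs."""
    episode = f"User message from {name}: {content}"
    if len(content) > THREAD_INGESTION_LIMIT:
        return ("graph_api", episode)   # long: direct zep.graph.add
    return ("thread", episode)          # short: thread-based ingestion
```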

02 — AI Responses — async via NATS

After every agent run, the post-processing consumer ingests the AI's response asynchronously — it doesn't block the user. The agent's output often contains synthesized decisions, summaries, and commitments that form important context for future conversations.

EpisodeData(data="Agent response: {content}", type="message", source="Agent response")

03 — Workflow Trigger Data — >2,500 characters

When a workflow fires (scheduled task, webhook-triggered automation), the trigger context is ingested if it exceeds 2,500 characters. This ensures the agent remembers what triggered automated actions — "the workflow ran because John's PR got approved" becomes a fact in the graph.

Every human message is ingested — the 2,500-character threshold only determines the ingestion method, not whether the message is captured.


04 — Scheduled Intelligence

Three recurring systems feed the graph with structured summaries. Each uses JSON episodes so Zep can extract entities and facts cleanly.

01 — Morning Briefing

Generated daily. The briefing summary (capped at 5,000 chars) is ingested as structured JSON. The agent remembers what it told the user this morning — "what did you brief me on?" has an answer.

{ "morning_briefing_summary": "heading + body", "briefing_date": "2025-04-15" }
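Building that episode is mostly a truncate-and-serialize step. A sketch, with briefing_episode as a hypothetical helper name:

```python
import json
from datetime import date

BRIEFING_SUMMARY_CAP = 5_000  # characters

def briefing_episode(summary: str, day: date) -> dict:
    """Structured JSON episode so Zep can extract entities and facts cleanly."""
    return {
        "data": json.dumps({
            "morning_briefing_summary": summary[:BRIEFING_SUMMARY_CAP],
            "briefing_date": day.isoformat(),
        }),
        "type": "json",
        "source_description": "Morning briefing",
    }
```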

02 — Evening Briefing

Same pattern, end of day. Captures what was accomplished, what got pushed, and what the agent surfaced throughout the day.

03 — Action Items

Every action item suggested to the user — from Gmail, Linear, GitHub, or Calendar — is ingested with full context. Builds a timeline of what the agent has surfaced. "What did you flag for me this week?" has an answer.

{ "action_item_suggested": "Review PR #482", "context": "Sarah flagged type safety issues", "integration": "github", "priority": "high" }

These scheduled ingestions create temporal awareness. The agent can reconstruct what happened on any given day by searching the graph for that date's briefings and action items.


05 — Integration Spatial Awareness

When a user connects an integration, the sync consumer builds a map of their workspace in the graph. The agent knows where things are before the user asks.

Slack

Workspace metadata, every channel (name, public/private, member count, topic, purpose), every user (name, email, channel memberships), and incremental membership changes. The agent knows "#eng-backend has 12 members and its topic is Q2 migration."

Gmail

Individual email episodes for relevant messages (from/to, subject, date, direction, snippet). VIP contacts (top 30 by volume) get LLM-powered relationship analysis — tone, key topics, response patterns. The graph ends up knowing "John is the CFO, prefers bullet points, responds within 2 hours."
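Selecting the VIP set is a frequency count over recent senders before the expensive LLM pass. A sketch of that selection step (pick_vip_contacts is an illustrative name):

```python
from collections import Counter

def pick_vip_contacts(message_senders: list[str], top_n: int = 30) -> list[str]:
    """Top-N contacts by email volume get the deeper LLM relationship pass."""
    return [addr for addr, _ in Counter(message_senders).most_common(top_n)]
```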

GitHub

Installation metadata, top 100 repos (name, language, visibility, stars), teams with member lists and repo access, incremental updates via webhooks. The agent knows "auth-service is private, TypeScript, and Sarah and Mike have push access."

Linear

Org, teams, projects, users, memberships. Incremental updates via webhooks.

Airtable

Base schemas, table structures, field types. Agent knows your data model.

Vercel

Project metadata, framework, region, domains, team members.


06 — User-Initiated Memory

Three mechanisms let the user and the system explicitly write to the graph — beyond what's captured from conversations and integrations.

01 — Explicit Memory — ADD_TO_MEMORY tool

The user tells the agent to remember something: "Remember that I prefer all reports in bullet point format." The agent calls ADD_TO_MEMORY with the fact as plain text. Zep extracts the entity and fact, adds it as a node property. Future conversations reference this automatically.

EpisodeData(data="User prefers all reports in bullet point format", type="text")

02 — Onboarding User Research

During onboarding, a dedicated sub-agent researches the user on the web — LinkedIn, company pages, public profiles. The profile is chunked into 1,500-character segments and ingested. The agent knows the user's professional background from conversation one without being told.

{ "user_research_from_web": "chunk...", "chunk_info": "Part 2 of 5" }
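The chunking step can be sketched directly from the episode shape above (research_episodes is an illustrative name):

```python
import json

CHUNK_SIZE = 1_500  # characters per research chunk

def research_episodes(profile_text: str) -> list[str]:
    """Split a researched profile into 1,500-char JSON episodes with part labels."""
    chunks = [profile_text[i:i + CHUNK_SIZE]
              for i in range(0, len(profile_text), CHUNK_SIZE)]
    total = len(chunks)
    return [json.dumps({
        "user_research_from_web": chunk,
        "chunk_info": f"Part {i + 1} of {total}",
    }) for i, chunk in enumerate(chunks)]
```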

03 — Skill Learning Summaries

When the user approves a learned skill (a reusable workflow pattern the agent discovered), the execution summary is ingested. The graph accumulates knowledge about how the user prefers to work — not just what they said, but what patterns the agent learned and the user validated.

{ "event_type": "skill_execution_summary", "skill_name": "Weekly standup report", "learning_summary": "User wants standup from Linear, formatted as bullets, posted to #eng-standup" }

These three mechanisms close the loop: conversations capture implicit knowledge, integrations capture workspace structure, and user-initiated memory captures everything else — explicit preferences, web-sourced context, and validated behavioral patterns.


07 — How Zep Builds the Graph

Zep is the memory layer service that handles entity extraction, relationship inference, deduplication, and contradiction resolution. Under the hood, it's built on Graphiti — an open-source temporal knowledge graph engine backed by Neo4j. We control what gets ingested; Zep processes every episode through a four-stage extraction pipeline.

01 Extract  02 Resolve  03 Relate  04 Temporalize

Entity Extraction

LLM receives episode + last 4 messages. Wikipedia test: "Could this have its own article?" — if no, skip. Uses most specific form ("road cycling" not "cycling"). Never extracts pronouns or abstract concepts. Each entity gets a name, summary, and 1024-dim embedding.

Entity Resolution

Two-stage dedup. First: cosine similarity + BM25 full-text against existing Neo4j nodes. Second: LLM verifies using name + summary. Confirmed duplicates merge into canonical entity. A uuid_map tracks merges so downstream edges stay consistent.
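The uuid_map's job is to keep edges pointing at canonical entities after merges. A minimal sketch of that remapping (the function names are illustrative; the similarity and LLM stages are omitted):

```python
def canonical(uuid: str, uuid_map: dict[str, str]) -> str:
    """Follow merge links until the canonical entity uuid is reached."""
    while uuid in uuid_map:
        uuid = uuid_map[uuid]
    return uuid

def merge_duplicate(dup_uuid: str, canon_uuid: str,
                    uuid_map: dict[str, str]) -> None:
    """Record that dup_uuid was merged into canon_uuid."""
    uuid_map[dup_uuid] = canon_uuid

def remap_edge(edge: tuple[str, str, str], uuid_map: dict[str, str]):
    """Rewrite an edge so downstream triples reference canonical entities."""
    src, rel, tgt = edge
    return (canonical(src, uuid_map), rel, canonical(tgt, uuid_map))
```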

Fact Extraction

Triples: (source) —[RELATION]→ (target) + fact text. Relations in SCREAMING_SNAKE_CASE. Both entities must exist in extracted list. Self-referential triples rejected. Each edge gets a fact field + embedding for semantic search.
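Those validation rules condense into one predicate. A sketch, assuming the extracted-entity check and naming convention are the only gates (the real pipeline runs inside the LLM extraction loop):

```python
def valid_triple(source: str, relation: str, target: str,
                 extracted: set[str]) -> bool:
    """Reject self-referential triples, entities outside the extracted list,
    and relations that aren't SCREAMING_SNAKE_CASE."""
    if source == target:                                  # self-referential
        return False
    if source not in extracted or target not in extracted:
        return False
    return relation.isupper()                             # SCREAMING_SNAKE_CASE
```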

Temporal Resolution

ISO 8601 valid_at/invalid_at from episode context. Relative dates resolved against reference_time. Contradictions: old edge invalidated, never deleted. Four timestamps per edge: t_valid/t_invalid (real-world) + t_created/t_expired (audit). New info always wins.


08 — How the Agent Searches Memory

The agent calls MEMORY_SEARCH with 1–3 focused queries. Each fires two parallel Zep searches (edges + nodes) with cross-encoder reranking. 3 queries = 6 concurrent search calls, all with a 15-second timeout.

  MEMORY_SEARCH(queries: ["Sarah's team", "auth service owner"])
      ↓
  Stage 1: Multi-Method Search
  ├── Cosine Similarity (Embedding match)
  ├── BM25 Full-text (Keyword match)
  └── BFS Traversal (Graph neighbors)
      ↓
  Stage 2: Reranking
  Cross-encoder (0–1 sigmoid) · Threshold: 0.3 · Dedup by UUID
      ↓
  Stage 3: Output
  Top 10 edges + Top 6 nodes → memory_context
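The Stage 2 filter reduces to a few lines. A sketch, assuming the cross-encoder returns raw logits that get squashed through a sigmoid (rerank and its input shape are illustrative, not the Zep SDK):

```python
import math

def rerank(hits: list[tuple[str, float]],
           threshold: float = 0.3, top_k: int = 10):
    """Sigmoid the cross-encoder logits, drop scores below the threshold,
    dedup by uuid, and keep the top_k results."""
    scored = [(uuid, 1 / (1 + math.exp(-logit))) for uuid, logit in hits]
    seen, results = set(), []
    for uuid, score in sorted(scored, key=lambda x: x[1], reverse=True):
        if score < threshold or uuid in seen:
            continue
        seen.add(uuid)
        results.append((uuid, score))
    return results[:top_k]
```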

When the Agent Searches

  • "Email John" → Which John? What tone? Memory knows.
  • "Update the PRD" → Which PRD? Which repo? Memory has spatial context.
  • "The usual format" → Memory captured the preference from a past conversation.
  • "What did we decide?" → Memory has the decision from last week's thread.

Cache Warming

When a user opens the app, zep.user.warm() preloads their graph into cache. Latency drops from ~3s (cold) to ~400ms (warm). Fire-and-forget during session init.


This architecture powers dimension.dev's memory system — 27 ingestion points feeding a temporal knowledge graph searched in real time across every conversation.