Memory System
OpenClaw has two memory layers with very different characteristics. The first is static memory — the MEMORY.md file read from disk on every turn. The second is semantic memory search — a vector + full-text index over workspace files and session transcripts. Understanding when each layer activates and what it costs is essential for tuning agent behaviour.
Static Memory: MEMORY.md
Section titled “Static Memory: MEMORY.md”MEMORY.md is loaded as part of the standard workspace bootstrap (position 70 in the sort order — see Chapter 2). Its content is injected verbatim into the system prompt under the ## Long-term Memory heading, after all other bootstrap files.
The canonical filename is enforced in src/memory/root-memory-files.ts:
export const CANONICAL_ROOT_MEMORY_FILENAME = "MEMORY.md";export const LEGACY_ROOT_MEMORY_FILENAME = "memory.md";If the legacy memory.md is present, it is explicitly excluded from the auxiliary memory paths (shouldSkipRootMemoryAuxiliaryPath) to avoid double-injection. The 2 MB read cap (MAX_WORKSPACE_BOOTSTRAP_FILE_BYTES) applies — larger files are truncated before injection.
The memory/ subdirectory is not automatically scanned. Daily note files like memory/2026-04-26.md are only included if the agent explicitly lists them as extra context paths in the memory config, or if the memory search backend indexes them.
Memory Search: Configuration
Section titled “Memory Search: Configuration”Memory search is configured per-agent under memory in the config. The full resolved configuration type is ResolvedMemorySearchConfig (in src/agents/memory-search.ts):
type ResolvedMemorySearchConfig = { enabled: boolean; sources: Array<"memory" | "sessions">; extraPaths: string[]; provider: string; // "builtin" | "qmd" | "remote" store: { driver: "sqlite"; path: string; fts: { tokenizer: "unicode61" | "trigram" }; vector: { enabled: boolean; extensionPath?: string }; }; query: { maxResults: number; minScore: number; hybrid: { enabled: boolean; vectorWeight: number; textWeight: number; ... }; }; sync: { onSessionStart: boolean; onSearch: boolean; watch: boolean; ... }; // ...};Two backends are supported:
builtin— SQLite with optionalsqlite-vecextension for vector search, running in-processqmd— a separate sidecar process (qmd-process.ts) communicating over a JSONL socket
Backend 1: Builtin SQLite
Section titled “Backend 1: Builtin SQLite”The builtin backend is the default. It stores embeddings and full-text index entries in a SQLite database at store.path (default: <stateDir>/memory/<agentId>.db).
When store.vector.enabled is true, the sqlite-vec extension is loaded and vector similarity search is performed. When the extension is unavailable, the backend falls back to BM25 full-text search only.
Hybrid search weights are configured via query.hybrid: vectorWeight and textWeight control how vector and text scores are combined. The mmr (Maximal Marginal Relevance) sub-section controls result diversity — it penalises redundant results that are semantically similar to already-selected ones.
Embedding computation uses the configured provider. The "local" embedding path uses node-llama.ts to run an ONNX embedding model in-process. Remote providers (OpenAI, Cohere, etc.) are called via embeddings-remote-fetch.ts.
Backend 2: QMD
Section titled “Backend 2: QMD”The QMD backend (src/memory-host-sdk/engine-qmd.ts) launches a separate sidecar that handles all embedding and indexing work. This isolates the memory system from the gateway process and avoids ONNX runtime conflicts. It communicates via a JSONL socket (qmd-process.ts), with operations batched for throughput.
QMD is activated when memory.backend: "qmd" is set in the config. At gateway startup, startGatewayMemoryBackend (in server-startup-memory.ts) iterates all configured agents and warms up their QMD instances.
The QMD engine also owns the dreaming subsystem (dreaming.ts): background consolidation passes that run while the agent is idle, de-duplicating and re-ranking stored memories without interrupting active turns.
How Memory is Indexed
Section titled “How Memory is Indexed”The sync system (src/memory-host-sdk/runtime-files.ts) determines what gets indexed:
MEMORY.md— always indexed if thememorysource is enabledmemory/*.mdfiles — discovered by recursive scan if the workspace is watched- Session transcripts — indexed as chunks when
sourcesincludes"sessions", subject tosync.sessions.deltaBytesandsync.sessions.deltaMessagesthresholds
Files are chunked by chunking.tokens (default ~512 tokens) with chunking.overlap between adjacent chunks. Each chunk is embedded independently so that precise paragraph-level retrieval is possible.
Sync can happen:
sync.onSessionStart: true— re-index changed files at the start of each turnsync.onSearch: true— index on-demand when a search is triggeredsync.watch: true— watch the workspace withfs.watchand index on change (with debounce)
The memory_search Tool
Section titled “The memory_search Tool”The memory_search tool (registered in src/agents/openclaw-tools.ts as part of the standard tool set) lets the agent query the memory index from inside a turn. The underlying call path:
memory_search tool → agent's memory search manager → hybrid query (BM25 + vector) → citation results with scores → injected into context as markdownThe tool returns citations in the format: [Source: path] score=0.85\n<excerpt>. The agent can use these citations to ground its reasoning in specific past decisions or knowledge.
Context Injection of Search Results
Section titled “Context Injection of Search Results”When sync.onSessionStart is enabled, memory search results are pre-fetched before the LLM call and injected into the system prompt in the ## Memory Search Results block. The injection happens in buildMemoryPromptSection (in src/plugins/memory-state.ts).
The MemoryCitationsMode config (citations.mode) controls whether citations include source paths and scores ("full"), just excerpts ("inline"), or nothing ("off").
Session Memory
Section titled “Session Memory”When sources includes "sessions", the memory backend indexes the agent’s own conversation history. This gives the agent long-term recall of past conversations beyond the compaction window. Session chunks are indexed with recency weighting — more recent turns receive higher base scores before the semantic similarity component is applied.
The experimental.sessionMemory flag gates this feature because session indexing has higher storage and compute costs than pure workspace file indexing.
Multimodal Memory
Section titled “Multimodal Memory”src/memory-host-sdk/multimodal.ts extends the memory system to index image content. When memory.multimodal.enabled is true, images in the workspace (and optionally in session history) are embedded using a vision model. Searches can then return image chunks alongside text chunks. This enables the agent to recall visual information — diagrams, screenshots, charts — from its workspace.
Key Takeaways
Section titled “Key Takeaways”MEMORY.mdis static: read from disk, injected as-is into the system prompt on every turn- The memory search backend (builtin SQLite or QMD) provides semantic search over workspace files and session transcripts
- Hybrid search combines BM25 full-text and vector similarity with configurable weights
- QMD runs as a sidecar to avoid in-process ONNX conflicts and enables the dreaming consolidation subsystem
- Session memory allows recall of past conversations beyond the compaction window
- Pre-fetched search results are injected into the system prompt before the LLM call; on-demand search is available via the
memory_searchtool