Tencent's 4-tier agent memory: SWE-bench 58.4 to 64.2%, SQLite local, MIT

Tencent dropped a serious agent-memory release under MIT this week: TencentDB Agent Memory, a 4-tier pyramid aimed at the long-horizon context-bloat problem that most agent shops are still hand-rolling around. The shape: L0 raw conversation logs at the base, L1 atomic facts in JSONL, L2 scenario blocks in Markdown, and L3 user persona in Markdown at the top. Upper tiers preserve structure, lower tiers preserve evidence, and every retrieval comes with a `node_id` + `result_ref`, so the agent can drill down deterministically when the persona-level fact isn't enough. For anyone shipping agents that run for more than a handful of turns, this is the cleanest published architecture for the memory problem to date.
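
To make the drill-down idea concrete, here is a minimal Python sketch of a tiered store in which each node points at the lower-tier node that supports it. Only the tier names and the `node_id` / `result_ref` labels come from the release; the `MemoryNode` class, the `drill_down` helper, the assumption that `result_ref` holds the ID of a single supporting node, and the sample data are all hypothetical and may not match the actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryNode:
    node_id: str                # identifier returned alongside a retrieval
    tier: int                   # 0 = raw log, 1 = atomic fact, 2 = scenario, 3 = persona
    text: str
    result_ref: Optional[str]   # ASSUMPTION: node_id of the supporting lower-tier node


def drill_down(store: dict[str, MemoryNode], node_id: str) -> list[MemoryNode]:
    """Follow result_ref links from a node down toward its raw evidence."""
    chain: list[MemoryNode] = []
    seen: set[str] = set()
    current = store.get(node_id)
    # Stop on a missing node, a node without a reference, or a reference cycle.
    while current is not None and current.node_id not in seen:
        seen.add(current.node_id)
        chain.append(current)
        current = store.get(current.result_ref) if current.result_ref else None
    return chain


# Hypothetical sample data, one node per tier.
STORE = {
    "persona-1": MemoryNode("persona-1", 3, "Prefers pytest for Python projects", "scene-7"),
    "scene-7": MemoryNode("scene-7", 2, "Test-suite migration discussion", "fact-42"),
    "fact-42": MemoryNode("fact-42", 1, "User asked to replace unittest with pytest", "log-103"),
    "log-103": MemoryNode("log-103", 0, "user: please port these tests to pytest", None),
}

if __name__ == "__main__":
    for node in drill_down(STORE, "persona-1"):
        print(f"L{node.tier} {node.node_id}: {node.text}")
```

The point of the sketch is that the walk is a plain key lookup at every step, which is what makes the drill-down deterministic: no second similarity search is needed to get from a persona-level claim to the log line behind it.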

Numbers on continuous long-horizon sessions (the right benchmark for agent memory, as opposed to single-turn lookups): SWE-bench 58.4% → 64.2% with the plugin enabled (+9.9% relative), token use down 33%. WideSearch 33% → 50% (+51.5%), tokens down 61%. AA-LCR 44.0% → 47.5% (+8.0%), tokens down 31%. PersonaMem 48% → 76% (+58%). Defaults: SQLite with the sqlite-vec extension, zero external API dependency, Markdown files at `~/.openclaw/memory-tdai/`. Recall has a 5-second timeout, and on timeout the system skips injection rather than blocking, so a slow retrieval can't stall an agent loop. Retrieval is hybrid BM25 + vector, merged via Reciprocal Rank Fusion, top-5 by default. L1 atomic-fact extraction runs every 5 turns; persona regeneration runs every 50 new memories.
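
For readers who have not met Reciprocal Rank Fusion, the following Python sketch shows the general technique together with a skip-on-timeout wrapper like the one described above. It is not Tencent's code: the function names are hypothetical, and the constant `k=60` is the value commonly used for RRF, not something the release states. Only the top-5 cutoff and the 5-second timeout come from the defaults listed above.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def rrf_fuse(rankings, k=60, top_n=5):
    """Merge ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


def recall_or_skip(retrieve, query, timeout_s=5.0):
    """Run retrieve(query); return [] (inject nothing) if it exceeds the timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(retrieve, query)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return []
    finally:
        # Do not wait for a slow retrieval; the caller moves on immediately.
        pool.shutdown(wait=False, cancel_futures=True)


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]      # hypothetical keyword ranking
    vector_hits = ["b", "c", "d"]    # hypothetical embedding ranking
    print(rrf_fuse([bm25_hits, vector_hits]))
```

RRF only looks at rank positions, so BM25 scores and vector distances never need to be normalized onto a common scale. One caveat of this sketch: a timed-out retrieval keeps running on its worker thread in the background; the wrapper only guarantees that the agent loop stops waiting for it.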

The ecosystem read: Mem0, Letta, MemGPT and Zep have been carving up agent-memory for two years now, but the 4-tier breakdown is the architectural delta. Most existing systems either flatten everything to a vector index (Mem0, Zep) or maintain a hot/cold split (MemGPT). The pyramid approach gives you Persona-as-Markdown (auditable by the user, human-readable, easy to edit), Atomic-as-JSONL (structured, parseable, drill-down keys), and raw logs at the floor. That's a white-box memory system you can debug with `grep`. Tencent left head-to-head benchmarks against Mem0/Letta/MemGPT/Zep out of the release (flag the asterisk), but the SWE-bench delta with a 33% token cut is the kind of number that should survive reproduction. Repository: github.com/Tencent/TencentDB-Agent-Memory.

Monday morning: the integration story is currently locked to OpenClaw (single npm package `@tencentdb-agent-memory/memory-tencentdb`, needs Node.js 22.16+) or Hermes Agent (Docker only). LangChain and LlamaIndex bindings are not in v1, which is the obvious community gap if you want to use the 4-tier architecture under your existing harness. If your agent burns tokens replaying conversation history on every turn and you've been waiting for a published baseline before building your own memory system, clone the repo, read the L0→L3 schema, and decide whether to wrap it or reimplement the architecture against your stack. The benchmark numbers are credible; the integration cost is the trade.
