TencentDB Agent Memory: Hierarchical Memory System Cuts Token Costs for Long‑Running AI Agents

TL;DR
TencentDB Agent Memory is an MIT‑licensed, open‑source memory system for long‑running AI agents that pairs a compact symbolic short‑term layer (Mermaid task canvases) with a four‑tier semantic long‑term pipeline (Persona → Scenario → Atom → Conversation).
The default backend is local‑first (SQLite + sqlite‑vec) for privacy and auditability. The system plugs into OpenClaw or runs via a Hermes Agent Docker image, and it supports OpenAI‑compatible endpoints.
Recommended next step: run a short pilot on your most token‑heavy, long‑horizon workflow (local SQLite for privacy) and measure task success, token consumption, and latency.

Why memory matters for AI agents — and for business

AI agents that live across days, weeks or months face two business problems: runaway token costs as context windows swell, and brittle, flat memory stores that lose macro‑level structure (so the agent repeats itself or forgets important preferences). For customer success bots, sales assistants, or automated R&D helpers, that translates to poor experiences and unpredictable costs.

TencentDB Agent Memory offers a pragmatic answer: a hierarchical memory design that encourages agents to consult compact, high‑level summaries before fetching verbose logs. The result is lower token bills, more consistent personalization, and better auditability for enterprise teams.

What TencentDB Agent Memory is (quick facts)

License & repo: MIT — github.com/Tencent/TencentDB-Agent-Memory.
OpenClaw plugin: @tencentdb-agent-memory/memory-tencentdb (single npm package to try).
Hermes Agent: a Docker image that bundles the memory gateway, the plugin, and a DeepSeek‑V3.2 example model.
Local default backend: SQLite + sqlite‑vec (no external API required); optional managed backend: Tencent Cloud Vector Database (TCVDB).
Runtime requirements: Node.js 22.16+.

How it works — the L3→L0 memory pyramid and symbolic short‑term

The system uses a four‑tier long‑term memory plus a symbolic short‑term representation:

L3 — Persona: persistent user profile and preferences (regenerated periodically).
L2 — Scenario: condensed multi‑step storylines or workflows.
L1 — Atom: discrete facts or extracted events (extraction runs periodically).
L0 — Conversation: raw turns and recent chat history.

“Core retrieval principle: a deterministic drill‑down — from persona → scenario → atom → conversation.”
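A minimal sketch of that drill-down, assuming a simple in-memory tree. The MemoryNode class, the keyword matching, and the sample data are all hypothetical illustrations, not the project's schema or retrieval code; the point is only that the walk starts at the compact persona summary and descends one tier at a time, stopping as soon as nothing more specific is relevant.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """One entry in a hypothetical memory hierarchy (not the project's schema)."""
    level: str      # "persona", "scenario", "atom", or "conversation"
    summary: str    # compact text the agent reads first
    children: list["MemoryNode"] = field(default_factory=list)


def drill_down(root: MemoryNode, keywords: list[str], max_depth: int = 3) -> list[MemoryNode]:
    """Walk persona -> scenario -> atom -> conversation, descending only into
    the first child whose summary mentions one of the keywords."""
    path = [root]
    node = root
    for _ in range(max_depth):
        match = next(
            (child for child in node.children
             if any(k.lower() in child.summary.lower() for k in keywords)),
            None,
        )
        if match is None:
            break  # nothing more specific is relevant, so stop at this tier
        path.append(match)
        node = match
    return path


# Made-up sample data, for illustration only.
turn = MemoryNode("conversation", "User: where is my invoice for April?")
atom = MemoryNode("atom", "Invoice status was requested in April", [turn])
billing = MemoryNode("scenario", "Invoice follow-up workflow", [atom])
onboarding = MemoryNode("scenario", "New account onboarding")
persona = MemoryNode("persona", "Finance lead, prefers short answers", [onboarding, billing])

levels = [n.level for n in drill_down(persona, ["invoice"])]
unrelated = [n.level for n in drill_down(persona, ["shipping"])]
print(levels)
print(unrelated)
```

A query with no matching scenario never leaves the persona tier, which is where the token savings in a design like this would come from.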

Short‑term state is compressed symbolically using Mermaid task canvases. Mermaid is a text‑based diagram language; here it acts like a compact index card the agent can read quickly. Verbose tool outputs and logs are offloaded to disk and represented as references. The agent reasons over the symbolic graph and only fetches the full raw text when it needs a specific node’s details.

“Verbose tool logs are moved to disk and state transitions are represented as Mermaid graphs so the agent reasons over symbols, then fetches raw text only when needed.”

Operationally, the agent keeps node identifiers in context. When it needs more detail, it greps for the node_id and retrieves the corresponding refs/*.md file. The memory artifacts are stored under ~/.openclaw/memory-tdai/ as human‑readable Markdown and JSONL, which makes auditing straightforward.

Example Mermaid snippet (simplified):
graph TD
  A[User: Request invoice] --> B[Agent: Check last payment]
  B --> C["Tool: Fetch billing log (refs/atom_123.md)"]
  C --> D[Agent: Respond with status]
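
The lookup step can be sketched in a few lines of Python. This is an illustration under stated assumptions: the regular expression, the find_refs and load_ref helpers, and the temporary directory standing in for ~/.openclaw/memory-tdai/ are hypothetical and not part of the project's API.

```python
import re
import tempfile
from pathlib import Path

# Matches reference paths such as refs/atom_123.md inside a canvas (assumed naming).
REF_PATTERN = re.compile(r"refs/[\w\-]+\.md")


def find_refs(canvas: str) -> list[str]:
    """Return the refs/*.md paths mentioned in a Mermaid canvas, in order."""
    return REF_PATTERN.findall(canvas)


def load_ref(memory_root: Path, ref: str) -> str:
    """Read the full offloaded text for one reference, only when it is needed."""
    return (memory_root / ref).read_text(encoding="utf-8")


canvas = """graph TD
  A[User: Request invoice] --> B[Agent: Check last payment]
  B --> C["Tool: Fetch billing log (refs/atom_123.md)"]
  C --> D[Agent: Respond with status]
"""

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for the real memory directory
    (root / "refs").mkdir()
    (root / "refs" / "atom_123.md").write_text("full billing log ...", encoding="utf-8")

    refs = find_refs(canvas)          # cheap: scan the compact canvas
    detail = load_ref(root, refs[0])  # expensive: pull raw text on demand

print(refs, len(canvas), len(detail))
```

Only the short canvas stays in context; the raw log is read from disk when a specific node is inspected.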

Hybrid retrieval and token management

Retrieval combines BM25 keyword search (with jieba segmentation for Chinese) with embedding‑based vector search and merges the two ranked lists using Reciprocal Rank Fusion (RRF). The keyword side surfaces exact, token‑sensitive matches, while the vector side catches semantic matches when phrasing varies, which is useful for multilingual catalogs and product descriptions.
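
For readers unfamiliar with RRF, here is a minimal sketch of the fusion step. Each document scores 1 / (k + rank) in every list where it appears, and the scores are summed. The constant k = 60 is a commonly used default, not a value confirmed for this project, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["atom_7", "atom_3", "atom_9"]    # keyword ranking (illustrative)
vector_hits = ["atom_3", "atom_5", "atom_7"]  # embedding ranking (illustrative)

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Items that appear in both lists (atom_3 and atom_7 here) rise above items found by only one retriever, without any need to calibrate BM25 scores against cosine similarities.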

The defaults are designed for predictability: L1 extraction runs every 5 turns, persona regeneration runs roughly every 50 new memories, and retrieval returns 5 candidate items with a 5‑second timeout. If retrieval times out, the system skips injection rather than stalling the agent.
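
The skip-on-timeout behavior can be sketched with asyncio. The function names and the stand-in search coroutines below are hypothetical; only the defaults (5 items, 5 seconds, extraction every 5 turns) come from the text above.

```python
import asyncio


async def retrieve_with_timeout(search, query: str, timeout_s: float = 5.0, top_k: int = 5) -> list[str]:
    """Return up to top_k memories, or an empty list if the search is too slow."""
    try:
        hits = await asyncio.wait_for(search(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        return []  # skip injection instead of stalling the agent
    return hits[:top_k]


def should_extract_atoms(turn_index: int, every: int = 5) -> bool:
    """True on every 5th turn (1-based), mirroring the default L1 cadence."""
    return turn_index > 0 and turn_index % every == 0


# Stand-in search backends for the demo (not real project functions).
async def fast_search(query: str) -> list[str]:
    return [f"memory {i} for {query}" for i in range(8)]


async def slow_search(query: str) -> list[str]:
    await asyncio.sleep(0.2)
    return ["arrives too late"]


fast = asyncio.run(retrieve_with_timeout(fast_search, "invoice"))
slow = asyncio.run(retrieve_with_timeout(slow_search, "invoice", timeout_s=0.05))
print(len(fast), slow)
```

Returning an empty list keeps the agent's turn latency bounded; the cost is that a slow retrieval silently yields no memory, which is why the timeout rate is worth tracking.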

Integrations, tools and developer ergonomics

Quick start: npm install @tencentdb-agent-memory/memory-tencentdb, enable the plugin in OpenClaw config and run with Node.js 22.16+.
Model provider: supports any OpenAI‑compatible endpoint via MODEL_PROVIDER=custom environment switch in the Hermes example.
Runtime helpers: tdai_memory_search (search L1–L3) and tdai_conversation_search (search L0).
Storage: artifacts under ~/.openclaw/memory-tdai/ in Markdown/JSONL for white‑box debugging.

Typical file tree (illustrative; actual file names and extensions may differ):
~/.openclaw/memory-tdai/
  persona.json
  scenario/
    scenario_01.md
  atom/
    atom_123.json
  conversation/
    convo_2026-05-01.md
  refs/
    atom_123.md

Benchmarks — promising, but self‑reported

“Gains are measured across continuous long‑horizon sessions — not isolated turns — to simulate real context accumulation.”

WideSearch: pass rate 33% → 50% (+51.5% relative); tokens 221.31M → 85.64M (−61.4%).
SWE‑bench: success 58.4% → 64.2% (+9.9% relative); tokens 3474.1M → 2375.4M (reported as −33.1%, although the figures shown work out to about −31.6%).
AA‑LCR: success 44.0% → 47.5% (+7.95% relative); tokens 112.0M → 77.3M (−31.0%).
PersonaMem: accuracy 48% → 76% (+59% relative).

These results indicate meaningful token savings and improved long‑horizon success in Tencent’s own runs of these benchmarks. Treat them as strong signals, not definitive proof. Before drawing ROI conclusions, run independent benchmarks on your own workloads with identical models, prompt templates, and session lengths.
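
The parenthesized percentages in the benchmark list are relative changes, not percentage-point differences. A quick way to recompute them from the published before and after values is sketched below; the helper name is hypothetical and the inputs are copied from the list above.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after, relative to before."""
    if before == 0:
        raise ValueError("relative change is undefined when the baseline is zero")
    return (after - before) / before * 100.0


# Values copied from the benchmark list; token counts are in millions.
reported = {
    "WideSearch pass rate": (33.0, 50.0),
    "WideSearch tokens": (221.31, 85.64),
    "SWE-bench success": (58.4, 64.2),
    "SWE-bench tokens": (3474.1, 2375.4),
    "AA-LCR success": (44.0, 47.5),
    "AA-LCR tokens": (112.0, 77.3),
    "PersonaMem accuracy": (48.0, 76.0),
}

changes = {name: relative_change(b, a) for name, (b, a) in reported.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

Because the published inputs are rounded, recomputed values can differ slightly from the reported percentages. Running the same helper on your own pilot numbers keeps comparisons on the same footing.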

Trade‑offs, risks and mitigations for enterprise adoption

Layered memory and symbolic offload bring clear advantages, but also operational decisions:

Local vs cloud: SQLite + sqlite‑vec is great for experimentation and privacy. At scale, SQLite can hit concurrency limits; consider TCVDB or other managed vector stores for multi‑agent, multi‑user environments.
Security & governance: local files must be protected — use OS‑level encryption, file system permissions, or SQLite encryption extensions. Implement RBAC, audit trails, and retention/erasure policies for GDPR.
Persona drift & contradictions: periodic persona regeneration requires validation. Add conflict detection, human review workflows, and decay/retention policies to avoid stale or contradictory preferences.
Availability & backups: plan snapshotting and offsite backups for local deployments, or use cloud backends with replication for HA.
Observability: instrument memory growth, retrieval latency, cache hit rates, persona regeneration frequency and token consumption per session.

Security checklist

Encryption at rest (disk or SQLite‑level), and encrypted backups.
RBAC and least privilege for read/write to memory files.
Integrity checks / tamper evidence for memory artifacts (hashes, versions).
Retention and forget workflows mapped to compliance requirements.
Secrets management for any managed vector DB credentials.
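
For the integrity item, one lightweight option is a hash manifest over the memory directory. The sketch below is an assumption about how a team might implement tamper evidence on top of the plain Markdown/JSONL files; it is not a feature of the project, and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for the real memory directory
    (root / "refs").mkdir()
    ref = root / "refs" / "atom_123.md"
    ref.write_text("original log", encoding="utf-8")
    baseline = build_manifest(root)

    ref.write_text("edited log", encoding="utf-8")  # simulate tampering
    current = build_manifest(root)
    changed = changed_files(baseline, current)

print(changed)
```

Storing the baseline manifest outside the memory directory (for example alongside encrypted backups) prevents anyone who edits an artifact from also rewriting its recorded hash.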

Practical pilot plan and metrics to track

Recommended short pilot (2–6 weeks depending on session cadence): run parallel agents on an identical long‑running workflow — one with flat vector memory and one with TencentDB Agent Memory. Track these metrics:

Task success rate over continuous sessions (end‑to‑end completion).
Token consumption per session and per successful task (cost delta).
Average retrieval latency and timeout rate.
Recall quality: precision@k and human quality ratings for retrieved memories.
Storage growth rate and backup size.
Number of persona regenerations and detected contradictions.
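
Several of these metrics reduce to simple ratios over per-session logs. The sketch below assumes a hypothetical SessionResult record that your own instrumentation would populate; the sample numbers are invented for the demo.

```python
from dataclasses import dataclass


@dataclass
class SessionResult:
    """Per-session counters collected by your own logging (hypothetical shape)."""
    succeeded: bool
    tokens: int
    retrievals: int
    timeouts: int


def summarize(sessions: list[SessionResult]) -> dict[str, float]:
    """Aggregate success, cost, and timeout metrics across sessions."""
    total = len(sessions)
    successes = sum(s.succeeded for s in sessions)
    tokens = sum(s.tokens for s in sessions)
    retrievals = sum(s.retrievals for s in sessions)
    timeouts = sum(s.timeouts for s in sessions)
    return {
        "success_rate": successes / total if total else 0.0,
        "tokens_per_session": tokens / total if total else 0.0,
        "tokens_per_success": tokens / successes if successes else float("inf"),
        "timeout_rate": timeouts / retrievals if retrievals else 0.0,
    }


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that a human marked relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for item in retrieved[:k] if item in relevant) / k


sessions = [
    SessionResult(True, 1000, 10, 1),
    SessionResult(False, 3000, 10, 0),
    SessionResult(True, 2000, 20, 1),
]
stats = summarize(sessions)
p_at_5 = precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "x"}, k=5)
print(stats, p_at_5)
```

Tokens per successful task charges failed sessions to the successes, so it penalizes an agent that burns tokens without finishing, which per-session averages alone would hide.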

Suggested A/B design: route 50% of long‑horizon sessions to a control agent (flat vector store) and 50% to the hierarchical memory agent, matched on user type and workflow. Minimum pilot duration should capture typical session lifecycles — for workflows that span months, a representative synthetic workload or accelerated cadence can be useful.
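
One way to implement the 50/50 split is deterministic hashing, so the same user or session always lands in the same arm without storing an assignment table. This is a generic sketch; the arm names and ID format are assumptions, and matching on user type and workflow would still need to be handled separately (for example by hashing within each stratum).

```python
import hashlib


def assign_arm(unit_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically map an ID to 'hierarchical' or 'control'."""
    digest = hashlib.sha256(unit_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return "hierarchical" if bucket < treatment_share else "control"


ids = [f"user-{i}" for i in range(10_000)]  # made-up IDs for the demo
arms = [assign_arm(uid) for uid in ids]
share = arms.count("hierarchical") / len(arms)
print(f"hierarchical share: {share:.3f}")
```

Hashing on a stable user ID rather than a per-session ID keeps each user's memory in a single arm, which matters here because the treatment accumulates state across sessions.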

Quick start (developer note)

One‑line install:

npm install @tencentdb-agent-memory/memory-tencentdb

Then enable the plugin in your OpenClaw configuration and run with Node.js 22.16+. The Hermes Docker image shows an end‑to‑end example and supports MODEL_PROVIDER=custom to point to any OpenAI‑compatible endpoint.

Roadmap items and what to watch

Portable memory formats for easier migration between environments (useful if you later switch vector DB vendors).
Automatic Skill generation to convert recurrent memories into reusable actions or abilities for agents.
Visual debugging dashboard to make those Mermaid canvases actionable for product and compliance teams.

Final recommendations — what leaders and builders should do next

Run a focused pilot on your most token‑heavy, long‑horizon workflow using the local SQLite option to validate privacy and token savings.
Instrument the pilot for the metrics above and compare against a flat vector baseline; prioritize task success and cost per completed task.
If results look good, plan for scale: add encryption, RBAC, backups, and consider a managed vector store for high concurrency or multi‑region availability.

TencentDB Agent Memory is a practical blueprint: the hierarchical Persona→Scenario→Atom→Conversation pattern and symbolic task canvases are reproducible design choices you can apply even if you don’t adopt the codebase wholesale. The real question for businesses is whether your agents will learn to remember the right things — and whether that remembering saves you money and improves outcomes. Start small, measure, and let the metrics decide.