Amazon Bedrock AgentCore: How to Build Secure, Multi‑Tenant AI Agents for SaaS

Building multi-tenant agents with Amazon Bedrock AgentCore

You built a demo where an LLM calls a few APIs and wows the audience. Now you must ship that as a secure, auditable SaaS capability that serves dozens, or thousands, of customers with different security, privacy, and cost requirements. Three things compete: tenant isolation, per-tenant cost, and operational simplicity. Amazon Bedrock AgentCore presents a managed, serverless set of primitives (Runtime, Gateway, Memory, Identity, Policy, Observability, and Guardrails) that help teams move agentic AI from prototype to production while letting you choose the right tenancy model at each layer.

Running example: DocAssist is a fictional legal research SaaS that adds an assistant to summarize contracts, search precedent, and call downstream billing and case-management tools. Some DocAssist customers are small law firms with low regulatory burden; others are enterprise finance teams that require strict data residency and audit controls. The choices below use DocAssist to make trade-offs concrete.

Glossary (quick)

RAG — Retrieval-Augmented Generation (using external knowledge bases + LLMs)
ABAC — Attribute-Based Access Control
microVMs — lightweight virtual machines providing stronger isolation than containers with lower cost/latency than full VMs
ANS v2 — Agent Naming Service v2 (IETF draft for layered agent identity assurances)
OAuth 2.0 on-behalf-of — token exchange pattern (RFC 8693) for delegation/act-on-behalf

Three tenancy patterns for multi-tenant agents

Tenancy is not a binary choice. AgentCore maps common multi-tenant needs to three patterns—choose per layer rather than locking the whole architecture into one model.

Silo (dedicated). Per-tenant resources (private runtimes, model instances, or vector DBs). Highest isolation and simplest compliance path—ideal for DocAssist’s enterprise clients with regulatory needs—but more expensive and operationally heavier.
Pool (shared). Shared compute and storage with logical separation via namespaces, metadata filtering, and ABAC. Cost-efficient and scalable for small/medium tenants. Requires rigorous metadata hygiene and runtime filters to avoid leakage.
Bridge (hybrid). Mix-and-match: pool compute for most tenants, silo RAG stores for regulated tenants. This pragmatic approach lets DocAssist offer both cost-effective service tiers and strict isolation where required.
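
One way to make "choose per layer" concrete is a small lookup table that records the tenancy pattern for each layer, by customer tier. The sketch below is illustrative only: the tier names, layer names, and TENANCY_PLAN are assumptions for DocAssist, not part of any AgentCore API.

```python
from enum import Enum


class Tenancy(Enum):
    SILO = "silo"
    POOL = "pool"


# Hypothetical DocAssist plan: one tenancy decision per layer, per customer tier.
TENANCY_PLAN = {
    "small_firm": {"runtime": Tenancy.POOL, "rag": Tenancy.POOL, "model": Tenancy.POOL},
    "enterprise": {"runtime": Tenancy.POOL, "rag": Tenancy.SILO, "model": Tenancy.POOL},
}


def tenancy_for(tier: str, layer: str) -> Tenancy:
    """Look up the tenancy pattern for a layer; unknown tiers or layers raise KeyError."""
    return TENANCY_PLAN[tier][layer]


def overall_pattern(tier: str) -> str:
    """Classify a tier as silo, pool, or bridge from its per-layer choices."""
    choices = set(TENANCY_PLAN[tier].values())
    if choices == {Tenancy.SILO}:
        return "silo"
    if choices == {Tenancy.POOL}:
        return "pool"
    return "bridge"  # a mix of silo and pool layers
```

Keeping the plan in one table makes the bridge pattern visible at a glance: the enterprise tier pools compute but silos its RAG store.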

Core AgentCore primitives and where to apply tenancy

Think of AgentCore as a toolbox. Apply a different tenancy model to each tool depending on risk and cost. Below are the layer-level trade-offs and practical guidance.

Runtime: session-isolated microVMs

AgentCore Runtime runs agents in session-isolated microVMs, which provide per-session separation without the full cost of dedicated VMs. Session microVMs are like private rooms in a co‑working space—cheaper than building a whole office per tenant, but more isolated than sharing a common table.

When to silo: high-risk tenants where a noisy neighbor or cross-tenant data retention is unacceptable. DocAssist might give enterprise clients dedicated runtime pools.
When to pool: most small/medium tenants. Pooling with per-session isolation and strict namespace enforcement keeps costs down.
Operational knobs: session concurrency limits per tenant, warm-pool sizing vs cold-start latency, and limits on tool invocation rates.
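
The per-tenant session concurrency limit mentioned above could be enforced in the application tier with something like the following. This is a minimal in-process sketch; TenantSessionLimiter is a hypothetical helper, not an AgentCore feature, and a real deployment would likely need a shared store so that limits hold across instances.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional


class TenantSessionLimiter:
    """Caps concurrent agent sessions per tenant (hypothetical helper)."""

    def __init__(self, default_limit: int, overrides: Optional[Dict[str, int]] = None):
        self._default = default_limit
        self._overrides = overrides or {}
        self._active: Dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, tenant_id: str) -> bool:
        """Reserve a session slot; return False if the tenant is at its limit."""
        limit = self._overrides.get(tenant_id, self._default)
        with self._lock:
            if self._active[tenant_id] >= limit:
                return False
            self._active[tenant_id] += 1
            return True

    def release(self, tenant_id: str) -> None:
        """Free a slot when a session ends."""
        with self._lock:
            if self._active[tenant_id] > 0:
                self._active[tenant_id] -= 1
```

A caller would check try_acquire before starting a session and call release when the session ends, so one busy tenant cannot exhaust the shared pool.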

“Session‑isolated microVMs offer an isolation sweet spot: near-dedicated separation without full VM cost or latency.”

Identity and act-on-behalf (token exchange)

Every request needs tenant and security context. AgentCore propagates that via custom HTTP headers and JWTs that include tenant, request, and security metadata. For downstream API calls, use delegation (act‑on‑behalf) with OAuth 2.0 token exchange—avoid granting agents full user credentials.

Pattern: agent receives a scoped service token, then exchanges it for narrowly scoped downstream credentials (RFC 8693). That token exchange limits privilege escalation and simplifies audit trails.
Integration: connect AgentCore Identity to corporate IdPs (Okta, Microsoft Entra, Amazon Cognito) and issue short-lived workload identities for agents and tools.
Operational challenges: token-scope proliferation, latency of token exchange, and complexity of tracing exchanged tokens back to originating requests.
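
To show what the exchange looks like on the wire, here is a sketch that builds the form body of an RFC 8693 token exchange request. The parameter names and URN values come from the RFC; the token values, audience URL, and scope are made-up placeholders, and how AgentCore Identity performs the exchange internally is not shown here.

```python
from typing import List
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token: str, actor_token: str,
                              audience: str, scopes: List[str]) -> str:
    """Encode a delegation request: the actor (agent) acts on behalf of the subject."""
    params = {
        "grant_type": GRANT_TYPE,
        "subject_token": subject_token,        # party the agent acts on behalf of
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "actor_token": actor_token,            # the agent's own workload identity
        "actor_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,                  # the single downstream service
        "scope": " ".join(scopes),             # keep this as narrow as possible
    }
    return urlencode(params)


# Example with placeholder values for a hypothetical billing API.
body = build_token_exchange_body(
    subject_token="user-access-token",
    actor_token="agent-workload-token",
    audience="https://billing.docassist.example",
    scopes=["invoices:read"],
)
fields = parse_qs(body)  # decode again to inspect what would be sent
```

The body would be sent as application/x-www-form-urlencoded in a POST to the authorization server's token endpoint. Requesting one audience and a minimal scope per exchange is what keeps the resulting credential narrow.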

“Delegation (act‑on‑behalf) that issues scoped tokens per boundary is safer than letting agents run with full user credentials.”

Memory and RAG: hierarchical namespaces and vector DB strategies

AgentCore Memory uses hierarchical namespaces—Global, Strategy, Tenant, User, Session—and enforces ABAC and namespace filtering. That lets you centralize shared strategies while isolating tenant or user memory as needed.
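
The hierarchy can be pictured as a path that narrows from shared to private levels, with an attribute check before any read. The sketch below is an assumption-heavy illustration: the path layout, the Namespace and Caller types, and can_read are hypothetical and do not reproduce AgentCore Memory's actual namespace format or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Caller:
    tenant_id: str
    user_id: str


@dataclass(frozen=True)
class Namespace:
    strategy: str
    tenant_id: Optional[str] = None
    user_id: Optional[str] = None
    session_id: Optional[str] = None

    def path(self) -> str:
        """Render the namespace as a path, stopping at the first unset level."""
        parts = ["global", "strategy", self.strategy]
        levels = (("tenant", self.tenant_id), ("user", self.user_id),
                  ("session", self.session_id))
        for label, value in levels:
            if value is None:
                break
            parts += [label, value]
        return "/" + "/".join(parts)


def can_read(caller: Caller, ns: Namespace) -> bool:
    """ABAC check: shared levels are readable; tenant and user levels must match the caller."""
    if ns.tenant_id is not None and ns.tenant_id != caller.tenant_id:
        return False
    if ns.user_id is not None and ns.user_id != caller.user_id:
        return False
    return True
```

In this sketch a strategy-level namespace is readable by every tenant, while anything scoped to a tenant or user is readable only when the caller's attributes match. Session-level checks are left out for brevity.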

Two RAG options:

Per-tenant vector DBs (silo). Strongest data isolation and simplest compliance; higher storage/maintenance cost. Use this for DocAssist enterprise customers that store sensitive contracts.
Shared vector DB with metadata/namespace filtering (pool). Cost-effective and performant if you can enforce correct metadata and query-time filters. Works well for public content or non-sensitive knowledge bases.

Enforcement must be at query-time: never trust client-supplied filters alone. Implement server-side ABAC checks and namespace-aware query planners.
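
A minimal sketch of that rule: whatever tenant filter the client sends is discarded and replaced with the tenant taken from the authenticated request context. The in-memory search function stands in for a real vector store and matches on metadata only; both function names and the record shape are assumptions for illustration.

```python
from typing import Any, Dict, List


def enforce_tenant_filter(client_filter: Dict[str, Any], auth_tenant_id: str) -> Dict[str, Any]:
    """Return a filter whose tenant_id always comes from the authenticated context."""
    safe = {k: v for k, v in client_filter.items() if k != "tenant_id"}
    safe["tenant_id"] = auth_tenant_id
    return safe


def search(records: List[Dict[str, Any]], query_filter: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Toy stand-in for a vector query: keep records whose metadata matches every filter key."""
    return [
        r for r in records
        if all(r.get("metadata", {}).get(k) == v for k, v in query_filter.items())
    ]


records = [
    {"id": "a1", "metadata": {"tenant_id": "tenant-a", "doc_type": "contract"}},
    {"id": "b1", "metadata": {"tenant_id": "tenant-b", "doc_type": "contract"}},
    {"id": "x1", "metadata": {"doc_type": "contract"}},  # untagged record
]

# A client authenticated as tenant-a tries to read tenant-b's contracts.
spoofed = {"tenant_id": "tenant-b", "doc_type": "contract"}
results = search(records, enforce_tenant_filter(spoofed, auth_tenant_id="tenant-a"))
```

Because the filter requires an exact tenant match, the untagged record is excluded as well, so missing metadata fails closed rather than leaking.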

Policy, guardrails, and content safety

AgentCore Policy evaluates requests and intercepts tool invocations, letting you author rules in natural language or Cedar-like policy syntax. Guardrails handle input sanitization, prompt-injection detection, and post-generation checks for hallucinations and sensitive-data leaks. Amazon Bedrock Guardrails provides configurable safety policies that plug into the Gateway and Runtime.

Design rules that prevent risky tool invocations (e.g., deletion or bulk exports) unless the agent has proper scope.
Run prompt-injection red-team tests and fuzz tool arguments to find edge cases where policy bypass is possible.
Combine static policy checks with dynamic runtime attestations (telemetry-based behavioral checks).
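
The first rule above, blocking risky tool invocations unless the agent holds the right scope, can be sketched as a deny-by-default check. This is plain Python rather than actual AgentCore Policy or Cedar syntax; the tool names, scope names, and the authorize function are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical mapping of risky tools to the scope each one requires.
REQUIRED_SCOPE = {
    "case.delete": "cases:delete",
    "billing.bulk_export": "billing:export",
}
# Hypothetical low-risk tools that need no special scope.
ALLOWED_WITHOUT_SCOPE = {"contract.summarize", "precedent.search"}


@dataclass(frozen=True)
class ToolCall:
    tenant_id: str
    tool: str
    scopes: FrozenSet[str]


def authorize(call: ToolCall) -> bool:
    """Deny by default: allow known-safe tools, and risky tools only with the required scope."""
    if call.tool in ALLOWED_WITHOUT_SCOPE:
        return True
    required = REQUIRED_SCOPE.get(call.tool)
    return required is not None and required in call.scopes
```

Unknown tools fall through to a denial, which is the behavior you want when a prompt injection convinces the agent to call something that was never registered.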

Observability and cost attribution

Instrument every agent step with OpenTelemetry-compatible traces and tenant-tagged metrics exported to CloudWatch (or your APM). For meaningful chargeback, collect per tenant: tokens consumed by models, embedding calls, vector DB queries, runtime CPU-seconds, tool calls, P95 latency, and error rates.

Essential telemetry events: model_call.start, model_call.end (tokens_in/out), embedding.create, rag.query, tool.invoke.start/end, runtime.session.start/end.
Cost levers to track: tokens-per-response, average embeddings per query, storage GB for vectors, and runtime CPU-seconds.
Alerting: anomalous token spikes, unexpected downstream tool invocations, or cross-tenant query patterns.
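
To illustrate how tenant-tagged events turn into chargeback numbers, the sketch below rolls up a list of events by tenant and applies unit prices. The event names follow the list above; the event dictionary shape and both prices are invented for the example and should be replaced with your own schema and rates.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable

# Hypothetical unit prices in USD; substitute your real rates.
PRICE_PER_1K_TOKENS = 0.002
PRICE_PER_EMBEDDING = 0.0001


def usage_by_tenant(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Aggregate usage counters per tenant from tenant-tagged telemetry events."""
    usage: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"tokens": 0, "embeddings": 0, "rag_queries": 0, "tool_calls": 0}
    )
    for event in events:
        tenant = event["tenant_id"]  # raises KeyError on untagged events, by design
        name = event["event"]
        if name == "model_call.end":
            usage[tenant]["tokens"] += event["tokens_in"] + event["tokens_out"]
        elif name == "embedding.create":
            usage[tenant]["embeddings"] += 1
        elif name == "rag.query":
            usage[tenant]["rag_queries"] += 1
        elif name == "tool.invoke.end":
            usage[tenant]["tool_calls"] += 1
    return dict(usage)


def estimated_cost(tenant_usage: Dict[str, int]) -> float:
    """Price tokens and embeddings only; other drivers are omitted in this sketch."""
    return (tenant_usage["tokens"] / 1000 * PRICE_PER_1K_TOKENS
            + tenant_usage["embeddings"] * PRICE_PER_EMBEDDING)


events = [
    {"event": "model_call.end", "tenant_id": "acme", "tokens_in": 1200, "tokens_out": 300},
    {"event": "embedding.create", "tenant_id": "acme"},
    {"event": "rag.query", "tenant_id": "acme"},
    {"event": "tool.invoke.end", "tenant_id": "globex"},
]
usage = usage_by_tenant(events)
```

Letting untagged events raise an error instead of landing in a catch-all bucket surfaces instrumentation gaps early, before they distort a tenant's bill.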

Agent identity, trust, and discovery

Identity says who an agent is; trust needs multi-signal evaluation; discovery finds the right agent for a job. Use AgentCore Identity and an internal Agent Registry for cataloging agents and skills. For cross-org assurance, follow ANS v2 concepts: cryptographic identity + transparency logs + discovery metadata.

“Identity proves who an agent is; trust needs multi‑signal evaluation; discovery finds the right agent for the job.”

Model and knowledge strategies

Model choices affect cost, capability, and governance:

Shared models. Cheapest to operate; fine for non-sensitive tasks or lower-tier customers.
Tiered models. Offer different performance and safety guarantees by subscription tier (e.g., small tenants use a pooled model; enterprise customers get dedicated or fine‑tuned models).
Fine-tuned or private models. Required when training on tenant-proprietary data or meeting strict regulatory controls—higher cost and operational overhead.

RAG often complements model strategy: keep sensitive knowledge in silos while using pooled models to run generation against tenant-specific vectors where allowed.

Decision checklist and trade-offs

Use this short checklist as you map DocAssist’s architecture.

Classify tenant sensitivity (low/medium/high) and compliance needs.
For each layer (Runtime, Memory/RAG, Models, Identity, Observability), choose silo/pool/bridge and document SLA/limits.
Define token-exchange flows and scope limits; test for latency and failure modes.
Instrument telemetry with tenant tags; model cost drivers and build chargeback paths.
Run security tests: prompt injection, tool-arg fuzzing, token-replay, and ABAC regression suites.
Plan for key rotation and data residency (KMS per-tenant where needed).

Quick trade-off summary (per component)

Runtime — Pool: low cost, simple; Silo: highest isolation, expensive.
Memory/RAG — Pool: cost-efficient but risk of leakage; Silo: safer for sensitive data.
Models — Shared: cheap; Tiered/Fine‑tuned: better privacy/performance but costlier.
Observability — Always require tenant-tagged metrics, regardless of tenancy model.
Identity — Use delegation/token-exchange; do not hand agents full user credentials.

Operational playbook and common pitfalls

Pitfall: metadata hygiene failure. Shared vector stores require consistent, enforced metadata. A missing tenant tag can lead to data leakage—treat tagging as a security control, not an optional label.
Pitfall: token-scope explosion. Too many narrow scopes create management overhead. Establish sensible scope templates and lifecycle policies.
Pitfall: observability gaps. If you can’t tie tokens or model calls back to tenants, chargeback and incident triage fail. Instrument early.
Pitfall: vendor lock-in. AgentCore and Bedrock simplify building multi-tenant agents, but plan migration and cross-cloud strategies if you need portability.
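
For the metadata hygiene pitfall, one way to treat tagging as a security control is to reject records at ingest time when required tags are missing, rather than discovering the gap at query time. A minimal sketch follows; the required tag names, the record shape, and MissingTagError are assumptions for illustration.

```python
from typing import Any, Dict

# Hypothetical tags every record must carry before it enters a shared store.
REQUIRED_TAGS = ("tenant_id", "classification")


class MissingTagError(ValueError):
    """Raised when a record lacks a tag that tenant isolation depends on."""


def validate_for_ingest(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the record unchanged if all required tags are present and non-empty."""
    metadata = record.get("metadata") or {}
    missing = [tag for tag in REQUIRED_TAGS if not metadata.get(tag)]
    if missing:
        raise MissingTagError(f"refusing to ingest record; missing tags: {missing}")
    return record
```

Calling this in the write path of the shared vector store means an untagged document never becomes queryable in the first place.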

Testing and readiness steps for DocAssist:

Build two tenant profiles (small firm vs regulated enterprise) and implement divergent tenancy choices to validate costs and isolation.
Run a prompt-injection red-team and service-level chaos tests that simulate token expiry and token-exchange failures.
Simulate billing by tracking tokens and embedding usage for a representative workload to understand per-tenant economics.

Key takeaways and questions

How should I pick between silo, pool, or bridge tenancy?

Base the choice on tenant sensitivity, compliance, and cost targets — silo for regulated or high-value tenants, pool for cost-efficiency with strict ABAC and metadata controls, and bridge for mixed portfolios.

Can runtime isolation be both secure and cost-effective?

Yes. Session-isolated microVMs provide a practical compromise: strong per-session separation without the full operational cost of per-tenant VMs. Tune pool sizes and concurrency limits to balance cost and latency.

Should agents run with user credentials or delegated tokens?

Use act-on-behalf token exchange (OAuth 2.0 Token Exchange, RFC 8693) to issue short-lived, narrowly scoped tokens for downstream calls rather than full impersonation. It reduces blast radius and improves auditability.

How do I prevent RAG data leakage between tenants?

Either silo vector stores for strict isolation or use a shared store with enforced metadata tagging, server-side namespace filtering, and ABAC at query time. Treat metadata as a security control.

What’s required for meaningful observability and cost attribution?

Instrument everything with OpenTelemetry traces and tenant-tagged metrics (model tokens, embeddings, vector queries, runtime CPU-seconds, tool calls). Export to a centralized system like CloudWatch for chargeback and anomaly detection.

Implementation checklist (first 30–90 days)

Classify tenant profiles and map each layer to silo/pool/bridge.
Wire AgentCore Identity to your IdP and implement OAuth 2.0 token exchange (act-on-behalf) flows for tool calls.
Choose RAG strategy: per-tenant vectors for sensitive customers, pooled vectors with ABAC for the rest.
Enable session-isolated microVMs for runtimes; define concurrency policies and warm-pool settings.
Instrument OpenTelemetry traces and tenant-tagged metrics; export to CloudWatch and test chargeback reports.
Author and test policy rules and guardrails; run prompt‑injection and tool-arg fuzz tests.
Document incident response: token compromise, cross-tenant query detection, and data-exfiltration playbooks.