LiteLLM Agent Platform — Self‑Hosted Orchestration for Stateful AI Agents
TL;DR: LiteLLM Agent Platform is an open‑source, self‑hosted orchestration layer for running stateful AI agents in production—providing per‑session sandboxes, durable session persistence via Postgres, and a developer workflow that scales from local kind clusters to AWS EKS. It’s an MIT‑licensed alpha released by BerriAI on May 8, 2026.
- What it solves: session continuity, per‑team isolation, and production lifecycle for AI agents.
- How it works: a Next.js dashboard + worker, Postgres persistence, and Kubernetes sandboxes managed via the kubernetes‑sigs/agent‑sandbox CRD.
- Who should pilot it: platform teams in regulated industries, enterprises that require data residency, and engineering orgs ready for Kubernetes ops.
Why stateful AI agents break prototypes
Prototype agents on a laptop hide three hard truths you face in production: agents keep state (conversation history, tool outputs, intermediate reasoning), they need strict isolation when multiple teams or tenants use them, and they must survive infrastructure churn—pod restarts, upgrades, and autoscaling. Without session persistence and per‑session sandboxes, agents can lose context, leak credentials, or become a compliance risk.
How LiteLLM Agent Platform addresses those problems
The platform separates model routing from execution orchestration: LiteLLM Gateway remains responsible for model routing, provider integrations, cost tracking, rate limiting, and guardrails, while the Agent Platform manages sandbox lifecycle, session continuity, and developer UX.
“The platform provides per‑team and per‑context sandboxes plus session continuity across pod restarts and upgrades.”
Key components and architecture:
- Web dashboard: Next.js (TypeScript) UI for session and team management, developer workflows and observability.
- Worker: an async TypeScript process coordinating sandbox provisioning, lifecycle events and background tasks.
- Persistence: Postgres backs sessions, agent configs and metadata; schema migrations run as an init container.
- Sandbox cluster: Kubernetes‑managed per‑session sandboxes provisioned with the kubernetes‑sigs/agent‑sandbox CRD. Local development uses kind; production recommends AWS EKS.
- Gateway integration: consumes a running LiteLLM Gateway for model calls and telemetry; the Gateway supports 100+ providers like OpenAI, Anthropic, Bedrock and Vertex AI.
- Harnesses: a harness system (e.g., opencode) for coding‑agent runtimes such as Claude Code and OpenAI Codex—extensible to custom tools and runtimes.
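The persistence piece of this architecture can be made concrete with a minimal sketch. The shape below is illustrative only; the field names (`sessionId`, `history`, `toolOutputs`) are assumptions for exposition, not the platform's actual Postgres schema, which lives in its migrations.

```typescript
// Illustrative sketch of a persisted session record. Field names are
// assumptions, not the platform's real schema.
interface SessionRecord {
  sessionId: string;                            // stable ID that survives pod restarts
  teamId: string;                               // per-team isolation boundary
  history: { role: string; content: string }[]; // conversation history
  toolOutputs: Record<string, unknown>;         // recorded tool results
  updatedAt: string;                            // ISO timestamp of last persisted write
}

// Round-trip helpers: in practice the record would be written to and read
// from Postgres; JSON serialization stands in for that here.
function serializeSession(s: SessionRecord): string {
  return JSON.stringify(s);
}

function restoreSession(raw: string): SessionRecord {
  return JSON.parse(raw) as SessionRecord;
}
```

Because the record round-trips losslessly, a new sandbox spun up after a pod restart can be handed the restored record and resume the session where it left off.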
The local quickstart intentionally keeps friction low: run bin/kind-up.sh, then docker compose up, and you can evaluate sandboxes without cloud credentials. For production, the project provides a bin/eks-up.sh script for EKS provisioning and a Render blueprint for one‑click web/worker hosting.
“Runs entirely on your own infrastructure—no data leaves your environment—suitable for regulated industries and data residency requirements.”
The project is open‑source on GitHub (github.com/BerriAI/litellm-agent-platform), MIT‑licensed and currently an alpha public preview—expect rapid iteration and use it for pilots and evaluations first.
Security, secrets and compliance—practical guidance
Secret injection is pragmatic: environment variables prefixed with CONTAINER_ENV_ are injected into sandbox containers with the prefix stripped (for example, CONTAINER_ENV_GITHUB_TOKEN becomes GITHUB_TOKEN inside the sandbox). That makes tooling straightforward but requires careful operational controls.
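The prefix-stripping behavior described above can be sketched in a few lines of TypeScript. The function name is an assumption; the platform performs this mapping at sandbox provisioning time, not via this exact code.

```typescript
// Mirror the documented injection rule: variables prefixed with
// CONTAINER_ENV_ are passed into the sandbox with the prefix stripped.
const CONTAINER_ENV_PREFIX = "CONTAINER_ENV_";

function stripContainerEnvPrefix(
  env: Record<string, string | undefined>
): Record<string, string> {
  const sandboxEnv: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith(CONTAINER_ENV_PREFIX) && value !== undefined) {
      // CONTAINER_ENV_GITHUB_TOKEN -> GITHUB_TOKEN inside the sandbox
      sandboxEnv[key.slice(CONTAINER_ENV_PREFIX.length)] = value;
    }
  }
  return sandboxEnv;
}
```

Note that unprefixed host variables are not forwarded at all, which is what makes the operational controls below matter: anything you do prefix is visible to every process in the sandbox.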
- Use an external secret manager: integrate Vault or cloud KMS for credential rotation and short‑lived tokens rather than embedding long‑lived credentials.
- Least privilege: scope sandbox identities so they only access internal services required for the session; avoid broad cluster roles.
- Network & runtime controls: apply strict network policies and Pod Security admission controls to limit outbound access and filesystem reachability.
- Audit and attest: enable logging and immutable audit trails for sandbox lifecycle events, secret usage, and model calls through the LiteLLM Gateway.
Operational trade‑offs and scaling
Self‑hosting buys control and compliance but adds operational cost. Running EKS clusters and managing custom resource definitions (CRDs) requires platform engineering investment. Key cost drivers and operational considerations:
- Sandbox count and size: number of concurrent sandboxes and memory/CPU per sandbox directly affect cluster size and costs.
- Model inference: Gateway routing to paid LLM providers drives per‑call costs; track cost per session and per‑tool usage.
- Postgres capacity: session retention, number of connections and HA requirements determine sizing and backup strategy.
- Observability: collect sandbox lifecycle metrics, session latency, Gateway model call traces, and errors. Prometheus + Grafana and OpenTelemetry are practical starting points.
- Scaling patterns: consider sandbox pooling or warm sandboxes for low latency, and autoscaling for bursty loads. Test how session reconnection behaves under node churn.
Pilot checklist — a runnable evaluation plan
- Clone the repo and read the README and k8s‑backend.md.
- Run the local quickstart: bin/kind-up.sh, then docker compose up. Confirm the UI is reachable on port 3000 and the worker processes start.
- Create a minimal agent that calls a mocked internal API or test tool; verify tool outputs are recorded to Postgres.
- Simulate a pod restart: delete a sandbox pod and confirm the session reconnects and history is restored from Postgres.
- Test secret injection: set CONTAINER_ENV_SAMPLE_SECRET and confirm it appears inside the sandbox; then rotate or remove it and verify behavior.
- Run a small load test (10–50 concurrent sessions) and record CPU/memory per sandbox, Postgres connections, and Gateway call rates.
- Enable basic observability: expose metrics, collect sandbox lifecycle events, create alerts for failed reconnections or high session latency.
- Verify Postgres backup and restore by restoring a session dataset to a test environment.
- Review RBAC, network policies and run a light security audit or pen test focused on sandbox escape vectors.
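When recording numbers from the load-test step, percentile summaries are more informative than averages, since agent session latency is typically long-tailed. A small helper for summarizing recorded session latencies (purely illustrative, using the nearest-rank percentile method):

```typescript
// Nearest-rank percentile on an ascending-sorted sample of latencies (ms).
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(
    sortedMs.length - 1,
    Math.ceil((p / 100) * sortedMs.length) - 1
  );
  return sortedMs[Math.max(0, idx)];
}

// Reduce raw load-test samples to the two numbers worth alerting on.
function summarizeLatencies(samplesMs: number[]): { p50: number; p95: number } {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return { p50: percentile(sorted, 50), p95: percentile(sorted, 95) };
}
```

Feeding these summaries into the observability step gives you concrete thresholds for the "high session latency" alerts in the checklist.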
Self‑hosted vs hosted for agent orchestration
- Self‑hosted (LiteLLM Agent Platform): Pros — full data control, compliance fit, integration with internal services; Cons — EKS/CRD ops, security expertise required, alpha project churn.
- Hosted / managed offerings: Pros — lower operator burden, SLAs and managed scale; Cons — data residency limits, vendor lock‑in, reduced control over runtime behavior.
Short FAQ
Can the platform keep agent sessions intact across pod restarts?
Yes. Sessions are persisted to Postgres and the platform reattaches sessions to new sandboxes after pod restarts, preserving conversation history and tool outputs.
How are agents isolated by team or context?
Per‑session sandboxes are provisioned via the kubernetes‑sigs/agent‑sandbox CRD, creating isolated execution contexts to reduce cross‑tenant blast radius.
Do I need cloud credentials to evaluate locally?
No. The local quickstart uses kind (Kubernetes‑in‑Docker) and Docker Compose—no cloud credentials required for initial evaluation.
Is this suitable for regulated or data‑residency‑sensitive deployments?
Yes. Running the platform on your infrastructure keeps data in‑house, but you must validate isolation, secret management and audit controls before trusting production workloads.
Next steps
For engineering and platform leaders evaluating agent orchestration, a short pilot using the checklist above will surface operational costs, scaling behavior and security posture quickly. Start with the local quickstart, exercise session persistence and secret injection, then expand to an EKS pilot if results are promising.
Repository and docs: github.com/BerriAI/litellm-agent-platform. The project is MIT‑licensed and currently in alpha—contributions and issues are welcome on GitHub.