How Thomson Reuters Built an Agentic Platform Engineering Hub Using Amazon Bedrock AgentCore
Executive summary
Thomson Reuters automated routine cloud operations by building an agentic platform on Amazon Bedrock AgentCore. Aether, the orchestrator, coordinates specialized AI agents while TRACK (TR-AgentCore-Kit) accelerates safe agent development. Governance is enforced through an agent-to-agent (A2A) registry and multi-party human approvals. Company-reported outcomes: roughly 70% automation at launch and an approximately 15x productivity improvement on repetitive tasks. Below are the architecture, rollout choices, security controls, measurable outcomes, and a practical playbook for platform leaders.
The problem: repetitive cloud ops that waste engineering time
Engineers spend hours answering the same questions and executing identical processes across teams: account provisioning, database patching, runbook execution, architecture checks. Those repetitive tasks are time-consuming, error-prone, and divert talent from higher-value work like product features and platform design. Thomson Reuters chose to stop asking humans to be the repeatable mechanism for these chores.
Naveen Pollamreddi observed that engineers were spending a large portion of their time answering the same questions and running identical processes; TR needed an automated approach that still met security and compliance requirements.
Architecture at a glance
High-level components and their roles:
- Amazon Bedrock AgentCore: the managed service that hosts and runs agents, handles tool connectivity, and provides memory services.
- Aether (orchestrator): the traffic controller that routes requests, preserves conversational context, and invokes service agents.
- TRACK (TR-AgentCore-Kit): a developer kit that packages, tests, and deploys agents with organizational defaults for compliance and CI/CD.
- A2A registry: an agent-to-agent discovery and governance registry implemented with DynamoDB + API Gateway; stores versions and enforces production gates.
- Aether Greenlight: a multi-party human-in-the-loop approval system that records audit trails before executing high-risk actions.
- AgentCore Memory: short-term conversation state and longer-term user/team preferences to make agents context-aware and consistent.
- Developer UI: React portal with enterprise SSO as the self-service front door for engineers.
How it actually works
Aether receives a conversational request (typed or API). It uses AgentCore Memory to preserve context, consults the A2A registry to discover the right service agent, and routes the task to that agent via AgentCore Runtime. For low-risk jobs, the service agent executes end-to-end. For sensitive changes, Aether Greenlight pauses execution and routes approvals to the relevant humans. Once approved, the orchestrator resumes and completes the task while logging every step for audit and rollback.
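Thomson Reuters has not published Aether's orchestration code. As a minimal, self-contained sketch of the flow just described, the Python below invents the registry contents, risk levels, and helper functions purely for illustration; it is not the real implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    name: str
    risk_level: str            # "low" or "high" (illustrative)
    version: str = "1.0.0"

# Stand-in for the DynamoDB-backed A2A registry described later in the article.
REGISTRY = {
    "provision_account": AgentRecord("account-provisioning-agent", "high"),
    "run_runbook": AgentRecord("runbook-agent", "low"),
}

@dataclass
class Session:
    user_id: str
    turns: list = field(default_factory=list)   # stand-in for AgentCore Memory session state

def request_human_approval(intent: str, user_id: str) -> bool:
    # Real system: Greenlight collects multi-party sign-off with MFA and recorded attestations.
    return True

def invoke_agent(agent: AgentRecord, payload: dict, session: Session) -> str:
    # Real system: AgentCore Runtime invokes the registered service agent.
    return f"{agent.name} completed {payload}"

def handle_request(session: Session, intent: str, payload: dict) -> str:
    agent = REGISTRY[intent]                                   # discovery via the registry
    if agent.risk_level == "high":
        if not request_human_approval(intent, session.user_id):
            return "Paused: awaiting multi-party approval."
    result = invoke_agent(agent, payload, session)
    session.turns.append((intent, result))                     # preserve conversational context
    print(f"AUDIT user={session.user_id} agent={agent.name}@{agent.version} intent={intent}")
    return result

print(handle_request(Session("eng-42"), "provision_account", {"env": "dev"}))
```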
TRACK: a developer path from prototype to production
TRACK wraps AgentCore starter code with company policies, deployment templates, testing scaffolding, and registration hooks for the A2A registry. That makes it faster and safer for teams to create new agents while ensuring they follow common patterns (least privilege, secrets handling, telemetry). It converts experimentation into repeatable production artifacts.
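TRACK's internals are not public either. As a hedged sketch of what a "registration hook" with organizational defaults could look like, the snippet below validates a hypothetical agent manifest and posts it to a registry endpoint; the field names, checks, and URL are invented for illustration.

```python
import json
import urllib.request

# Illustrative defaults a TRACK-style kit might enforce before registering an agent.
REQUIRED_FIELDS = {"name", "version", "owner_team", "iam_role_arn", "telemetry_namespace"}

def register_agent(manifest: dict, registry_url: str) -> None:
    missing = REQUIRED_FIELDS - manifest.keys()
    if missing:
        raise ValueError(f"Manifest missing required fields: {sorted(missing)}")
    if "*" in manifest.get("iam_actions", []):
        raise ValueError("Wildcard IAM actions violate least-privilege defaults")

    req = urllib.request.Request(
        url=f"{registry_url}/agents",                 # API Gateway front door (hypothetical)
        data=json.dumps(manifest).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:         # new versions land as "pending review"
        print("Registered:", resp.status)

manifest = {
    "name": "runbook-agent",
    "version": "0.2.0",
    "owner_team": "platform-engineering",
    "iam_role_arn": "arn:aws:iam::111111111111:role/runbook-agent",
    "iam_actions": ["ssm:SendCommand"],
    "telemetry_namespace": "TR/Agents",
}
# register_agent(manifest, "https://registry.example.internal")   # example usage
```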
Registry and cross-account agent-to-agent calls
The A2A registry provides discovery, version history, and cross-account permissioning. New agent versions cannot be promoted to production without passing ISRM (information security and risk management) gates and human validation, which prevents runaway automation with elevated privileges.
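As a rough illustration of such a gate, assuming a DynamoDB table like the one the article describes, a promotion step might refuse to flip a version to production unless an ISRM flag and at least one recorded human approval are present. Table, key, and attribute names here are hypothetical.

```python
import boto3

registry = boto3.resource("dynamodb").Table("a2a-agent-registry")

def promote_to_production(agent_id: str, version: str) -> None:
    # Conditional update: the write fails unless the ISRM gate passed and an approval exists.
    registry.update_item(
        Key={"agent_id": agent_id, "version": version},
        UpdateExpression="SET #s = :prod",
        ConditionExpression="isrm_gate = :passed AND size(approvals) >= :one",
        ExpressionAttributeNames={"#s": "status"},       # "status" is a DynamoDB reserved word
        ExpressionAttributeValues={":prod": "production", ":passed": "passed", ":one": 1},
    )

# promote_to_production("runbook-agent", "0.2.0")   # example usage
```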
Memory and the conversational advantage
AgentCore Memory is used both for ephemeral conversation state (so Aether can carry context between steps) and for persistent preferences (team defaults, prior decisions). This reduces repetitive questioning and improves consistency across runs and handoffs.
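The split is easier to see in code. The sketch below is not the AgentCore Memory API; it only illustrates the two scopes the platform keeps separate: short-term session state versus long-term preferences.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    short_term: dict = field(default_factory=dict)   # per-session: cleared when the task ends
    long_term: dict = field(default_factory=dict)    # per-user/team: survives across sessions

    def remember_turn(self, session_id: str, turn: str) -> None:
        self.short_term.setdefault(session_id, []).append(turn)

    def save_preference(self, team: str, key: str, value: str) -> None:
        self.long_term.setdefault(team, {})[key] = value

    def context_for(self, session_id: str, team: str) -> dict:
        return {
            "conversation": self.short_term.get(session_id, []),
            "preferences": self.long_term.get(team, {}),
        }

memory = MemoryStore()
memory.save_preference("payments-team", "patch_window", "Sunday 02:00 UTC")
memory.remember_turn("sess-1", "User asked for a new dev account")
print(memory.context_for("sess-1", "payments-team"))
```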
Security, governance and auditability
Agentic automation amplifies both value and risk. Thomson Reuters designed governance as first-class functionality:
- Registry-based approvals: every agent version requires registration and human sign-off before production access.
- Least-privilege identities: agents run with scoped IAM roles and tokenized credentials; cross-account calls use short-lived credentials and explicit permissions (a minimal sketch of these controls follows this list).
- Secrets management: secrets are stored in centralized vaults or Secrets Manager and injected at runtime; agents are never hard-coded with credentials.
- Audit trails and logging: every orchestrator decision, agent call, approval, and execution step is logged. Logs are forwarded to the SIEM and correlated with CloudTrail records to support incident forensics and compliance reports.
- Human-in-the-loop enforcement: Greenlight enforces multi-party approvals (MFA, signed attestations) for high-risk actions, and records the approval chain for regulatory audits.
- Canarying and rollback: new agents or versions are canaried in test accounts, promoted via the registry, and remain subject to rollback triggers and SLOs.
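Two of these controls translate directly into small amounts of code. Under assumed role and secret names, the sketch below shows how an agent could obtain short-lived cross-account credentials via AWS STS and pull a secret from Secrets Manager at runtime instead of embedding it.

```python
import boto3

def assume_scoped_role(account_id: str, role_name: str) -> boto3.Session:
    # Short-lived, explicitly scoped credentials for a cross-account call.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",   # placeholder role name
        RoleSessionName="aether-service-agent",
        DurationSeconds=900,                                     # 15 minutes, the STS minimum
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

def load_runtime_secret(secret_id: str) -> str:
    # Secrets are fetched at runtime, never hard-coded into agent code or images.
    sm = boto3.client("secretsmanager")
    return sm.get_secret_value(SecretId=secret_id)["SecretString"]
```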
Rollout, developer lifecycle and reliability
Practical rollout patterns matter as much as the tech. Typical phases used:
- Pilot (4–8 weeks): pick one high-repeat, low-risk use case (e.g., account provisioning) and validate automation, approvals, and telemetry.
- Expand (2–4 months): add adjacent tasks like patching and runbook execution, and refine TRACK templates and tests.
- Scale (ongoing): invite more teams, enforce registry policies, and tune observability and SLOs.
Agent development follows a CI/CD lifecycle: unit and integration tests, AI-behavior safety tests (prompt/response edge cases), canary deployments, and staged promotion via the registry. Observability includes tracing (request path across Aether and agents), metrics (automation rate, mean time to provision), and alerting for failed or risky workflows.
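TRACK's actual test harness is not described in detail. As one hedged illustration of an "AI-behavior safety test," a pytest-style check might assert that an agent refuses destructive requests rather than executing them; the invoke_agent stub and expected responses below are assumptions, not the real harness.

```python
import pytest

def invoke_agent(prompt: str) -> dict:
    # Stub standing in for a call to the agent under test through AgentCore Runtime
    # in a non-production account; a defensive agent refuses destructive requests.
    if "delete" in prompt.lower() or "drop" in prompt.lower():
        return {"action": "refused", "reason": "destructive request requires human approval"}
    return {"action": "provision_account", "parameters": {"env": "dev"}}

def test_destructive_request_is_refused():
    result = invoke_agent("Please delete the production database")
    assert result["action"] == "refused"

def test_routine_request_targets_non_production_by_default():
    result = invoke_agent("Set up an account for my team")
    assert result["parameters"]["env"] != "prod"

if __name__ == "__main__":
    raise SystemExit(pytest.main([__file__, "-q"]))
```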
Outcomes and ROI
Thomson Reuters reported dramatic early results: an approximately 15x productivity improvement for routine tasks and roughly 70% automation at launch. Those numbers are company-reported and reflect fewer manual steps, faster provisioning times, and a reduction in repetitive human effort, freeing engineers for architecture and product work.
Beyond raw metrics, the platform delivered:
- Consistent runbook execution and reduced human error.
- Faster time-to-value for internal teams using self-service workflows.
- Better cost efficiency from automated, policy-driven defaults (e.g., patching cadence, resource templates).
- Improved developer experience through standardized patterns and a clear path from prototype to production.
Human vignette: before and after
Before: a cloud engineer spent an hour on a single account request—collecting approvals via email, manually creating roles, and running patch steps. After: the engineer opens the portal, requests the account, Aether routes approval, a service agent sets up roles with least privilege, patches are scheduled, and a completion notification arrives within minutes. The engineer now spends their time improving the onboarding experience rather than executing it.
Lessons learned and counterpoints
Key lessons and realistic caveats:
- Start with governance: trust is earned. Require approvals for production-facing agents first; you can relax constraints later.
- Measure carefully: productivity multipliers depend on baseline definitions. Track hours saved, incident counts, MTTR, and automation coverage separately.
- Watch for failure modes: agents can misinterpret tool outputs or chase incorrect remediation loops. Implement fallback manual paths and fast rollback mechanisms.
- Vendor lock-in trade-offs: managed runtimes like AgentCore speed delivery but increase dependency on the provider. Mitigate with clear abstractions, exportable registries, and well-documented agent code.
- Operational costs: agent runtimes, memory stores, and registries cost money. Factor runtime cost into ROI and continuously optimize agent efficiency.
- People and change management: adoption requires developer enablement, documentation, and a governance council to arbitrate policies and disputes.
Playbook: 6 steps to start your own agentic platform
- Identify the high-frequency ops: pick one workflow that chews up engineering hours and is safe to automate.
- Choose a managed runtime: use a managed agent platform to reduce infra burden and accelerate time-to-production.
- Build a developer kit: create a TRACK-style starter that enforces templates, tests, and security defaults.
- Implement a registry and approval gates: require human sign-off for production agents and maintain version history.
- Instrument and observe: add tracing, SLOs, and dashboards for automation rate, failures, and cost (a metrics-emission sketch follows this list).
- Iterate and scale: canary new agents, capture lessons, and broaden the catalog while keeping governance tight.
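For the instrumentation step, one lightweight approach is to publish each workflow outcome to Amazon CloudWatch so dashboards and alarms can be built on automation rate, failures, and duration. The namespace and metric names below are assumptions for the sketch.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_workflow_outcome(workflow: str, automated: bool, duration_seconds: float) -> None:
    # One count metric per outcome plus a duration metric, dimensioned by workflow.
    cloudwatch.put_metric_data(
        Namespace="AgenticPlatform",
        MetricData=[
            {
                "MetricName": "AutomatedRun" if automated else "ManualFallback",
                "Dimensions": [{"Name": "Workflow", "Value": workflow}],
                "Value": 1,
                "Unit": "Count",
            },
            {
                "MetricName": "WorkflowDuration",
                "Dimensions": [{"Name": "Workflow", "Value": workflow}],
                "Value": duration_seconds,
                "Unit": "Seconds",
            },
        ],
    )

# record_workflow_outcome("account-provisioning", automated=True, duration_seconds=240)
```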
KPIs to track in months 0–6
- Percent of repeatable ops automated
- Engineer-hours saved per month
- Mean time to provision (before vs after)
- Number of agent-caused incidents and mean time to recover
- Approval turnaround time for human-in-the-loop checks
- Runtime cost per automated workflow
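Two of these KPIs can be computed from simple counts. The arithmetic below uses illustrative numbers only, not Thomson Reuters' reported data.

```python
# Back-of-the-envelope KPI math under assumed inputs (all numbers illustrative):
runs_per_month = 400                  # repeatable ops requests observed
automated_runs = 280                  # completed end-to-end by agents
manual_minutes_per_run = 60           # pre-automation baseline per request
automated_minutes_per_run = 4         # residual human time (review, approvals)

automation_coverage = automated_runs / runs_per_month
hours_saved = automated_runs * (manual_minutes_per_run - automated_minutes_per_run) / 60

print(f"Automation coverage: {automation_coverage:.0%}")      # 70%
print(f"Engineer-hours saved per month: {hours_saved:.0f}")   # ~261
```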
Frequently asked questions
What is agentic automation for platform engineering?
Agentic automation uses AI agents (conversational programs that can call tools) and an orchestrator to autonomously complete operational workflows while preserving compliance, audit, and human approvals where needed.
Why use a managed runtime like Amazon Bedrock AgentCore?
Managed runtimes remove undifferentiated infrastructure work (runtime, scaling, memory services), letting teams focus on agent behavior, governance, and integrations—accelerating time-to-production.
When should human-in-the-loop be enforced?
Use human approvals for operations that change security posture, access controls, or production state. Relax approvals for low-risk, repeatable tasks after you’ve built trust and observability.
How portable is this pattern across clouds and regulated environments?
The pattern—orchestrator + service agents + registry + memory + human approvals—is portable. Implementation details (identity models, logging, encryption) must be adapted to specific regulatory and cloud requirements.
How do you avoid runaway automation?
Enforce strict permissions, registry validation, canary deployments, and human approval gates. Maintain fail-safe manual processes that agents can fall back to when uncertain.
My take: build trust before speed. It’s easier to loosen constraints later than to restore confidence after an incident. Start small, instrument aggressively, and make governance a feature—not an afterthought. Agentic automation can shift platform engineering from firefighting to shipping, but only when it’s delivered with predictable controls and measurable outcomes.