Serverless MCP Proxy on Amazon Bedrock AgentCore: Governance & PII Redaction for AI Agents


TL;DR: As AI agents orchestrate more business tasks, the Model Context Protocol (MCP) calls they make become critical control points. Deploying a serverless MCP proxy on Amazon Bedrock AgentCore Runtime gives platform teams a lightweight interception point to enforce policy, redact PII, and centralize observability without changing agents or upstream services.

Why platform teams should care

Picture a regulated finance team that wants audit trails and PII redaction for every tool call made by an assistant agent. Rewriting dozens of upstream services or hundreds of agents is slow and risky. A serverless MCP proxy acts like a security checkpoint: it mirrors the upstream tool catalog, inspects each tools/call, applies policies (tokenization, access checks, rate limits), and forwards requests. That lets security, compliance, and platform owners centralize governance while teams iterate on agent behavior.

Quick definitions

  • Model Context Protocol (MCP): a protocol that defines how agents discover and invoke external tools (tool catalog, tools/list and tools/call endpoints).
  • Amazon Bedrock AgentCore Runtime: a managed, serverless runtime for running agents and MCP-compatible containers with autoscaling, CloudWatch/OpenTelemetry integration, and built-in AgentCore identity.
  • FastMCP: a lightweight library used by the sample proxy to implement MCP discovery and forwarding.
  • SigV4 / IAM: AWS Signature Version 4 signing for role-based authentication; used when the proxy needs to authenticate to AWS-hosted upstream endpoints.
  • OAuth2 / JWT (client credentials): token-based bearer authentication used when upstream endpoints expect OAuth tokens (for example via Amazon Cognito).

Pattern overview: what the proxy does

The proxy is a stateless container that runs on AgentCore Runtime. On startup it calls tools/list against an upstream MCP server (the walkthrough uses an AgentCore Gateway), registers a matching local view of the tool catalog, and then forwards tools/call traffic while invoking pre- and post-forwarding hooks. These hooks are where you implement validation, tokenization, access control, logging, or rate limiting.

The proxy gives you a programmable interception point to implement validation, transformation, and filtering on every MCP tool invocation.

Simple flow (logical):

Agent -> Proxy on AgentCore Runtime -> Upstream MCP (AgentCore Gateway) -> Downstream tool
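The mirror-and-forward loop above can be sketched in plain Python. This is an illustrative stand-in, not the sample's actual FastMCP code: the stub upstream and hook signatures are assumptions for demonstration.

```python
# Minimal sketch of the mirror-and-forward pattern (stub upstream, illustrative names).
class StubUpstream:
    def tools_list(self):
        return [{"name": "lookup_customer"}]

    def tools_call(self, name, args):
        return {"status": "ok", "tool": name, "args": args}

class Proxy:
    def __init__(self, upstream, pre_hooks=(), post_hooks=()):
        self.upstream = upstream
        self.pre_hooks, self.post_hooks = list(pre_hooks), list(post_hooks)
        # Discovery: mirror the upstream tool catalog at startup
        self.catalog = {t["name"]: t for t in upstream.tools_list()}

    def tools_call(self, name, args):
        if name not in self.catalog:
            raise KeyError(f"unknown tool: {name}")
        for hook in self.pre_hooks:            # validation, tokenization, ACL checks
            args = hook(name, args)
        response = self.upstream.tools_call(name, args)
        for hook in self.post_hooks:           # logging, detokenization, filtering
            response = hook(name, response)
        return response

proxy = Proxy(StubUpstream())
print(proxy.tools_call("lookup_customer", {"id": "42"}))
```

The key property is that the proxy exposes exactly the catalog it discovered, so agents cannot tell it apart from the upstream server.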

Architecture notes

  • Stateless container on AgentCore Runtime: autoscaling and managed observability (CloudWatch + OpenTelemetry).
  • Discovery: proxy issues tools/list to upstream, mirrors and exposes the same tool catalog locally.
  • Forwarding: proxy forwards tools/call requests and responses, wrapping them with pre/post logic.
  • Trust boundaries: agent→proxy, proxy→upstream, and upstream→downstream each enforce separate authentication and authorization.

Authentication and trust: SigV4 vs OAuth2

Two common modes for proxy→upstream authentication are supported:

  • IAM (SigV4) — the proxy runs with an execution role on AgentCore Runtime and signs outbound requests with AWS Signature Version 4. This is ideal when upstream is AWS-hosted or you want role-based access control. Typical permissions to consider include bedrock-agentcore:InvokeGateway or bedrock-agentcore:InvokeAgentRuntime.
  • OAuth2 / JWT (client credentials) — the proxy performs a client credentials grant against an OAuth provider (e.g., Amazon Cognito), caches tokens in memory, and sends Authorization: Bearer headers to upstream endpoints that expect tokens.

Example SigV4 signing with boto3/botocore (conceptual; upstream_url, headers, and json_body come from the proxy's forwarding context):

# Python / botocore
import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

session = boto3.Session()
credentials = session.get_credentials()

# Build the outbound request and add a SigV4 Authorization header in place
request = AWSRequest(method="POST", url=upstream_url,
                     headers=headers, data=json_body)
SigV4Auth(credentials, "bedrock-agentcore", session.region_name).add_auth(request)
send(request)  # forward the signed request upstream

Example OAuth client credentials flow (conceptual; token_url, client_id, client_secret, and scope come from deploy_config.json):

import time
import requests

# Client credentials grant (e.g., against an Amazon Cognito token endpoint)
resp = requests.post(token_url, auth=(client_id, client_secret),
                     data={"grant_type": "client_credentials", "scope": scope})
token = resp.json()
cache["token"] = (token["access_token"], time.time() + token["expires_in"])  # in-memory cache
headers["Authorization"] = f"Bearer {token['access_token']}"
forward_request(headers, body)

Deployment & quick start

Prerequisites:

  • Linux or macOS with Python 3.12+
  • AWS CLI configured with an IAM user that can create roles, ECR repos, and invoke Bedrock AgentCore APIs
  • Docker and ECR for building the container image
  • AgentCore starter toolkit
  • An upstream MCP endpoint (AgentCore Gateway is used in the walkthrough)

Quick start commands (high level):

  • git clone sample-mcp-proxy-agentcore-runtime
  • edit deploy_config.json to set names, ARNs, and OAuth settings
  • python setup_and_deploy.py --config deploy_config.json
  • Use the included test_agent.py (Strands Agents) to validate discovery and calls

Cleanup is similarly scripted: use agentcore destroy to remove the AgentCore agent and ECR images, delete IAM roles/policies via the AWS CLI, and remove the AgentCore Gateway with agentcore gateway delete-mcp-gateway.

Business use cases: PII redaction and tool-level access control

Two practical examples that map directly to compliance and governance goals.

Tokenization / PII redaction

Problem: agents send sensitive customer data to downstream tools that are not allowed to store raw PII.

Solution: implement a pre-forward hook that replaces PII with tokens and a post-forward hook that detokenizes or masks sensitive fields in responses. This lets you keep sensitive data out of downstream logs and services without changing agents.

# conceptual pre-forward hook
if contains_pii(request.args):
    tokens = tokenize(request.args)            # maps token -> original value
    request.args = replace_with_tokens(request.args, tokens)
forward_to_upstream(request)

# conceptual post-forward hook: reverse the mapping built pre-forward
response = detokenize(upstream_response, tokens)
return response

Validation: include an automated audit test that sends known PII and asserts it does not appear in upstream request logs or in stored artifacts.
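A concrete, if simplified, redaction hook together with its audit check might look like the following. The email regex and token format are assumptions for illustration, not the sample's implementation.

```python
import re
import uuid

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def tokenize_emails(args, vault):
    """Replace email addresses with opaque tokens; vault maps token -> original."""
    def repl(match):
        token = f"tok_{uuid.uuid4().hex[:8]}"
        vault[token] = match.group(0)
        return token
    return {k: EMAIL_RE.sub(repl, v) if isinstance(v, str) else v
            for k, v in args.items()}

# Audit check: known PII must never appear in what is forwarded upstream
vault = {}
forwarded = tokenize_emails({"query": "contact jane.doe@example.com"}, vault)
assert "jane.doe@example.com" not in str(forwarded)
assert "jane.doe@example.com" in vault.values()
```

The same assertions, run against captured upstream request logs, become the automated audit test described above.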

Tool-level access control

Problem: different business units must only access a subset of tools.

Solution: filter the mirrored catalog per caller identity and enforce per-call authorization. Expose only allowed tools in tools/list responses for each caller and reject unauthorized calls at call time.

# conceptual catalog filter
catalog = upstream_tools_list()
allowed = filter_by_caller_identity(catalog, caller)
return allowed
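Filled in with a toy allow-list, the filter might look like this. The mapping source is an assumption; a real deployment would derive it from the caller's identity claims (for example, from AgentCore identity or a JWT).

```python
# Illustrative allow-list keyed by caller identity (assumed data, not the sample's)
ALLOWED_TOOLS = {
    "finance-agent": {"lookup_customer", "create_report"},
    "support-agent": {"lookup_customer"},
}

def filter_by_caller_identity(catalog, caller):
    allowed = ALLOWED_TOOLS.get(caller, set())
    return [tool for tool in catalog if tool["name"] in allowed]

catalog = [{"name": "lookup_customer"}, {"name": "delete_customer"}]
print(filter_by_caller_identity(catalog, "support-agent"))
# -> [{'name': 'lookup_customer'}]
```

Unknown callers fall through to an empty allow-list, so the default is deny.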

Observability, testing, and acceptance criteria

What to capture in logs and traces (suggested schema): request_id, timestamp, caller_identity, tool_name, pre_transform_hash, latency_ms, upstream_status, result_status, policy_decision, redaction_token_ids.
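One way to keep raw PII out of the audit log itself is to record a hash of the pre-transform payload rather than the payload. The helper below is illustrative; only the field names follow the suggested schema.

```python
import hashlib
import json
import time
import uuid

def audit_record(caller, tool, raw_args, latency_ms, status, decision):
    # Hash the pre-transform payload so raw values never land in the log
    payload = json.dumps(raw_args, sort_keys=True).encode()
    return {
        "request_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "caller_identity": caller,
        "tool_name": tool,
        "pre_transform_hash": hashlib.sha256(payload).hexdigest(),
        "latency_ms": latency_ms,
        "result_status": status,
        "policy_decision": decision,
    }

record = audit_record("finance-agent", "lookup_customer",
                      {"email": "jane@example.com"}, 12.5, "ok", "allow")
assert "jane@example.com" not in json.dumps(record)
```

The hash still lets auditors prove two log entries refer to the same input without storing the input.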

Recommended metrics:

  • P95 and P99 added latency (proxy ingress → upstream response)
  • requests/sec per tool
  • error rate (4xx/5xx) per tool and per caller
  • tokenization hits/misses
  • cache hit ratio (if caching is used)

Tracing: propagate a request_id in headers and use OpenTelemetry to link proxy spans to upstream spans so audits can reconstruct an end-to-end trace.
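The propagation rule itself is small; a minimal sketch, with the header name assumed:

```python
import uuid

def ensure_request_id(headers, header="x-request-id"):
    # Reuse an id set by the caller; mint one at the proxy boundary otherwise
    headers.setdefault(header, uuid.uuid4().hex)
    return headers

inbound = ensure_request_id({})                # proxy mints the id
outbound = ensure_request_id(dict(inbound))    # upstream hop reuses it
assert inbound["x-request-id"] == outbound["x-request-id"]
```

Attaching the same id to proxy and upstream spans is what lets an audit reconstruct the end-to-end trace.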

Testing checklist:

  • Unit tests for pre/post hooks (tokenization, ACL logic)
  • Integration tests using test_agent.py to verify discovery and call paths
  • Audit tests asserting PII does not appear in upstream logs
  • Load test to measure latency/throughput impact at expected production concurrency

Suggested acceptance criteria for production readiness (example starting points):

  • Median added latency < 50 ms for simple forwarding (measure per tool)
  • 99.9% success rate under expected load
  • Automated end-to-end tokenization tests passing in CI

Operational trade-offs and design considerations

  • Extra network hop: adds latency and another failure surface. Measure and budget latency; add caching or co-locate proxies where needed.
  • Complexity: the proxy is responsible for correctness of transformations—errors here have compliance impact.
  • Statelessness vs stateful needs: the default pattern is stateless. For streaming or long-lived sessions, attach a state store (e.g., DynamoDB/Redis) or hand off to a streaming-capable endpoint and record the transfer in proxy logs.
  • Scaling & cost: AgentCore Runtime autoscaling reduces ops overhead, but track ECR storage, invocation costs, data transfer, and CloudWatch/OpenTelemetry charges.
  • Chaining: proxies can chain to upstream MCP servers for hybrid or multi-cloud patterns, but ensure consistent logging and correlated request IDs across chains.

When to use — and when not to

Use this pattern when you need to:

  • Rapidly add governance (auditing, tokenization, access control) without changing agents or upstream services
  • Centralize compliance logic and reuse existing libraries
  • Leverage managed observability and identity on Bedrock AgentCore Runtime

Avoid or augment this pattern when:

  • Your use case requires ultra-low latency (sub-millisecond) or very high throughput and cannot tolerate an extra network hop
  • You require heavy stateful streaming where a stateless proxy design is insufficient (consider state stores or streaming handoffs)
  • You cannot accept the operational responsibility for tokenization/detokenization correctness inside the proxy

Key questions for platform and security teams

  • How can I insert governance without changing upstream servers or clients?

    Deploy a serverless MCP proxy on AgentCore Runtime that mirrors the upstream tool catalog and intercepts calls to apply validation, transformation, and policy enforcement.

  • Which authentication mode should I use between proxy and upstream?

    Use IAM SigV4 when you want role-based AWS identity and tight AWS integration; use OAuth2/JWT client credentials (e.g., Amazon Cognito) when upstream endpoints require bearer tokens. Both approaches are supported; tokens should be cached and refreshed appropriately for performance.

  • What are concrete customization examples to deploy quickly?

    Tokenization/PII redaction, tool-level access control (catalog filtering and per-call policies), logging, and rate-limiting hooks. Implement these in pre/post forwarding hooks and validate with automated tests.

  • What operational trade-offs should I budget for?

    Expect added latency from the extra hop, configuration complexity for IAM/OAuth, and responsibility for transformation correctness. For stateful or streaming workflows, augment the proxy with a state store or dedicated streaming endpoints.

  • How do I validate this quickly?

    Clone sample-mcp-proxy-agentcore-runtime, run setup_and_deploy.py with deploy_config.json, and validate end-to-end with the included Strands Agents test_agent.py. Add automated PII audit tests and load tests for latency profiling.

FAQ

Will the proxy break existing agents?
Not if it mirrors the upstream tool catalog accurately. The proxy registers identical tools at startup so clients see the same interfaces; catalog changes should be handled by periodic resyncs or event-driven updates.
How do I measure the latency impact?
Instrument the proxy ingress and egress with request IDs and record timing. Compare P95/P99 with and without the proxy under representative load. Consider caching tokenization results and minimizing per-call processing for latency-sensitive tools.
How should audit logs be structured?
Use a consistent schema (request_id, caller_identity, tool_name, pre_transform_mask, policy_decision, latency_ms). Use OpenTelemetry to correlate traces across proxy and upstream systems.
What if I need streaming or long-lived sessions?
Either attach a state store (DynamoDB, Redis) or design the proxy to hand off long-lived streams to specialized endpoints. Record handoff identifiers in logs for auditing.

Appendix: sample IAM policy (sanitized)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:InvokeGateway",
        "bedrock-agentcore:InvokeAgentRuntime"
      ],
      "Resource": "*"
    }
  ]
}

Next steps

Clone sample-mcp-proxy-agentcore-runtime, inspect the Python proxy that uses FastMCP and boto3, and run the automated setup_and_deploy.py with your deploy_config.json to get a working demo. Start with tokenization hooks and a basic catalog filter to demonstrate value to security and compliance stakeholders before expanding to richer transformations.

Author: Nizar Kheir, Senior Solutions Architect (AWS, EMEA public sector). The sample implementation and deployment scripts are available as sample-mcp-proxy-agentcore-runtime on GitHub. For platform teams evaluating this pattern, consider a short pilot that measures added latency, validates PII redaction, and maps audit logs to compliance requirements.