InvokeGuardrailChecks – Amazon Bedrock Per-Turn Guardrails for Agentic AI, PII & Prompt Attacks

InvokeGuardrailChecks for agentic AI: lightweight, per‑turn safety in Amazon Bedrock Guardrails

Imagine a sales automation agent that drafts contracts and stitches together third‑party data: one careless tool response could leak a customer’s SSN to the wrong recipient. InvokeGuardrailChecks gives you a fast, targeted way to scan any step of an agent’s loop (before a tool call, after a tool returns data, or right before sending a reply) so you can detect prompt attacks, unsafe content, and exposed PII before they become a business problem.

TL;DR — What you get

InvokeGuardrailChecks is a resourceless, detect‑only API in Amazon Bedrock Guardrails built for agentic, multi‑turn AI.
Run focused safeguards anywhere in an agent loop (user input, plan, tool output, final reply) without creating guardrail resources.
Returns discrete severity/confidence scores for content filters, prompt attacks (jailbreak/prompt injection/leakage), and detection of 31 PII types with offsets.
It’s detect‑only: your application decides actions (block, retry, human review, log) using business thresholds.

What InvokeGuardrailChecks covers

Key safeguard categories and what you can expect:

Content filters: HATE, VIOLENCE, SEXUAL, INSULTS, MISCONDUCT — returned as numeric severity scores.
Prompt attack detection: JAILBREAK, PROMPT_INJECTION, PROMPT_LEAKAGE — exposed as standalone checks to target prompt attacks independently.
Sensitive information / PII detection: 31 PII entity types (EMAIL, PHONE, SSN, credit card, etc.) with confidence scores and character offsets to locate exposed spans.

Scores use a discrete set: {0, 0.2, 0.4, 0.6, 0.8, 1.0}. Treat them as simple, auditable buckets—0 = none, 0.2 = low, 0.4 = low‑medium, 0.6 = medium, 0.8 = high, 1.0 = critical—so you can define clear actions at each step.
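One way to turn these buckets into actions is a small lookup keyed on lower bounds. The sketch below is illustrative only: the action names and the cut-offs (0.4 and 0.8) are assumptions, not recommended values, and should come out of your own calibration.

```python
from bisect import bisect_right

# Hypothetical policy: the lowest score at which each action starts to apply.
# Cut-offs are placeholders to calibrate, not recommendations.
LOWER_BOUNDS = [0.0, 0.4, 0.8]                 # ascending
ACTIONS = ["allow", "human_review", "block"]   # same order as LOWER_BOUNDS


def action_for(score: float) -> str:
    """Return the action whose lower bound is the largest one <= score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return ACTIONS[bisect_right(LOWER_BOUNDS, score) - 1]
```

With these placeholder bounds, scores of 0 and 0.2 map to "allow", 0.4 and 0.6 to "human_review", and 0.8 and 1.0 to "block". Keeping the table as data makes it easy to audit and to vary per check type or per agent step.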

How it fits into agentic AI

Agentic AI means multi‑turn systems that plan, call tools, and iterate internally. Each phase has a different risk profile: user prompts can carry prompt injection; tool calls can return sensitive data; generated plans can suggest unsafe actions. You don’t always want a global, one‑size‑fits‑all guardrail—sometimes you need a quick, inline scan.

InvokeGuardrailChecks is resourceless (no ARN or persistent guardrail lifecycle). Call it inline from your agent at any decision point and get back structured findings. Because it’s detect‑only, your application keeps control and applies context‑aware logic: block, retry, redact, send for human review, or log for audit.

Integration patterns and a practical example

Common places to call InvokeGuardrailChecks inside an agent loop:

Before a tool call: run JAILBREAK / PROMPT_INJECTION checks on generated prompts.
After tool output: run PII detection and redact or queue human review if sensitive spans are found.
Before final reply: run content filters and map severity to actions (send / modify / escalate).

Example Python (Boto3) flow — simplified:

import boto3

# Client name, operation, and parameter names below follow this article's
# simplified sketch; confirm them against the current SDK reference before use.
bedrock = boto3.client("bedrock")

# Request only the checks this step needs; all three run in a single call.
response = bedrock.invoke_guardrail_checks(
    InputText="User or tool text to check",
    Checks=[
        {"Type": "PII_DETECTION", "Entities": ["EMAIL", "SSN", "CREDIT_CARD"]},
        {"Type": "PROMPT_ATTACK", "Subtypes": ["JAILBREAK", "PROMPT_INJECTION"]},
        {"Type": "CONTENT_FILTER", "Subtypes": ["HATE", "VIOLENCE"]},
    ],
)

# The response contains findings with severity and confidence values
# (discrete steps from 0 to 1) and, for PII, character offsets of each span.

Sample JSON snippet (illustrative) returned as a finding:

{
  "Findings": [
    {
      "CheckType": "PII_DETECTION",
      "EntityType": "SSN",
      "Confidence": 0.8,
      "StartOffset": 124,
      "EndOffset": 135,
      "TextSnippet": "xxx-xx-1234"
    },
    {
      "CheckType": "PROMPT_ATTACK",
      "Subtype": "JAILBREAK",
      "Severity": 0.6
    },
    {
      "CheckType": "CONTENT_FILTER",
      "Subtype": "HATE",
      "Severity": 0.2
    }
  ]
}

Note: only the checks you request appear in results. Keep requests focused to reduce cost and latency.
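Because PII findings carry character offsets, redaction can be done directly on the checked text. The following is a minimal sketch that assumes the illustrative finding shape shown above (CheckType, EntityType, Confidence, StartOffset, EndOffset) and an exclusive EndOffset; the helper name and the 0.6 default threshold are hypothetical.

```python
def redact_pii(text: str, findings: list[dict], min_confidence: float = 0.6) -> str:
    """Replace each sufficiently confident PII span with a [TYPE] placeholder.

    Assumes the illustrative finding shape from this article and an exclusive
    EndOffset. Overlapping spans are not handled in this sketch.
    """
    spans = [
        (f["StartOffset"], f["EndOffset"], f["EntityType"])
        for f in findings
        if f.get("CheckType") == "PII_DETECTION"
        and f.get("Confidence", 0.0) >= min_confidence
    ]
    # Work from the end of the string so earlier offsets stay valid.
    for start, end, entity in sorted(spans, reverse=True):
        text = text[:start] + f"[{entity}]" + text[end:]
    return text
```

Replacing spans from the last offset to the first avoids having to recompute positions after each substitution. Findings from other check types are ignored, so the full findings list can be passed in unchanged.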

IAM and security basics

Permission required: bedrock:InvokeGuardrailChecks. Because the API is resourceless, policies must use Resource: "*". Narrow the blast radius with IAM condition keys:

Region restriction: aws:RequestedRegion = "us-east-1"
Network/source: aws:SourceIp or aws:SourceVpc
Principal constraints: aws:PrincipalTag (tag identities that are allowed)
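As a rough illustration, the conditions above could be combined into one identity policy statement. The sketch builds the policy as a Python dictionary and prints it as JSON. The Sid, the tag key "guardrail-caller", and the CIDR range (a documentation-only address block) are hypothetical placeholders; the action name is the one given in this article.

```python
import json

# Hypothetical example policy. Replace the tag key, tag value, and CIDR range
# with values from your own environment.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGuardrailChecksFromApprovedContext",
            "Effect": "Allow",
            "Action": "bedrock:InvokeGuardrailChecks",
            "Resource": "*",  # the API is resourceless, so no ARN to scope to
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "us-east-1",
                    "aws:PrincipalTag/guardrail-caller": "true",
                },
                "IpAddress": {"aws:SourceIp": "203.0.113.0/24"},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

All condition operators in a statement must match for the Allow to apply, so this statement only permits tagged principals calling from the listed network range in the listed Region. For callers inside a VPC, a condition on aws:SourceVpc would take the place of the aws:SourceIp block.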

Operationally, pair API permissions with network controls and monitoring. Log every finding to your observability pipeline (CloudWatch → Kinesis → SIEM) but ensure PII in logs is redacted or stored under strict retention policies required by compliance frameworks.
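A simple way to keep raw PII out of logs is to strip text-bearing fields from each finding before it is emitted, and to attach identifiers that let you correlate findings across turns. This sketch assumes the illustrative finding shape used in this article; the field names trace_id, turn, and step and the logger name are hypothetical.

```python
import json
import logging

log = logging.getLogger("guardrail.audit")

# Fields that may carry raw sensitive text and should never be logged.
SENSITIVE_KEYS = {"TextSnippet"}


def audit_record(trace_id: str, turn: int, step: str, finding: dict) -> dict:
    """Build a log-safe record: correlation fields plus the finding minus raw text."""
    safe = {k: v for k, v in finding.items() if k not in SENSITIVE_KEYS}
    return {"trace_id": trace_id, "turn": turn, "step": step, **safe}


def log_findings(trace_id: str, turn: int, step: str, findings: list[dict]) -> None:
    """Emit one JSON line per finding so downstream tools can parse and filter them."""
    for finding in findings:
        log.info(json.dumps(audit_record(trace_id, turn, step, finding), sort_keys=True))
```

Offsets and entity types are kept, so a reviewer with access to the original text can still locate the span, while the log stream itself holds no PII.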

Latency, cost, and scale considerations

Each InvokeGuardrailChecks request is an extra API call and adds latency to the turn. Best practices to manage cost and performance:

Scope checks to riskiest steps—don’t run every check at every turn.
Batch or combine checks in a single call where possible rather than multiple sequential calls.
Use asynchronous checks for nonblocking signals: let the agent proceed while a background job audits lower‑risk outputs and triggers remediation if needed.
Measure end‑to‑end latency in staging. Set a per‑turn latency budget (for example, 50–200 ms per check) and, if a turn exceeds it, move lower‑risk checks out of the blocking path.
Model cost per call and run a cost/benefit analysis: how many calls per user per month are acceptable given your SLA and budget?
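The asynchronous pattern above can be sketched with a thread pool: the agent submits the text and continues, and a callback fires only when something is flagged. Here run_checks is a hypothetical offline stand-in for the real guardrail call, and the function and parameter names are illustrative.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def run_checks(text: str) -> list[dict]:
    """Hypothetical stand-in for the guardrail call so the sketch runs offline."""
    if "idiot" in text.lower():
        return [{"CheckType": "CONTENT_FILTER", "Subtype": "INSULTS", "Severity": 0.6}]
    return []


def audit_in_background(pool: ThreadPoolExecutor, text: str, on_findings) -> Future:
    """Queue a check and return immediately; on_findings runs only if something is flagged."""

    def _done(future: Future) -> None:
        findings = future.result()
        if findings:
            on_findings(text, findings)

    future = pool.submit(run_checks, text)
    future.add_done_callback(_done)
    return future
```

This suits lower‑risk outputs where late remediation is acceptable. Checks that gate a tool call or a customer‑facing reply should stay synchronous, since a background finding arrives after the action has already happened.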

Calibration, testing, and monitoring

Don’t guess thresholds—calibrate them:

Shadow mode: run checks in production but don’t enforce actions. Collect findings, false positives, and false negatives.
Human review sampling: route middling scores (0.4–0.8) to reviewers and label outcomes.
Threshold sweep: run experiments to map severity/confidence values to desired actions (auto‑block, human review, log‑only).
A/B test UX impact: measure task completion, false block rates, and time to resolution.
Periodic threshold and rule updates: revisit thresholds on a schedule, based on observed errors and your changing risk profile.
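A threshold sweep over shadow‑mode data can be as simple as counting errors at each bucket. The sketch below assumes you have collected (score, label) pairs, where the label is a reviewer's judgment of whether the content was truly problematic; the function name and data shape are hypothetical.

```python
# Candidate thresholds: flag content when score >= threshold.
BUCKETS = [0.2, 0.4, 0.6, 0.8, 1.0]


def sweep(samples: list[tuple[float, bool]]) -> dict[float, tuple[float, float]]:
    """Return {threshold: (false_positive_rate, false_negative_rate)}.

    samples holds (score, is_truly_harmful) pairs from labeled shadow-mode traffic.
    """
    positives = sum(1 for _, harmful in samples if harmful)
    negatives = len(samples) - positives
    results = {}
    for threshold in BUCKETS:
        fp = sum(1 for score, harmful in samples if score >= threshold and not harmful)
        fn = sum(1 for score, harmful in samples if score < threshold and harmful)
        results[threshold] = (
            fp / negatives if negatives else 0.0,
            fn / positives if positives else 0.0,
        )
    return results
```

Running the sweep separately per check type and per agent step shows where a lower threshold buys safety cheaply and where it mostly adds false blocks.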

Metrics to track:

False positive / false negative rates by check type and locale
Mean/median latency added per check
Number of human reviews triggered and average time to resolve
Incidents where PII leaked despite checks

When to use InvokeGuardrailChecks vs ApplyGuardrail

InvokeGuardrailChecks — Use when you need stateless, per‑step detection inside agent loops. Best for context‑aware decisions, ephemeral checks, and workflows where you prefer application control over enforcement.
ApplyGuardrail — Use when you want persistent guardrail resources that enforce actions automatically (block/mask/bypass) across requests with less application logic.

Many teams will combine both: ApplyGuardrail for baseline enforcement and InvokeGuardrailChecks for targeted, context‑aware checks inside agent flows.

Practical mini‑scenarios

Customer support bot (multi‑turn):

Before summarizing a user’s uploaded transcript, run PII_DETECTION on the transcript text. If SSN confidence ≥ 0.8, redact and route to human review.
Before sending a final reply, run CONTENT_FILTER; severity ≥ 0.8 triggers block and human escalation.

Sales assistant drafting contracts:

After populating contract fields with tool data, run PII_DETECTION for EMAIL and CREDIT_CARD; any match with confidence ≥ 0.6 requires redaction or encrypted storage and a manual approval step.

Research summarizer using external tools:

Before calling a web‑scraping tool, check prompts for PROMPT_INJECTION. On severity ≥ 0.6, sanitize input and retry with a stricter prompt template.

Limitations and mitigations

Be explicit about tradeoffs:

Detect‑only means you must implement enforcement, logging, and auditing; that’s extra operational work but gives flexibility.
Locale variance: PII detectors may perform differently across international formats. Validate and supplement with custom regexes or locale‑aware detectors where needed.
Adversarial actors may try to evade detectors. Combine detection with tool sandboxing, strict input types, and output sanitization.
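As one example of supplementing the detector for a specific locale, a custom regex can produce findings in the same shape and be merged with the API's results. The pattern below is a simplified matcher for French national phone numbers written as ten digits starting with 0; it is an assumption for illustration, not a complete validator, and the "Source" field and helper names are hypothetical.

```python
import re

# Simplified pattern: 0X followed by four digit pairs, optionally separated
# by a space, dot, or hyphen (for example "01 23 45 67 89").
FR_PHONE = re.compile(r"\b0[1-9](?:[ .-]?\d{2}){4}\b")


def regex_findings(text: str, pattern: re.Pattern = FR_PHONE, entity: str = "PHONE") -> list[dict]:
    """Return regex matches in the same illustrative shape as detector findings."""
    return [
        {"CheckType": "PII_DETECTION", "EntityType": entity, "Source": "custom_regex",
         "StartOffset": m.start(), "EndOffset": m.end()}
        for m in pattern.finditer(text)
    ]


def merge_findings(detector: list[dict], custom: list[dict]) -> list[dict]:
    """Keep every detector finding; add custom ones that overlap no detector span."""

    def overlaps(a: dict, b: dict) -> bool:
        return a["StartOffset"] < b["EndOffset"] and b["StartOffset"] < a["EndOffset"]

    spans = [f for f in detector if "StartOffset" in f]
    extra = [c for c in custom if not any(overlaps(c, d) for d in spans)]
    return detector + extra
```

Preferring the detector's finding when spans overlap keeps its confidence score in the record, while the regex fills gaps the detector missed.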

Production checklist

Define checks required at each agent event (BeforeInvocation, AfterToolCall, AfterInvocation).
Create threshold map: for each check, define actions at each score bucket.
Implement IAM with minimal privileges and region/network conditions.
Set up logging and redaction rules for findings; stream to CloudWatch / your SIEM.
Run a 2–4 week shadow mode, collect labels, and tune thresholds.
Design human‑in‑the‑loop queues and SLA for review.
Monitor metrics and schedule periodic threshold reviews.

Frequently asked questions

Will InvokeGuardrailChecks block content automatically?

No. It’s detect‑only. The API returns numeric severity and confidence scores; your application decides whether to block, redact, retry, or escalate.

Can I limit who can call the API?

Yes. Use bedrock:InvokeGuardrailChecks in IAM with Resource: "*" and add condition keys (region, source IP/VPC, principal tags) to minimize risk.

How do I handle international PII formats?

Validate detectors on your international data, supplement with locale‑specific regexes, and treat middling scores as candidates for human review until you have confidence.

Is there unified logging/auditability?

InvokeGuardrailChecks returns structured findings you can log. Correlating findings across agent turns and tracing them through tool calls is your application’s responsibility—stream findings to CloudWatch or your observability stack for audit trails.

Glossary

Agentic AI: multi‑turn systems that autonomously plan, call tools, and iterate to accomplish tasks.
Resourceless: no persistent resource (ARN) to create/manage for the check—calls are inline and stateless.
Prompt injection / jailbreak: attacks where input manipulates the model to reveal sensitive data or ignore policies.
Severity / Confidence scores: discrete buckets {0, 0.2, 0.4, 0.6, 0.8, 1.0} used to trigger business rules.

Final thoughts

InvokeGuardrailChecks matches how modern agentic AI is built: many quick turns, diverse risk profiles, and a need for targeted detection. It shifts enforcement responsibility back to application teams, which is a tradeoff—more control, more operational work. If your agents touch regulated data or call external tools, plan for a calibration phase, robust logging, and a human‑in‑the‑loop mechanism. Use persistent ApplyGuardrail for baseline enforcement and InvokeGuardrailChecks for the contextual, per‑turn checks that agents need.

Authors / Contributors: Sandeep Singh, Denis Batalov, Shyam Srinivasan, and Koushik Kethamakka (contributors to the underlying work and launch).