Amazon Bedrock adds granular cost attribution — track who’s spending what on AI inference
- What changed: Amazon Bedrock now records the IAM principal (user, role session, or federated identity) that makes each inference call and exports that identity into the AWS Cost and Usage Report (CUR 2.0).
- Why it matters: Teams can roll up inference spend by user, application, team, project, or tenant in AWS Cost Explorer using iamPrincipal/* tags — enabling chargebacks, forecasting, and model-level cost optimization.
- First action: Enable IAM principal export in CUR 2.0 and activate iamPrincipal/* cost allocation tags (visible in Cost Explorer after ~24–48 hours).
Who should read this
Finance leads, platform engineering, SREs, product managers running AI features, and platform architects designing multi-tenant AI services. If inference costs are a growing line item on your cloud bill, this is for you.
Quick hook
The finance team opens the monthly cloud bill and sees a surprise spike in inference spend — but the bill shows only “Bedrock” with no user or app context. That lack of visibility is exactly what this change fixes: Bedrock now tags every inference call with the identity that made it, and those tags flow into CUR 2.0 and Cost Explorer for real accountability.
Plain-language definitions
- IAM principal: the identity making an API call — an IAM user, an assumed IAM role session, or a federated user authenticated by your Identity Provider (IdP).
- AssumeRole: the AWS STS operation that returns temporary credentials for a role session, so a service or gateway can act on behalf of a user or application.
- CUR 2.0: AWS’s detailed billing export (Cost and Usage Report), which now includes a line_item_iam_principal field for Bedrock calls.
- Cost Explorer: the AWS console tool for filtering and grouping cloud spend (now supports iamPrincipal/* tags for Bedrock).
How it works — concise
Bedrock attaches the caller identity to each inference request and exports that identity into CUR 2.0 as line_item_iam_principal. Tags that come from IAM principal metadata or from session tags are exported with an iamPrincipal/ prefix and can be activated in Billing to appear in Cost Explorer. The line_item_usage_type field also encodes region, model name, and token direction (input/output), so you can compare per-model cost-per-token.
Amazon Bedrock automatically maps inference costs back to the IAM principal that made the call, and that attribution flows to AWS billing and CUR 2.0.
Sample CUR export (illustrative)
line_item_iam_principal: arn:aws:sts::123456789012:assumed-role/recommendation-service/role-session-name
iamPrincipal/team: analytics
line_item_usage_type: us-east-1|Claude-4.6|output
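To make the export concrete, here is a minimal Python sketch that rolls up cost by caller identity and token direction. It assumes the illustrative `region|model|direction` layout shown in the sample above (a real export may encode usage type differently), uses the standard CUR column `line_item_unblended_cost` for cost, and invents a second role, `search-service`, purely for the example.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Illustrative rows only; a real CUR 2.0 export carries many more columns.
SAMPLE_CUR_CSV = """line_item_iam_principal,line_item_usage_type,line_item_unblended_cost
arn:aws:sts::123456789012:assumed-role/recommendation-service/role-session-name,us-east-1|Claude-4.6|output,1.25
arn:aws:sts::123456789012:assumed-role/recommendation-service/role-session-name,us-east-1|Claude-4.6|input,0.40
arn:aws:sts::123456789012:assumed-role/search-service/role-session-name,us-east-1|Claude-4.6|output,0.75
"""


def parse_usage_type(usage_type: str) -> dict:
    """Split the illustrative 'region|model|direction' layout (an assumption)."""
    region, model, direction = usage_type.split("|")
    return {"region": region, "model": model, "direction": direction}


def cost_by_principal_and_direction(csv_text: str) -> dict:
    """Sum cost per (principal ARN, token direction) pair."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        usage = parse_usage_type(row["line_item_usage_type"])
        key = (row["line_item_iam_principal"], usage["direction"])
        totals[key] += Decimal(row["line_item_unblended_cost"])
    return dict(totals)


if __name__ == "__main__":
    for (principal, direction), cost in cost_by_principal_and_direction(SAMPLE_CUR_CSV).items():
        print(principal, direction, cost)
```

The same grouping can be done in Cost Explorer once the tags are active; a script like this is mainly useful for checking the raw export during a pilot.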
Four common deployment patterns — what you’ll see and what to change
- IAM users / Bedrock API keys (developer & experiment stage): Each developer or tester uses a distinct IAM principal or API key. Attribution is immediate and needs little setup. Good for small teams and experimentation.
- IAM roles for applications (per-service billing): Applications on EC2, Lambda, ECS, or EKS assume an IAM role. If you use separate roles per application or microservice, Bedrock attributes spend to the assumed-role ARN, which is a simple way to split costs per service.
- Federated users via IdP (per-user tracking without native IAM users): Identity Providers (Okta, Azure AD/Entra ID, Auth0, Amazon Cognito, etc.) can pass role-session names and session tags in SAML/OIDC assertions. Those session tags export to CUR 2.0, enabling per-user chargebacks without creating long-lived IAM users.
- LLM gateways and proxies (per-tenant / per-user billing with a gateway): Gateways that call Bedrock from a single role will, by default, attribute all traffic to that gateway role. To get per-user or per-tenant visibility, the gateway must perform per-user AssumeRole operations with role-session-name and session tags, and then cache those session credentials.
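In the role-based patterns above, the billing identity is an assumed-role ARN whose last two segments are the role name and the session name. A small sketch of splitting that ARN for per-service or per-session roll-ups follows; the type and function names are hypothetical, and other principal types (such as IAM users) simply return `None`.

```python
from typing import NamedTuple, Optional


class AssumedRole(NamedTuple):
    account_id: str
    role_name: str
    session_name: str


def parse_assumed_role_arn(arn: str) -> Optional[AssumedRole]:
    """Return the parts of an STS assumed-role ARN, or None for other principals."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "sts":
        return None
    resource = parts[5].split("/")
    if len(resource) != 3 or resource[0] != "assumed-role":
        return None
    return AssumedRole(account_id=parts[4], role_name=resource[1], session_name=resource[2])


if __name__ == "__main__":
    arn = "arn:aws:sts::123456789012:assumed-role/recommendation-service/role-session-name"
    print(parse_assumed_role_arn(arn))
```

Grouping by `role_name` gives per-service totals, while grouping by `session_name` gives per-user or per-tenant totals when a gateway or IdP sets distinct session names.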
LLM gateway implementation checklist
Gateways are the trickiest architecture for cost attribution. Follow these steps:
- Use AssumeRole per end-user or per-tenant so line_item_iam_principal reflects the session identity.
- Pass session tags (e.g., tenant, user, app) when creating the session; they are exported to billing with the iamPrincipal/ prefix (iamPrincipal/tenant, iamPrincipal/user, iamPrincipal/app). Session tags are fixed for the session lifetime.
- Cache sessions for a reasonable TTL (common practice: up to 1 hour) to avoid excessive STS calls.
- Monitor STS rate limits: the default AssumeRole quota is ~500 calls/sec per account, so plan session pooling or request a quota increase for high concurrency.
- Avoid sending PII as session names; prefer hashed or internal IDs for privacy.
Example gateway session logic (high level): create or reuse a cached session for user X; if not present or expired, call AssumeRole with session tags and store the temporary credentials with an expiry timestamp; reuse until near-expiry, then refresh.
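The session logic described above can be sketched as a small cache. This is a minimal illustration, not a production gateway: the class name, the refresh margin, and the injected `assume_role` callable are assumptions. In practice the callable would typically be boto3's `sts_client.assume_role`, which accepts the `RoleArn`, `RoleSessionName`, `DurationSeconds`, and `Tags` keyword arguments used here, and the role's trust policy would also need to allow `sts:TagSession`. Tag keys are passed without the iamPrincipal/ prefix, on the assumption that the prefix is added on export.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedSession:
    credentials: dict
    expires_at: float  # epoch seconds, computed locally from the requested TTL


class SessionCache:
    """Reuse one STS session per user until it is close to expiry (not thread-safe)."""

    def __init__(self, assume_role: Callable[..., dict], role_arn: str,
                 ttl_seconds: int = 3600, refresh_margin: int = 300,
                 clock: Callable[[], float] = time.time):
        self._assume_role = assume_role
        self._role_arn = role_arn
        self._ttl = ttl_seconds
        self._margin = refresh_margin
        self._clock = clock
        self._sessions: Dict[str, CachedSession] = {}

    def get_credentials(self, user_id: str, tags: Dict[str, str]) -> dict:
        now = self._clock()
        cached = self._sessions.get(user_id)
        # Reuse the cached session unless it is within the refresh margin of expiry.
        if cached is not None and cached.expires_at - now > self._margin:
            return cached.credentials
        # Session tags are fixed at creation time, so they are only sent here.
        response = self._assume_role(
            RoleArn=self._role_arn,
            RoleSessionName=user_id,
            DurationSeconds=self._ttl,
            Tags=[{"Key": key, "Value": value} for key, value in tags.items()],
        )
        credentials = response["Credentials"]
        self._sessions[user_id] = CachedSession(credentials, now + self._ttl)
        return credentials
```

Because the cache is keyed by user ID, a change in a user's tags only takes effect at the next refresh, which matches the fixed-for-lifetime behavior of session tags. A concurrent gateway would need a lock or a per-key future around the refresh path.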
Tagging conventions — recommended keys & practices
Keep tags short, consistent, and privacy-safe. The keys below are shown as they appear in billing, with the iamPrincipal/ prefix:
- iamPrincipal/team = analytics
- iamPrincipal/app = recommendation-service
- iamPrincipal/tenant = tenant-1234 (use internal IDs — avoid emails)
- iamPrincipal/env = prod | staging | dev
For user identifiers prefer internal UUIDs or hashed values instead of raw email addresses or full names. Document the naming convention and store a mapping in a secure internal catalog for reconciliation.
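One way to derive a privacy-safe identifier is a keyed hash of the user's email or login. The sketch below is an assumption about how a team might do this, not a prescribed method: the function name, the `u-` prefix, and the 32-character truncation are hypothetical choices. The pattern check reflects the STS constraint that a role session name is 2 to 64 characters drawn from letters, digits, underscore, and `+=,.@-`.

```python
import hashlib
import hmac
import re

# STS role session names: 2-64 characters from this ASCII set.
SESSION_NAME_PATTERN = re.compile(r"[\w+=,.@-]{2,64}", re.ASCII)


def pseudonymous_session_name(user_identifier: str, secret_key: bytes, prefix: str = "u-") -> str:
    """Derive a stable, non-reversible session name from a user identifier."""
    normalized = user_identifier.strip().lower().encode("utf-8")
    # A keyed hash (HMAC) prevents guessing identifiers by hashing known emails.
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    name = f"{prefix}{digest[:32]}"
    if SESSION_NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"invalid session name: {name!r}")
    return name


if __name__ == "__main__":
    # The key would come from a secrets manager in practice; this value is a placeholder.
    print(pseudonymous_session_name("Alice@example.com", b"placeholder-key"))
```

The mapping from hashed name back to the real user stays in the internal catalog mentioned above, so finance can reconcile billing lines without PII appearing in the export.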
Operational details and limits to plan for
- Tag visibility delay: iamPrincipal/* tags show in Cost Explorer about 24–48 hours after activation.
- No extra Bedrock charge: This attribution capability is available in commercial regions at no additional cost.
- STS rate limits: AssumeRole defaults to ~500 calls/sec/account; high-scale gateways should pool sessions or request quota increases.
- Session tags are immutable: once a session is created, session tags can’t be changed until the session expires.
- Data exported to billing: CUR now includes line_item_iam_principal and iamPrincipal/* tags — consider privacy and compliance when choosing session names.
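For capacity planning against the STS limit, a back-of-the-envelope estimate can help. The sketch below assumes every cached session is refreshed once per TTL and that refreshes are spread evenly over time; the function names are hypothetical, and the 500 calls/sec default is the approximate figure quoted above.

```python
def steady_state_assume_role_rate(active_sessions: int, ttl_seconds: int) -> float:
    """Average AssumeRole calls per second if each session refreshes once per TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return active_sessions / ttl_seconds


def quota_utilization(rate_per_sec: float, quota_per_sec: float = 500.0) -> float:
    """Fraction of the (approximate) AssumeRole quota consumed at a given rate."""
    return rate_per_sec / quota_per_sec


if __name__ == "__main__":
    rate = steady_state_assume_role_rate(active_sessions=100_000, ttl_seconds=3600)
    print(f"{rate:.1f} calls/sec, {quota_utilization(rate):.1%} of quota")
```

With 100,000 active sessions and a one-hour TTL, the steady-state rate is roughly 28 calls per second, well under the quota. The risk is the burst case, such as a cold start where every session is created at once, which this average does not capture.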
Quick wins — first 5 things to do this week
- Enable IAM principal data export in CUR 2.0.
- Activate iamPrincipal/* cost allocation tags in Billing > Cost Allocation Tags (so they show in Cost Explorer).
- Create a short tagging policy (keys, allowed values, privacy rules) and share with platform teams.
- Pilot with one app or team: use a separate IAM role or per-user AssumeRole and validate Cost Explorer roll-ups after ~48 hours.
- Run a privacy review: remove PII from session names and decide on hashed user IDs for billing exports.
Common pitfalls and how to avoid them
- All traffic shows under one gateway role: implement per-user/tenant AssumeRole with session tags and cache sessions.
- Exceeding STS rates: pool or reuse sessions, increase quotas, or shard across accounts/regions as needed.
- PII leakage: never use raw email addresses or personal identifiers as session names in billing exports — use internal IDs or hashes.
- Forgetting to activate tags: tags appear in CUR but won’t show in Cost Explorer until you enable the iamPrincipal/* cost allocation tags.
- Mismatched tagging standards: agree on tag keys and values before scaling; inconsistent tags hurt reporting quality.
Practical cost model example (illustrative)
Suppose a model costs $0.01 per 1,000 tokens (illustrative). If an endpoint processes 10 million tokens in a month, that’s roughly $100 of model inference. Multiply by concurrent endpoints, multiple models, and many users, and inference spend can quickly become material. Granular attribution lets you find the runaway tenant, model, or feature and optimize or charge accordingly.
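The arithmetic above is easy to script when comparing tenants or models. This sketch uses the same illustrative price of $0.01 per 1,000 tokens; the function name and tenant figures are hypothetical.

```python
from decimal import Decimal


def inference_cost(tokens: int, price_per_1k_tokens: Decimal) -> Decimal:
    """Cost in dollars for a token count at a per-1,000-token price."""
    return Decimal(tokens) / Decimal(1000) * price_per_1k_tokens


if __name__ == "__main__":
    price = Decimal("0.01")  # illustrative price, not a real model rate
    # 10 million tokens at $0.01 per 1,000 tokens is $100.
    print(inference_cost(10_000_000, price))
    # Hypothetical per-tenant token counts for one month.
    tenant_tokens = {"tenant-1234": 10_000_000, "tenant-5678": 250_000_000}
    for tenant, tokens in tenant_tokens.items():
        print(tenant, inference_cost(tokens, price))
```

In this made-up example the second tenant accounts for $2,500 against the first tenant's $100, which is the kind of gap that per-principal attribution makes visible.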
Alternatives and complements to CUR-level attribution
- Application-level request tagging in logs — good for real-time debugging and correlation with observability tools.
- Per-tenant API keys — simple but can be harder to rotate and manage at scale.
- Third-party cloud cost observability tools — can ingest CUR and provide richer dashboards, but attribution still needs accurate iamPrincipal metadata.
FAQ — short answers platform leaders ask
- How long does setup take? Enabling IAM principal export and activating tags can be done in hours. Expect ~24–48 hours to see data in Cost Explorer. Full rollout (tagging policy, gateway changes, privacy review) takes longer; plan a 2–4 week pilot.
- What’s the expected ROI? Visibility frequently uncovers inefficient models, runaway tenants, or untagged dev usage. Even a single discovery that avoids a multi-thousand-dollar monthly overrun pays for the operational effort.
- Does this affect security posture? Yes. Creating separate IAM roles per service improves isolation and serves both security and billing goals. However, gateways implementing per-user AssumeRole must secure short-lived credentials appropriately.
- Is personal data exported to billing? Session names and tags can contain identifiers. Avoid PII in session names and use hashed/internal IDs to stay compliant with privacy rules.
Rollout pattern (pilot → scale)
- Pilot one team: enable CUR principal export, apply tags, validate Cost Explorer reports.
- Refine tag conventions and update IAM/gateway code to include session tags where needed.
- Document chargeback/showback rules and run a finance review to map billing lines to internal accounting.
- Scale across teams and automate tag enforcement in CI/CD or IAM provisioning workflows.
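Tag enforcement in CI/CD or provisioning can start as a simple validator. The sketch below is one possible shape, under stated assumptions: the policy table, the allowed patterns, and the function name are hypothetical, and the keys are written without the iamPrincipal/ prefix on the assumption that the prefix is added on export.

```python
import re
from typing import Dict, List

# Hypothetical policy: tag key -> regular expression the whole value must match.
TAG_POLICY: Dict[str, str] = {
    "team": r"[a-z0-9-]{1,32}",
    "app": r"[a-z0-9-]{1,64}",
    "tenant": r"tenant-[0-9]+",
    "env": r"prod|staging|dev",
}

EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+")


def validate_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of policy violations; an empty list means the tags pass."""
    errors = []
    for key, value in tags.items():
        pattern = TAG_POLICY.get(key)
        if pattern is None:
            errors.append(f"unknown tag key: {key}")
        elif EMAIL_LIKE.search(value):
            errors.append(f"tag {key} looks like an email address (possible PII)")
        elif re.fullmatch(pattern, value) is None:
            errors.append(f"value {value!r} is not allowed for tag {key}")
    return errors


if __name__ == "__main__":
    print(validate_tags({"team": "analytics", "env": "prod"}))
    print(validate_tags({"tenant": "alice@example.com", "stage": "prod"}))
```

A pipeline step could fail the build whenever the returned list is non-empty, so inconsistent keys and PII are caught before they reach billing exports.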
Key takeaways
- Bedrock now exports caller identity to CUR 2.0, enabling per-user, per-application, and per-tenant inference billing.
- Use principal tags and session tags (iamPrincipal/*) to aggregate spend in Cost Explorer — activate the tags and allow 24–48 hours for visibility.
- LLM gateways need per-user AssumeRole + session tags and session caching to avoid attributing all spend to one role.
- Plan for STS rate limits, privacy controls, and clear tagging conventions before scaling.
Next steps
Enable IAM principal export in CUR 2.0, activate iamPrincipal/* cost allocation tags, and run a short pilot with one application or gateway. Create a one-page tagging policy that avoids PII and assigns responsibility for mapping billing lines to internal cost centers. The checklists above can serve as a starting point to hand to platform teams; the goal is to turn unknown inference spend into cost signals that finance and engineering can act on together.