AWS launches AgentCore Observability for Amazon Bedrock to debug production AI agents

TL;DR

Amazon Web Services launched AgentCore Observability for Amazon Bedrock, a debugging tool that provides visibility into AI agent execution through OpenTelemetry traces, CloudWatch metrics, and structured logs. The tool addresses silent failures in production agents including infinite reasoning loops, incorrect tool selection, and plausible but incorrect answers.

Amazon Web Services introduced AgentCore Observability for Amazon Bedrock, a debugging tool designed to identify failures in production AI agents that occur without triggering standard error alerts.

Core Capabilities

The tool provides visibility across three layers:

  • Metrics: Real-time monitoring through Amazon CloudWatch including session volume, latency, token usage, and error rates
  • Traces: OpenTelemetry-compliant distributed traces showing reasoning steps, tool invocations, memory retrievals, and outputs
  • Structured logs: Span-level logs capturing execution flow details

According to AWS, the telemetry routes to Amazon CloudWatch by default but can export to Datadog, Grafana Cloud, or Elastic Observability without additional instrumentation.

Target Failure Patterns

The service addresses three categories of production issues:

Quality failures: Completed tasks that return incorrect results, including hallucinations where agents reference non-existent policies or generate fabricated data. In multi-agent systems, these errors propagate when one agent's output feeds another agent's input.

Reliability issues: Workflow completion failures from tool invocation errors (401 authentication failures, 403 permission denials, 400 invalid input errors) and context loss where agents fail to retain session state.

Efficiency problems: High latency reducing user engagement, excessive token usage from verbose responses or unnecessary full document retrieval, and repeated tool calls instead of caching results.
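One of the efficiency problems above, repeated tool calls where a cached result would do, is easy to spot once tool invocations are available as trace data. The following is a minimal sketch only: `ToolSpan` and `repeated_tool_calls` are hypothetical names, and the flat record is a simplified stand-in for a tool-invocation span, not the actual AgentCore or OpenTelemetry span schema.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToolSpan:
    """Hypothetical, simplified view of one tool-invocation span."""
    session_id: str
    tool_name: str
    arguments: dict


def repeated_tool_calls(spans, min_repeats=2):
    """Count identical (session, tool, arguments) calls and keep the repeated ones."""
    counts = Counter(
        # Serialize arguments with sorted keys so equal dicts compare equal.
        (s.session_id, s.tool_name, json.dumps(s.arguments, sort_keys=True))
        for s in spans
    )
    return {key: n for key, n in counts.items() if n >= min_repeats}


spans = [
    ToolSpan("sess-1", "lookup_policy", {"policy_id": "P-100"}),
    ToolSpan("sess-1", "lookup_policy", {"policy_id": "P-100"}),
    ToolSpan("sess-1", "lookup_policy", {"policy_id": "P-100"}),
    ToolSpan("sess-1", "get_weather", {"city": "Seattle"}),
    ToolSpan("sess-2", "lookup_policy", {"policy_id": "P-100"}),
]

duplicates = repeated_tool_calls(spans)
for (session_id, tool_name, args), count in duplicates.items():
    print(f"{session_id}: {tool_name}({args}) called {count} times")
```

Grouping by session keeps legitimate reuse of the same tool across different sessions from being flagged; only identical calls within one session count as candidates for caching.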

Technical Implementation

The GenAI Observability dashboard displays metrics that can be filtered by agent ID, session ID, or time range. CloudWatch alarms can notify users when latency exceeds a threshold or error rates spike.
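As a rough illustration of what a latency alarm involves, the sketch below builds the keyword arguments that the CloudWatch `PutMetricAlarm` API accepts (for example through boto3's `put_metric_alarm`). The parameter names are real CloudWatch parameters, but the namespace, metric name, and dimension are placeholders invented for this example; the article does not state the names AgentCore actually publishes, so they would need to be looked up before use. The helper `latency_alarm_params` is likewise hypothetical.

```python
def latency_alarm_params(agent_id, threshold_ms, sns_topic_arn):
    """Build PutMetricAlarm arguments for a p95 latency alarm (placeholder metric names)."""
    return {
        "AlarmName": f"agent-{agent_id}-p95-latency",
        "Namespace": "Example/AgentNamespace",   # placeholder, not the real namespace
        "MetricName": "Latency",                 # placeholder metric name
        "Dimensions": [{"Name": "AgentId", "Value": agent_id}],  # placeholder dimension
        "ExtendedStatistic": "p95",              # percentile statistic
        "Period": 60,                            # seconds per evaluation window
        "EvaluationPeriods": 3,                  # consecutive breaching windows required
        "Threshold": float(threshold_ms),        # assumes the metric is reported in ms
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",      # idle periods do not trigger the alarm
        "AlarmActions": [sns_topic_arn],         # e.g. an SNS topic that notifies on-call
    }


params = latency_alarm_params(
    agent_id="my-agent",
    threshold_ms=2000,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",
)
print(params["AlarmName"], params["Threshold"])
```

Requiring several consecutive breaching periods is a common way to avoid paging on a single slow request; the specific values here are illustrative, not recommendations from AWS.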

Key metrics tracked include:

  • Performance: Latency at 50th, 95th, and 99th percentiles; separate measurement of memory retrieval time and tool response time
  • Resource usage: Session duration, concurrent sessions, input and output token counts
  • Reliability: Error rates broken down by authentication, authorization, validation, and timeout failures
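To make the performance and reliability metrics above concrete, here is a small sketch of how percentile latencies and an error breakdown could be computed from raw records. It uses only the Python standard library. The function names and the record shapes are assumptions for illustration; the mapping of 401, 403, and 400 to authentication, authorization, and validation follows the failure types named in the article, and this is not how CloudWatch itself computes these values.

```python
from collections import Counter
from statistics import quantiles

# HTTP status codes for failed tool invocations, mapped to the article's categories.
STATUS_TO_CATEGORY = {
    400: "validation",
    401: "authentication",
    403: "authorization",
}


def categorize(status, timed_out=False):
    """Classify one failed tool call; timeouts take precedence over status codes."""
    if timed_out:
        return "timeout"
    return STATUS_TO_CATEGORY.get(status, "other")


def error_breakdown(failed_calls):
    """failed_calls: iterable of (http_status, timed_out) tuples."""
    return Counter(categorize(status, timed_out) for status, timed_out in failed_calls)


def latency_percentiles(latencies_ms):
    """Return p50, p95, and p99 using inclusive linear interpolation."""
    cuts = quantiles(latencies_ms, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


failed_calls = [(401, False), (403, False), (400, False), (401, False), (None, True)]
print(error_breakdown(failed_calls))
print(latency_percentiles([120, 135, 150, 180, 210, 240, 900, 1500]))
```

Tracking p95 and p99 alongside the median matters because a few very slow sessions, like the two outliers in the sample list, barely move p50 but dominate the tail.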

Requirements

Users need an AWS account with Amazon Bedrock AgentCore access enabled, CloudWatch Transaction Search enabled, and appropriate IAM permissions. They also need either a deployed AgentCore agent or permission to deploy one.

What This Means

This launch addresses a gap in AI operations: production agents that fail silently without triggering traditional monitoring alerts. By providing execution-level traces that show decision sequences and tool selections, AWS gives developers visibility into where agent reasoning breaks down, moving beyond detecting that a failure occurred to understanding why it happened. Because the telemetry is OpenTelemetry-compatible, organizations that already use other observability platforms can integrate without additional instrumentation work.

AWS indicated this is Part 1 of a two-part series, with Part 2 covering performance optimization and memory management. Pricing for the observability features was not disclosed.
