llm-testing

1 article tagged with llm-testing

March 31, 2026
product update · Amazon Web Services

Amazon Bedrock AgentCore Evaluations now generally available for testing AI agents

Amazon Bedrock AgentCore Evaluations, a fully managed service for assessing AI agent performance, is now generally available following its public preview debut at AWS re:Invent 2025. The service targets a core challenge: LLMs are non-deterministic, so the same user query can produce different tool selections and outputs across runs. That makes traditional single-pass testing inadequate for judging whether an agent is reliable enough to deploy.
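To illustrate why a single pass falls short, here is a minimal sketch of repeated-trial evaluation. It is not the AgentCore Evaluations API. `toy_agent`, `evaluate_tool_selection`, the tool names `lookup_order` and `search_faq`, the 80% success rate, and the 0.95 threshold are all hypothetical. The idea is to run the same query many times and measure how often the agent picks the expected tool, instead of trusting one run.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical agent signature: takes a query and an RNG, returns the chosen tool name.
Agent = Callable[[str, random.Random], str]


def toy_agent(query: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM agent.

    Simulates sampling by picking the correct tool about 80% of the time
    and a wrong one otherwise, regardless of the query text.
    """
    return rng.choices(["lookup_order", "search_faq"], weights=[0.8, 0.2])[0]


def evaluate_tool_selection(
    agent: Agent, query: str, expected_tool: str, trials: int = 200, seed: int = 0
) -> tuple[float, Counter]:
    """Run the same query `trials` times and report how often the expected tool was chosen."""
    rng = random.Random(seed)  # seeded so the evaluation itself is reproducible
    choices = Counter(agent(query, rng) for _ in range(trials))
    pass_rate = choices[expected_tool] / trials
    return pass_rate, choices


if __name__ == "__main__":
    rate, dist = evaluate_tool_selection(toy_agent, "Where is my order #123?", "lookup_order")
    print(f"pass rate: {rate:.2f}")
    print(f"tool distribution: {dict(dist)}")
    # Illustrative release gate: require a high pass rate across many trials.
    print("meets threshold:", rate >= 0.95)
```

A single trial can only return pass or fail, so it cannot distinguish an agent that is right 80% of the time from one that is always right. Aggregating over many trials exposes that difference as a measurable rate.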