testing

5 articles tagged with testing

May 28, 2026
product update | Amazon Web Services

AWS launches dataset management in Bedrock AgentCore for versioned agent test suites

Amazon Web Services introduced dataset management in Bedrock AgentCore, enabling developers to build versioned test suites with immutable baselines for agent evaluation. The feature supports two kinds of scenarios: predefined scenarios with ground-truth assertions, and user-simulation scenarios in which LLM-backed actors conduct multi-turn conversations.

May 6, 2026
research | GitHub

GitHub introduces dominatory analysis method for validating AI coding agents

GitHub has published a research approach for validating AI coding agents when traditional correctness testing breaks down. The company proposes dominatory analysis as an alternative to brittle scripts and black-box LLM judges for building what it calls a 'Trust Layer' for GitHub Copilot Coding Agents.

April 24, 2026
model release | OpenAI

OpenAI GPT-5.5 scores 93/100 in benchmark test, loses points for ignoring instructions

OpenAI's GPT-5.5 scored 93 out of 100 points in a 10-round benchmark test covering summarization, reasoning, coding, and creative tasks. The model lost points primarily for ignoring specific instructions, such as using unauthorized sources when asked to summarize from a single news outlet.

March 31, 2026
product update | Amazon Web Services

AWS launches QA Studio: Natural language test automation powered by Amazon Nova Act

AWS has released QA Studio, a reference solution for QA automation built on Amazon Nova Act that lets teams define tests in natural language rather than code. The system uses visual understanding to navigate applications the way users do, adapting automatically to UI changes and eliminating the maintenance overhead of traditional selector-based testing frameworks.

March 9, 2026
product update | OpenAI

OpenAI acquires Promptfoo to strengthen AI agent security capabilities

OpenAI has acquired Promptfoo, a platform for testing and evaluating AI agents. The acquisition signals frontier labs' intensifying focus on proving their technology can operate safely in critical business environments.