Evaluation

2 articles tagged with Evaluation

June 2, 2026
product update | Microsoft

Microsoft releases ASSERT, open-source framework for testing application-specific AI behavior using natural language

Microsoft released ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), an open-source framework that converts natural language descriptions of expected AI behavior into structured test cases. The tool addresses a gap in AI evaluation by testing application-specific behaviors that general benchmarks cannot capture.

May 4, 2026
product update | Amazon AWS

AWS Launches AgentCore Optimization: Automated Performance Loop for Production AI Agents

Amazon Web Services released AgentCore Optimization in preview, introducing an automated performance loop that generates configuration recommendations from production traces, validates them through batch evaluation and A/B testing, and enables continuous agent optimization. The system targets quality drift, in which AI agents degrade as underlying models evolve and user behavior shifts.