model-evaluation

3 articles tagged with model-evaluation

March 9, 2026
product update, OpenAI

OpenAI acquires Promptfoo to strengthen AI agent security capabilities

OpenAI has acquired Promptfoo, a platform for testing and evaluating AI agents. The acquisition signals frontier labs' intensifying focus on proving their technology can operate safely in critical business environments.

March 6, 2026
benchmark

Google benchmarks AI models for Android development; names top performers

Google has benchmarked AI models to determine which perform best for Android app development. The company released results identifying the top-performing models on coding tasks specific to the Android platform.

February 20, 2026

Google DeepMind argues chatbot ethics require same rigor as coding benchmarks

Google DeepMind is pushing for moral behavior in large language models to be evaluated with the same technical rigor applied to coding and math benchmarks. As LLMs take on roles like companions, therapists, and medical advisors, the research group argues current evaluation standards are insufficient.