model-evaluation
4 articles tagged with model-evaluation
Popsa generates 5.5M personalized photo book titles using Amazon Nova, cutting costs while reaching 73% user satisfaction
Popsa, a photo book service operating in 50+ countries, generated over 5.5 million AI-powered titles in 2025 using Amazon Nova models. The company achieved 73% positive user feedback with Nova Pro while reducing costs and latency compared to Claude 3 Haiku.
OpenAI acquires Promptfoo to strengthen AI agent security capabilities
OpenAI has acquired Promptfoo, a platform for testing and evaluating AI agents. The acquisition signals frontier labs' intensifying focus on proving their technology can operate safely in critical business environments.
Google benchmarks AI models for Android development; names top performers
Google has benchmarked AI models to determine which perform best for Android app development. The company released results identifying the top-performing models on coding tasks specific to the Android platform.
Google DeepMind argues chatbot ethics require same rigor as coding benchmarks
Google DeepMind is pushing for moral behavior in large language models to be evaluated with the same technical rigor applied to coding and math benchmarks. As LLMs take on roles like companions, therapists, and medical advisors, the research group argues current evaluation standards are insufficient.