Breaking

OpenAI's GPT-5.4 ties Gemini 3.1 Pro at 72.4% on Google's Android coding benchmark

Google's Android Bench, a benchmark measuring AI model performance on Android app development, shows OpenAI's GPT-5.4 and Google's Gemini 3.1 Pro Preview tied at 72.4% in the April 2026 update. OpenAI's GPT-5.3-Codex ranks third at 67.7%, while Anthropic's Claude Opus 4.6 scores 66.6%.

April 9, 2026

Latest News

product update

Google Gemini adds interactive visualization generation with real-time parameter adjustment

Google Gemini can now generate interactive visualizations directly within the chat interface, allowing users to tweak variables, rotate 3D models, and explore data in real time. The feature activates through phrases like "show me" or "help me visualize" when using the Gemini Pro model. This follows Anthropic's launch of similar interactive diagram capabilities in Claude in mid-March.

product update · Anthropic

Anthropic moves Claude Cowork out of research preview with enterprise features, launches Claude Managed Agents beta

Anthropic has promoted Claude Cowork from research preview to general availability, adding six enterprise features including role-based access controls, group spend limits, and usage analytics. The company simultaneously launched Claude Managed Agents in public beta—a composable API suite for building and deploying cloud-hosted agents without custom infrastructure work.

product update · Amazon Web Services

Amazon Bedrock AgentCore now supports stateful MCP with user input, LLM sampling, and progress streaming

Amazon has introduced stateful MCP client capabilities on Bedrock AgentCore Runtime, enabling agents to pause mid-execution for user input, request LLM-generated content, and stream real-time progress updates. The update transforms one-way tool execution into bidirectional conversations between MCP servers and clients, supporting interactive workflows previously impossible with stateless implementations.

2 min read · via aws.amazon.com
analysis · OpenAI

OpenAI restricts cybersecurity AI access following Anthropic's model controls

OpenAI is restricting access to a new AI model with advanced cybersecurity capabilities to a small group of companies, mirroring Anthropic's decision to limit distribution of its Mythos Preview model. The move builds on OpenAI's Trusted Access for Cyber pilot program, which launched in February following GPT-5.3-Codex and offers $10 million in API credits to participants.

2 min read · via the-decoder.com
model release

Meta releases proprietary AI model Muse Spark after $14B hiring spree, but monetization path remains unclear

Meta released Muse Spark this week, its first major new AI model in over a year, marking a strategic shift from open-source Llama models to proprietary technology. The release comes nearly 10 months after Meta spent over $14 billion to hire Scale AI co-founder Alexandr Wang and establish Meta Superintelligence Labs. The critical question remains how Meta will monetize the model and compete with established rivals OpenAI, Anthropic, and Google.

3 min read · via cnbc.com
model release · Anthropic

Anthropic withholds Claude Mythos Preview from public release due to autonomous cybersecurity exploit capabilities

Anthropic has declined to publicly release Claude Mythos Preview, its most capable AI model, citing critical cybersecurity risks. Instead, the company launched Project Glasswing, providing controlled access to 50+ organizations including AWS, Apple, Google, and Microsoft, along with $100 million in usage credits and $4 million in direct donations to open-source security initiatives.

model release · Zhipu AI

Zhipu AI's GLM-5.1 outperforms GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro through iterative strategy refinement

Zhipu AI has released GLM-5.1, a freely available open-weight model designed for long-running programming tasks that achieves 58.4% on SWE-Bench Pro, edging out GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). The model's core capability is iterative strategy refinement—it rethinks its approach across hundreds of iterations and thousands of tool calls, recognizing dead ends and shifting tactics without human intervention. However, GLM-5.1 trails on reasoning and knowledge benchmarks, scoring 31% on Humanity's Last Exam compared to Gemini 3.1 Pro's 45%.

3 min read · via the-decoder.com
product update

Google integrates NotebookLM research tool directly into Gemini app

Google has fully integrated NotebookLM, its AI research tool, into the Gemini app, allowing users to create notebooks with multiple source types directly within the chatbot. The feature supports PDFs, documents, website URLs, YouTube videos, and text input, with Gemini generating summaries, infographics, and audio overviews from uploaded sources.

2 min read · via engadget.com
product update · Anthropic

Anthropic launches Claude Managed Agents for autonomous AI agents in public beta

Anthropic has launched Claude Managed Agents as a public beta, providing developers with a hosted platform to build and run autonomous AI agents without managing their own infrastructure. The service costs $0.08 per session hour on top of standard token pricing and is currently available exclusively on Anthropic's infrastructure.

product update

Google Gemini adds notebooks feature to organize project files and conversations

Google announced Wednesday that Gemini is getting a "notebooks" feature to organize files, past conversations, and custom instructions in a single space. The feature mirrors OpenAI's Projects and syncs with Google's NotebookLM research tool, starting with web rollout to Gemini Ultra, Pro, and Plus subscribers this week.

2 min read · via theverge.com
