LLM News

Every LLM release, update, and milestone.

changelog

Google switches Gemini to compute-based limits, cuts AI Ultra to $100/month

Google is replacing Gemini's daily prompt limits with a compute-based system that factors in prompt complexity, features used, and chat length. Limits refresh every five hours until a weekly cap is reached. AI Ultra, aimed at developers and technical leads, now starts at $100/month, down from its previous entry point, and offers usage limits 5x higher than the Pro plan.

product update

Google DeepMind Integrates Street View With Genie 3 World Model for Real-World Environment Simulation

Google DeepMind launched Street View integration with its Genie 3 world model at I/O 2026, allowing users to simulate real-world locations from 280 billion images across 110 countries. The feature enables environment modification including weather changes and supports robotics training, with initial access for U.S. Ultra subscribers expanding globally.

product update

Google Search adds AI agents, generative UI, and conversational search box powered by Gemini 3.5 Flash

Google announced major Search updates at I/O 2026, including AI Mode now powered by Gemini 3.5 Flash serving over 1 billion monthly users. The company is launching background information agents that monitor the web 24/7 and generate custom mini-apps, both features reserved for Google AI Pro and Ultra subscribers.

model release

Google releases Gemini 3.5 Flash with autonomous coding and agent capabilities, claims 4x speed boost

Google released Gemini 3.5 Flash, positioning it as an agent-first model designed for autonomous coding and multi-hour workflows. The company claims the model outperforms its 3.1 Pro predecessor on coding and agentic benchmarks while running 4x faster than competing frontier models, with an optimized version achieving 12x speed gains.

product update

Google Search deploys Gemini 3.5 Flash for AI-generated interfaces, agent-based information gathering

Google announced a fundamental restructuring of Search, replacing traditional ranked links with AI-generated interfaces powered by Gemini 3.5 Flash. The update introduces information agents that monitor the web 24/7 and custom mini-apps built through natural language, with the generative UI rolling out free to all users this summer.

model release

Google Releases Gemini 3.5 Flash with 1M Token Context and Configurable Thinking Modes at $1.50/$9 Per Million Tokens

Google has released Gemini 3.5 Flash, a multimodal model with a 1 million token context window priced at $1.50 per million input tokens and $9 per million output tokens. The model supports text, image, video, audio, and PDF inputs with configurable thinking effort levels from minimal to high.

2 min read · via openrouter.ai
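
To put the listed rates in concrete terms, here is a minimal sketch of a per-request cost estimate. It uses only the two prices quoted above and assumes flat per-token billing; tiered pricing, caching discounts, and how thinking tokens are billed are not covered by the item and are ignored here. The function name and the example token counts are hypothetical.

```python
# Prices quoted in the item above, in USD per one million tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming flat per-token rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: a 200,000-token prompt with a 4,000-token reply.
cost = estimate_cost(200_000, 4_000)
print(f"${cost:.3f}")
```

At these rates output tokens cost six times as much as input tokens, so long replies can dominate the bill even when the prompt is much larger.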
model release

Google releases Gemini 3.5 Flash at half the price of frontier models, announces Omni world model

Google released Gemini 3.5 Flash, priced at half to one-third the cost of comparable frontier models, and announced it will become the default model in the Gemini app globally. The company also unveiled Omni, a world model for simulating physical environments, and Gemini Spark, an AI agent in beta testing.

product update · Anthropic

Anthropic adds MCP tunnels and self-hosted sandboxes to Claude Managed Agents for enterprise security

Anthropic has added two enterprise security features to Claude Managed Agents: MCP tunnels, which route agent services through private networks without public internet exposure, and self-hosted sandboxes, which keep sensitive tool execution within customer infrastructure while Anthropic handles orchestration.

2 min read · via 9to5mac.com

product update · Amazon Web Services

AWS launches Nova Sonic voice agent framework with AgentCore Runtime and three integration patterns

AWS released Amazon Nova Sonic, a speech-to-speech foundation model for voice agents, alongside AgentCore Runtime, a serverless hosting environment with WebSocket streaming and microVM isolation. The framework supports three integration patterns: direct tool calls via AgentCore Gateway using Model Context Protocol (MCP), sub-agent delegation with Agent-to-Agent (A2A) protocol, and session segmentation for multi-step workflows.

product update · Amazon Web Services

Amazon Bedrock adds programmatic tool calling to reduce latency and token usage in multi-step workflows

Amazon Bedrock now supports programmatic tool calling (PTC), a technique that allows LLMs to generate Python code for multi-step tool orchestration rather than making sequential API calls. AWS offers three implementation paths: self-hosted Docker sandboxes on ECS, managed execution via Amazon Bedrock AgentCore Code Interpreter, and Anthropic SDK-compatible proxy integration.
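
The idea behind programmatic tool calling can be sketched in a few lines. This is an illustration of the general technique, not Bedrock's API: the two tools, the `GENERATED_CODE` string (a stand-in for what a model might emit), and `run_generated` are all hypothetical. The point is that one generated script chains several tool calls in a single execution, instead of one model round trip per call.

```python
# Hypothetical tools; a real deployment would call external services.
def get_orders(customer_id: str) -> list:
    return [{"id": "o1", "total": 40.0}, {"id": "o2", "total": 60.0}]


def get_refunds(order_id: str) -> float:
    return {"o1": 0.0, "o2": 15.0}[order_id]


TOOLS = {"get_orders": get_orders, "get_refunds": get_refunds}

# Stand-in for model-generated code: three tool calls, one execution.
GENERATED_CODE = """
orders = get_orders("c42")
result = sum(o["total"] - get_refunds(o["id"]) for o in orders)
"""


def run_generated(code: str, tools: dict) -> object:
    """Run generated code with only the given tools and `sum` in scope.

    Restricting builtins is NOT a security boundary; this assumes the
    call happens inside an isolated sandbox such as a container.
    """
    namespace = {"__builtins__": {"sum": sum}, **tools}
    exec(code, namespace)
    return namespace["result"]


print(run_generated(GENERATED_CODE, TOOLS))
```

Only the final `result` needs to return to the model, which is where the latency and token savings would come from: intermediate tool outputs stay inside the execution environment.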

model release · ByteDance

ByteDance releases Lance, 3B-parameter unified multimodal model handling image and video generation, editing, and unders

ByteDance has released Lance, a 3-billion parameter multimodal model that performs image and video generation, editing, and understanding within a single framework. The model was trained entirely from scratch using 128 A100 GPUs and achieves 84.67% on DPG-Bench and 74% on GenEval, competing with larger models despite its compact size.

2 min read · via huggingface.co
product update

SandboxAQ integrates physics-based drug discovery models into Claude via natural language interface

SandboxAQ has partnered with Anthropic to integrate its physics-based large quantitative models (LQMs) into Claude, making quantum chemistry calculations and molecular dynamics simulations accessible through natural language. The integration eliminates the need for specialized computing infrastructure previously required to run the models.

2 min read · via techcrunch.com

product update · Amazon Web Services

AWS publishes prompting guide for Amazon Nova 2 Lite content moderation using MLCommons taxonomy

AWS published a technical guide for prompting Amazon Nova 2 Lite for content moderation without fine-tuning. The approach uses the MLCommons AILuminate Assessment Standard's 12-category hazard taxonomy and includes XML/JSON structured prompts and few-shot learning examples for high-throughput moderation pipelines.

research · NVIDIA

NVIDIA releases LoRA/DoRA fine-tuning guide for Cosmos Predict 2.5 to generate synthetic robot training data

NVIDIA published a technical guide for parameter-efficient fine-tuning of its Cosmos Predict 2.5 world model using LoRA and DoRA adapters. The method allows teams to adapt the 2B-parameter model to robot manipulation tasks on a single 80GB GPU, generating synthetic training trajectories from just 92 demonstration videos.

2 min read · via huggingface.co
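
For context on why adapter-based fine-tuning fits on a single GPU, here is a rough sketch of the trainable-parameter arithmetic for one weight matrix. LoRA freezes a `d_out x d_in` weight and trains two low-rank factors of shapes `d_out x r` and `r x d_in` instead. The layer size and rank below are hypothetical and are not taken from the Cosmos Predict 2.5 guide; DoRA adds a small learned magnitude vector on top, which this sketch omits.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer: a 4096 x 4096 projection adapted at rank 16.
full = full_params(4096, 4096)
lora = lora_params(4096, 4096, 16)
print(full, lora, f"{lora / full:.2%}")
```

For this hypothetical layer the adapter trains under 1% of the parameters that full fine-tuning would, which shrinks gradient and optimizer-state memory accordingly.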
product update · Amazon Web Services

Amazon launches AI-generated podcast feature in Alexa+ with real-time news from AP, Reuters, Washington Post

Amazon launched Alexa Podcasts today in the U.S., a feature that generates custom podcast episodes on demand using AI-generated host voices. The company claims partnerships with the Associated Press, Reuters, The Washington Post, and over 200 local newspapers to improve content accuracy.

2 min read · via techcrunch.com
benchmark

IBM Research launches Open Agent Leaderboard, showing same models achieve different results based on agent architecture

IBM Research has launched the Open Agent Leaderboard, the first open benchmark that evaluates complete AI agent systems rather than just underlying models. The leaderboard reveals that agents using identical models can achieve significantly different success rates and costs depending on system architecture, with failed runs costing 20-54% more than successful ones.

2 min read · via huggingface.co

product update · Amazon Web Services

Amazon merges Rufus chatbot into Alexa for Shopping, adds price tracking and automated purchasing

Amazon has launched Alexa for Shopping, integrating its Rufus chatbot into the main shopping experience across its app, website, and Echo Show devices. The assistant is free for all signed-in U.S. customers and includes price tracking, automated purchasing, and conversational shopping features. Rufus served over 300 million customers in 2025, according to Amazon.