LLM News

Every LLM release, update, and milestone.

model release

UAE's Technology Innovation Institute (TII) releases Falcon Perception: 0.6B early-fusion model for open-vocabulary grounding

TII has released Falcon Perception, a 0.6B-parameter early-fusion Transformer that combines image patches and text in a single sequence for open-vocabulary object grounding and segmentation. The model achieves 68.0 Macro-F1 on SA-Co (vs. 62.3 for SAM 3) and introduces PBench, a diagnostic benchmark that isolates performance across five capability levels. TII also released Falcon OCR, a 0.3B model reaching 80.3 on olmOCR and 88.6 on OmniDocBench.

product update · Anthropic

Anthropic's Claude Code leak exposes Tamagotchi pet and always-on agent features

A source code leak in Anthropic's Claude Code 2.1.88 update exposed more than 512,000 lines of TypeScript, revealing unreleased features including a Tamagotchi-like pet interface and an always-on background agent automation feature codenamed KAIROS. Anthropic confirmed the leak was caused by a packaging error, not a security breach, and has since fixed the issue.

2 min read · via theverge.com
product update · Amazon Web Services

Amazon Bedrock AgentCore Evaluations now generally available for testing AI agents

Amazon Bedrock AgentCore Evaluations, a fully managed service for assessing AI agent performance, is now generally available following its public preview debut at AWS re:Invent 2025. The service addresses the core challenge that LLMs are non-deterministic—the same user query can produce different tool selections and outputs across runs—making traditional single-pass testing inadequate for reliable agent deployment.
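The multi-run evaluation this non-determinism demands can be sketched in plain Python. The names below (`agent_fn`, `judge_fn`, `flaky_agent`) are hypothetical stand-ins for illustration, not the AgentCore Evaluations API:

```python
import random
from collections import Counter

def evaluate_agent(agent_fn, query, judge_fn, runs=10):
    """Run a non-deterministic agent repeatedly on the same query
    and aggregate per-run judgments into a stable pass rate."""
    scores = [1 if judge_fn(agent_fn(query)) else 0 for _ in range(runs)]
    return {
        "runs": runs,
        "pass_rate": sum(scores) / runs,
        "outcomes": Counter("pass" if s else "fail" for s in scores),
    }

# Toy stand-ins: an agent that picks the right tool only ~80% of the time.
def flaky_agent(query):
    return "weather_tool" if random.random() < 0.8 else "search_tool"

def judge(output):
    return output == "weather_tool"

random.seed(0)
result = evaluate_agent(flaky_agent, "What's the weather in Rome?", judge, runs=20)
print(result["pass_rate"])
```

A single-pass test would mark this agent as simply "passing" or "failing" depending on luck; repeated runs surface the underlying pass rate.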

3 min read · via aws.amazon.com
model release · xAI

xAI releases Grok 4.20 Multi-Agent with 2M context window and parallel agent reasoning

xAI has released Grok 4.20 Multi-Agent, a variant designed for collaborative agent-based workflows with a 2-million-token context window. The model scales from 4 agents at low/medium reasoning effort to 16 agents at high/xhigh effort levels, priced at $2 per million input tokens and $6 per million output tokens.

model release · xAI

xAI releases Grok 4.20 with 2M context window and native reasoning capabilities

xAI released Grok 4.20, its flagship model, on March 31, 2026. The model features a 2-million-token context window, toggleable reasoning capabilities, and pricing of $2 per million input tokens and $6 per million output tokens. Web search is available at $5 per 1,000 queries, and xAI claims industry-leading speed with low hallucination rates.
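At the listed rates, per-request cost is simple arithmetic; a quick sketch using the prices above (an illustrative helper, not an xAI SDK function):

```python
def grok_request_cost(input_tokens, output_tokens, searches=0):
    """Estimate the cost of one request at the listed Grok 4.20 rates:
    $2 per million input tokens, $6 per million output tokens,
    and $5 per 1,000 web search queries."""
    return (
        (input_tokens / 1_000_000) * 2.0
        + (output_tokens / 1_000_000) * 6.0
        + (searches / 1_000) * 5.0
    )

# A 100K-token prompt with a 2K-token reply and two web searches:
cost = grok_request_cost(100_000, 2_000, searches=2)
print(f"${cost:.3f}")  # → $0.222
```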

2 min read · via openrouter.ai
product update · Amazon Web Services

AWS launches QA Studio: Natural language test automation powered by Amazon Nova Act

AWS has released QA Studio, a reference solution for QA automation built on Amazon Nova Act that enables teams to define tests in natural language rather than code. The system uses visual understanding to navigate applications the way users do, automatically adapting to UI changes and eliminating the maintenance overhead of traditional selector-based testing frameworks.

3 min read · via aws.amazon.com
product update · Amazon Web Services

Amazon's Alexa+ adds conversational food ordering with Uber Eats and Grubhub

Amazon has added conversational food ordering to Alexa+, its next-generation AI assistant, enabling users to order from Uber Eats and Grubhub through natural language. The feature rolls out today to Alexa+ customers with Echo Show 8 devices and larger, allowing users to browse menus, customize meals, and modify orders mid-conversation.

2 min read · via techcrunch.com
model release

IBM releases Granite 4.0 3B Vision, compact multimodal model for enterprise document understanding

IBM announced Granite 4.0 3B Vision, a 3-billion-parameter vision-language model designed for enterprise document processing. The model achieves 86.4% on Chart2Summary and a 92.1 TEDS score on cropped table extraction, and ships as a LoRA adapter on top of Granite 4.0 Micro, enabling a modular fallback to text-only inference.

product update

Google Maps' Ask Maps feature now widely available in US and India with Gemini integration

Google has completed a wide rollout of Ask Maps, a Gemini-powered conversational feature in Google Maps, to all users in the US and India as of this week. The feature answers complex, real-world location-based questions that traditional map search cannot handle, such as finding romantic date spots, scenic running routes, and personalized travel recommendations.

model release

Alibaba's Qwen3.5-Omni learns to write code from speech and video without explicit training

Alibaba has released Qwen3.5-Omni, an omnimodal model handling text, images, audio, and video with a 256,000-token context window. The model reportedly outperforms Google's Gemini 3.1 Pro on audio tasks and supports 74 languages in speech recognition, a 6x increase over its predecessor. An unexpected emergent capability: writing working code from spoken instructions and video input, which the team says it did not explicitly train for.

3 min read · via the-decoder.com
product update · GitHub

GitHub disables Copilot ads in pull requests after developer backlash

GitHub has disabled Copilot's ability to insert promotional tips into pull requests following developer backlash. The feature, which injected Raycast ads into over 11,400 pull requests without explicit developer consent, was triggered whenever Copilot was mentioned in PRs it didn't create. GitHub's product manager acknowledged the decision was a mistake.

2 min read · via go.theregister.com
model release · Microsoft

Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark

Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B parameter variant achieves a 69.0 score on the Multilingual MTEB v2 benchmark with support for 32,768 token context windows and 45+ languages.
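Contrastive training for text embeddings typically pairs each query with one positive document and treats the rest of the batch as negatives (an InfoNCE-style objective). A minimal NumPy sketch of that general idea follows; it is not Microsoft's actual Harrier training recipe:

```python
import numpy as np

def info_nce_loss(queries, docs, temperature=0.05):
    """In-batch contrastive loss: each query's positive is the doc
    at the same index; all other docs in the batch are negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temperature               # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # NLL of the diagonal positives

rng = np.random.default_rng(0)
B, dim = 8, 32
q = rng.normal(size=(B, dim))
loss_random = info_nce_loss(q, rng.normal(size=(B, dim)))   # unrelated docs
loss_aligned = info_nce_loss(q, q)                          # perfect matches
print(loss_aligned < loss_random)
```

When queries and documents embed to the same vectors, the diagonal dominates and the loss collapses toward zero; random pairings hover near log(B), which is what gradient descent on the embedding model pushes down.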