LLM News

Every LLM release, update, and milestone.

product update

Google Gemini app adds Extended thinking mode, prepares Canva, Instacart, OpenTable integrations

Google is rolling out a new "Thinking level" option in the Gemini app, allowing users to toggle between Standard and Extended modes when using Gemini 3 Flash or Gemini 3.1 Pro. The app is also preparing integrations with Canva, Instacart, and OpenTable, expanding beyond its current third-party connections.

research

Gemma 4, DeepSeek V4, and ZAYA1 Deploy KV Cache Compression to Cut Long-Context Memory Costs

Recent open-weight LLM releases from Google, DeepSeek, and others are adopting architectural techniques that reduce KV cache size by approximately 50% at long contexts. These include cross-layer KV sharing in Gemma 4, which saves 2.7 GB at 128K context for the E2B model, and compressed convolutional attention in ZAYA1-8B.
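
As a rough illustration of why sharing keys and values across layers shrinks the cache, the sketch below applies the standard KV cache size estimate (two tensors per caching layer, one entry per token). The layer count, head count, head dimension, and the assumption that half the layers reuse another layer's K/V are all hypothetical; they are not the published configuration of Gemma 4 or any other model named above.

```python
def kv_cache_bytes(caching_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV cache size in bytes for a single sequence.

    Only layers that store their own keys and values count toward
    `caching_layers`; layers that reuse another layer's K/V add nothing.
    """
    # Factor of 2: one tensor for keys, one for values.
    return 2 * caching_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, NOT the published Gemma 4 E2B architecture.
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
SEQ_LEN = 128 * 1024  # 128K tokens, 16-bit values by default

baseline = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)
# Assumption: half of the layers reuse K/V from another layer.
shared = kv_cache_bytes(NUM_LAYERS // 2, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"baseline: {baseline / 2**30:.1f} GiB")
print(f"shared:   {shared / 2**30:.1f} GiB")
print(f"saved:    {1 - shared / baseline:.0%}")
```

Because the estimate is linear in the number of caching layers, the saving tracks the fraction of layers that share, regardless of context length.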

model release · Microsoft

Microsoft Releases Fara-7B: 7B-Parameter Computer Use Agent Trained in 2.5 Days on 64 H100s

Microsoft Research has released Fara-7B, a 7-billion-parameter small language model designed for computer automation tasks. The model, which took 2.5 days to train on 64 H100 GPUs, navigates websites to complete tasks such as booking restaurants and shopping. It takes screenshots as input and has a 128K-token context window.

2 min read · via huggingface.co
benchmark

Augment Code's agent matches Claude Code quality at 33% lower cost on Opus 4.7

Augment Code benchmarked its Auggie agent against Claude Code on Claude Opus 4.7, reporting a 67.4% pass rate versus 66.3% while cutting costs by 33%. The company attributes savings to a semantic context engine that reduces cache read tokens by 32% and output tokens by 37% compared to Claude Code's keyword-based retrieval.
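
One way to combine the two reported figures is cost per passed task. The sketch below assumes that "33% lower cost" means total spend of 0.67x over the same task set, and it normalizes Claude Code's cost to 1.0; the function name and the normalization are illustrative, not part of Augment Code's methodology.

```python
def cost_per_pass(relative_cost, pass_rate):
    """Relative spend needed per successfully completed task."""
    return relative_cost / pass_rate


# Claude Code's total cost is normalized to 1.0 (an assumption for illustration).
claude_code = cost_per_pass(1.00, 0.663)
# Assumption: "33% lower cost" means 0.67x the total spend on the same tasks.
auggie = cost_per_pass(1.00 - 0.33, 0.674)

ratio = auggie / claude_code
print(f"Auggie cost per passed task: {ratio:.3f}x Claude Code")
print(f"Reduction per passed task:   {1 - ratio:.1%}")
```

Under these assumptions the per-pass saving comes out slightly larger than the headline 33%, because the reported pass rate is also marginally higher.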

product update · Anthropic

Anthropic launches contract review tool in Claude for Small Business that flags risky clauses

Anthropic has released Claude for Small Business, a collection of 31 AI skills for Claude Cowork subscribers. The standout feature is /review-contract, which analyzes legal contracts and flags problematic clauses in approximately five minutes. The tool requires at least a $20/month Claude Pro subscription.

2 min read · via zdnet.com
research · Anthropic

Security researchers use Anthropic's Mythos Preview to bypass Apple's M5 memory protection in 5 days

Security researchers at Calif used Anthropic's Mythos Preview model to develop a working macOS kernel memory corruption exploit on M5 silicon in five days, bypassing Apple's Memory Integrity Enforcement (MIE) system. The exploit chain targets macOS 26.4.1 and escalates from unprivileged local user to root shell using two vulnerabilities and several techniques.

3 min read · via 9to5mac.com
model release · IBM

IBM Releases 97M-Parameter Granite Embedding Model With 60.3 MTEB Score: Highest Retrieval Quality Under 100M Parameters

IBM released two new multilingual embedding models under Apache 2.0: a 97M-parameter compact model scoring 60.3 on MTEB Multilingual Retrieval (highest in its size class) and a 311M full-size model scoring 65.2. Both support 200+ languages with enhanced retrieval for 52 languages, handle 32K-token context (64x increase over predecessors), and include code retrieval across 9 programming languages.

3 min read · via huggingface.co
analysis · Anthropic

Anthropic's Mythos Preview solves previously unsolvable cybersecurity test in updated checkpoint

A month after its initial release, a newer checkpoint of Anthropic's Mythos Preview became the first model to complete the UK AI Safety Institute's 'Cooling Tower' cyber range, solving it in 3 of 10 attempts. The model also completed 'The Last Ones' range in 6 of 10 attempts, surpassing OpenAI's GPT-5.5 and demonstrating capability improvements within a single model version.

3 min read · via zdnet.com