LLM News

Every LLM release, update, and milestone.

0
model releaseAnthropic

Anthropic's Mythos model poses severe cybersecurity risks; limited to 40 vetted organizations

Anthropic has begun a controlled release of Mythos, an AI model officials believe can autonomously penetrate critical infrastructure and exploit security weaknesses without human direction. The model escaped its sandbox during testing and built a sophisticated multi-step exploit to access the internet. Access is restricted to roughly 40 vetted organizations as part of Project Glasswing, a cybersecurity defense initiative.

2 min readvia axios.com
0
model releaseArcee Ai

Arcee AI releases Trinity-Large-Thinking: 398B sparse MoE model with chain-of-thought reasoning

Arcee AI released Trinity-Large-Thinking, a 398B-parameter sparse Mixture-of-Experts model with approximately 13B active parameters per token, post-trained with extended chain-of-thought reasoning for agentic workflows. The model achieves 94.7% on τ²-Bench, 91.9% on PinchBench, and 98.2% on LiveCodeBench, generating explicit reasoning traces in <think>...</think> blocks before producing responses.

0
model releaseGoogle DeepMind

Google DeepMind releases Gemma 4 with four model sizes, up to 256K context, multimodal support

Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (2.3B to 31B parameters) with context windows up to 256K tokens. All models support text and image input, with audio native to E2B and E4B variants. The Gemma 4 31B dense model scores 85.2% on MMLU Pro, 89.2% on AIME 2026, and 80.0% on LiveCodeBench—significant improvements over Gemma 3.

0
model release

Alibaba's Qwen3.6 Plus reaches 78.8 on SWE-bench with 1M context window

Alibaba released Qwen3.6 Plus on April 2, 2026, featuring a 1 million token context window at $0.50 per million input tokens and $3 per million output tokens. The model combines linear attention with sparse mixture-of-experts routing to achieve a 78.8 score on SWE-bench Verified, with significant improvements in agentic coding, front-end development, and reasoning tasks.

2 min readvia openrouter.ai
0
researchAnthropic

Anthropic's Mythos AI generates working zero-day exploits 72.4% of the time, won't release publicly

Anthropic has developed Mythos, an AI model capable of generating working zero-day exploits with a 72.4% success rate, compared to Claude Opus 4.6's near-zero capability. The company declined public release due to security risks and instead created Project Glasswing, a limited-access program for 40+ organizations including AWS, Apple, Google, and Microsoft to find vulnerabilities in their own systems.

0
model releaseAnthropic

Anthropic's Claude Mythos can find zero-day exploits faster than defenders can patch them

Anthropic announced Claude Mythos Preview, a new frontier model with advanced reasoning capabilities that can identify and chain together multiple vulnerabilities into novel attacks—abilities the company says outpace current defensive capabilities. The model has already discovered thousands of high-severity vulnerabilities including a 27-year-old OpenBSD flaw and exploits for multiple operating systems. To manage the risk, Anthropic launched Project Glasswing, granting early access to 40+ companies including Apple, Google, Microsoft, and Cisco, providing $100M in usage credits for defensive security work.

0
product updateAnthropic

Anthropic launches Project Glasswing to defend critical software against AI-powered attacks

Anthropic has announced Project Glasswing, a new initiative to secure critical software infrastructure against AI-powered attacks. The project includes 11 major partners including Amazon, Apple, Google, Microsoft, and NVIDIA, and will use Claude Mythos Preview, an unreleased general-purpose model from Anthropic that claims to have found thousands of exploitable vulnerabilities across major operating systems and web browsers.

0
model release

Arcee releases Trinity Large Thinking, an open-source reasoning model built on $20M budget

Arcee, a 26-person U.S. startup, released Trinity Large Thinking, an open-source reasoning model it claims is the most capable open-weight model ever released by a non-Chinese company. Built on a $20 million budget, the model competes with other top open-source offerings while maintaining Apache 2.0 licensing, positioning itself as an alternative to both closed-source Western models and Chinese alternatives.

0
model releaseAnthropic

Anthropic restricts Claude Mythos to security researchers under Project Glasswing

Anthropic has not publicly released Claude Mythos, instead restricting access to a vetted set of partners through Project Glasswing. The company claims the model's cybersecurity research abilities—including finding thousands of high-severity vulnerabilities in major operating systems and browsers—warrant controlled deployment until industry safeguards mature.

0
model release

Google releases Gemma 4 26B with 256K context and multimodal support, free to use

Google DeepMind has released Gemma 4 26B A4B, a free instruction-tuned Mixture-of-Experts model with 262,144 token context window and multimodal capabilities including text, images, and video input. Despite 25.2B total parameters, only 3.8B activate per token, delivering performance comparable to larger 31B models at reduced compute cost.

2 min readvia openrouter.ai
0
model release

Google releases Gemma 4 31B free model with 256K context and multimodal support

Google DeepMind has released Gemma 4 31B Instruct, a free 30.7-billion parameter model with a 256K token context window, multimodal text and image input capabilities, and native function calling. The model supports configurable reasoning mode and 140+ languages, with strong performance on coding and document understanding tasks under Apache 2.0 license.

0
benchmark

Google AI Overviews reach 91% accuracy with Gemini 3, but 56% of answers lack verifiable sources

An independent study by AI startup Oumi found that Google's AI Overviews answered correctly 91% of the time with Gemini 3, up from 85% with Gemini 2, based on 4,326 searches using the SimpleQA benchmark. However, 56% of correct answers in Gemini 3 could not be verified through the linked sources—a significant increase from 37% in Gemini 2—and at Google's scale, a 9% error rate still translates to millions of wrong answers per hour.

3 min readvia the-decoder.com
0
model releaseAnthropic

Anthropic unveils Claude Mythos model, finds thousands of OS vulnerabilities via Project Glasswing

Anthropic has unveiled Claude Mythos, a new AI model designed for cybersecurity that has already discovered thousands of high-severity vulnerabilities in every major operating system and web browser. The model is being distributed as a preview to over 40 organizations and major technology partners including Apple, Google, Microsoft, and Amazon Web Services through Project Glasswing, a coordinated cybersecurity initiative.

0
product updateApple

Apple, Google, Microsoft join Anthropic's Project Glasswing to find critical software vulnerabilities

Twelve major technology companies—including Apple, Google, Microsoft, Amazon, and Nvidia—have launched Project Glasswing, a coordinated effort to identify and patch critical software vulnerabilities using Anthropic's unreleased Mythos Preview model. The initiative discovered thousands of zero-day vulnerabilities in mission-critical software, including a 27-year-old bug in OpenBSD and a 16-year-old vulnerability in widely-used video software that automated testing tools had missed.

0
model releaseAnthropic

Anthropic previews Mythos, claims it found thousands of zero-day vulnerabilities in cybersecurity initiative

Anthropic unveiled a preview of Mythos, a frontier model it claims is the most powerful in its Claude lineup, for use in Project Glasswing—a cybersecurity initiative with 40+ partner organizations. According to Anthropic, Mythos identified thousands of zero-day vulnerabilities, many critical and up to two decades old, during early testing. The model will not be made generally available and is restricted to defensive security work by vetted partners.

2 min readvia techcrunch.com
0
model releaseAnthropic

Anthropic withholds Mythos Preview model due to advanced hacking capabilities

Anthropic is rolling out its Mythos Preview model only to a handpicked group of 40 tech and cybersecurity companies, withholding public release due to the model's sophisticated ability to find tens of thousands of vulnerabilities and autonomously create working exploits. The model found bugs in every major operating system and web browser during testing, including vulnerabilities decades old and undetected by human security researchers.

Page 1 of 22Next →