reasoning-models
10 articles tagged with reasoning-models
Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning
Tencent has released the Hy3 preview, a Mixture-of-Experts model with a 262,144-token context window, priced at $0.066 per million input tokens and $0.26 per million output tokens. The model offers three configurable reasoning modes (disabled, low, and high) designed for agentic workflows and production environments.
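For orientation, here is a minimal sketch of what calling such a model might look like, with back-of-envelope costing at the listed prices. The payload shape, field names, and model identifier are assumptions drawn only from the summary above, not Tencent's documented API.

```python
# Hypothetical request shape; field names are illustrative, not Tencent's API.
payload = {
    "model": "hy3-preview",
    "reasoning_mode": "high",  # article lists: "disabled", "low", "high"
    "messages": [{"role": "user", "content": "Plan a three-step data migration."}],
}

# Back-of-envelope cost at the listed prices ($0.066/M input, $0.26/M output),
# e.g. filling most of the 262,144-token window and getting a long answer back:
input_tokens, output_tokens = 200_000, 30_000
cost = input_tokens / 1e6 * 0.066 + output_tokens / 1e6 * 0.26
print(f"${cost:.4f}")  # ≈ $0.0210
```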
Meta launches proprietary Muse Spark, abandoning open-source strategy after $14.3B rebuild
On April 8, 2026, Meta launched Muse Spark, a natively multimodal reasoning model with tool-use and visual chain-of-thought capabilities. Unlike Llama, it is entirely proprietary, with no open weights. The model scores 52 on AI Index v4.0 and excels on health benchmarks; the launch marks Meta's departure from its open-source identity.
Meta replaces Llama with Muse Spark AI, launches Contemplating mode for complex reasoning
Meta has discontinued its Llama model line and launched Muse Spark as the foundation of its new AI strategy under Meta Superintelligence Labs. The model features a Contemplating mode for complex reasoning tasks and specializes in multimodal perception, health applications, and agentic tasks. Muse Spark is available today in Meta AI apps, with a private API preview for select partners.
Arcee releases Trinity Large Thinking, an open-source reasoning model built on $20M budget
Arcee, a 26-person U.S. startup, released Trinity Large Thinking, an open-source reasoning model it claims is the most capable open-weight model ever released by a non-Chinese company. Built on a $20 million budget, the model competes with other top open-source offerings under an Apache 2.0 license, positioning it as an alternative to both closed-source Western models and Chinese open-weight releases.
Alibaba's Qwen team develops algorithm that more than doubles reasoning chain length in math problems
Alibaba's Qwen team has developed Future-KL Influenced Policy Optimization (FIPO), a training algorithm that weights tokens by their influence on subsequent reasoning steps rather than treating all tokens equally. In tests on Qwen2.5-32B-Base, reasoning chains grew from roughly 4,000 to more than 10,000 tokens, and AIME 2024 accuracy rose from 50% to 58%, outperforming Deepseek-R1-Zero-Math-32B (47%) and OpenAI's o1-mini (56%). The team plans to open-source the system.
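The article does not spell out FIPO's weighting rule, but the core idea, scaling each token's policy-gradient contribution by its estimated influence on later reasoning steps, can be sketched. Everything below, including how `influence_weights` is produced, is an illustrative stand-in rather than the Qwen team's implementation.

```python
import torch

def influence_weighted_policy_loss(logprobs, advantages, influence_weights):
    """Toy token-weighted policy-gradient loss.

    logprobs:          (batch, seq) log-probs of the sampled tokens
    advantages:        (batch,) sequence-level advantage (reward - baseline)
    influence_weights: (batch, seq) per-token weights; in FIPO these would
                       come from a future-KL influence estimate, which the
                       article does not detail -- here they are an input.
    """
    per_token = influence_weights * logprobs * advantages.unsqueeze(-1)
    return -per_token.mean()

# Uniform weights recover the usual equal-token objective; skewing weight
# toward pivotal tokens is the FIPO-style change.
lp = torch.randn(2, 8)
adv = torch.tensor([1.0, -0.5])
w = torch.softmax(torch.randn(2, 8), dim=-1) * 8  # non-uniform, mean ~1
print(influence_weighted_policy_loss(lp, adv, w))
```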
Google DeepMind releases Gemma 4 family with 256K context window and multimodal capabilities
Google DeepMind released the Gemma 4 family of open-weights models in four sizes (2.3B to 31B parameters) with multimodal support for text, images, video, and audio. The flagship 31B model achieves 85.2% on MMLU Pro and 89.2% on AIME 2024, with context windows up to 256K tokens. All models feature configurable reasoning modes and are optimized for deployment from mobile devices to servers under the Apache 2.0 license.
xAI releases Grok 4.20 Multi-Agent with 2M context window and parallel agent reasoning
xAI has released Grok 4.20 Multi-Agent, a variant designed for collaborative agent-based workflows with a 2-million-token context window. The model scales from 4 agents at low/medium reasoning effort to 16 agents at high/xhigh effort levels, priced at $2 per million input tokens and $6 per million output tokens.
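At those prices, the effort level drives cost mainly through agent count. A rough costing sketch follows, assuming (this is not stated in the article) that each parallel agent bills its own tokens; the effort names are taken verbatim from the summary.

```python
# Effort-to-agent mapping as described in the article; per-agent billing is
# an assumption, not xAI's documented pricing model.
AGENTS_BY_EFFORT = {"low": 4, "medium": 4, "high": 16, "xhigh": 16}

def estimated_cost(effort, input_tokens_per_agent, output_tokens_per_agent):
    n = AGENTS_BY_EFFORT[effort]
    return n * (input_tokens_per_agent / 1e6 * 2.0     # $2 / M input tokens
                + output_tokens_per_agent / 1e6 * 6.0)  # $6 / M output tokens

print(estimated_cost("high", 100_000, 20_000))  # 16 agents -> 5.12 (dollars)
```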
NVIDIA releases gpt-oss-puzzle-88B, 88B-parameter reasoning model with 1.63× throughput gains
NVIDIA released gpt-oss-puzzle-88B on March 26, 2026, an 88-billion-parameter mixture-of-experts model optimized for inference efficiency on H100 hardware. Built with the Puzzle post-training neural architecture search framework, the model achieves a 1.63× throughput improvement in long-context scenarios (64K input / 64K output) and up to a 2.82× improvement on a single H100 GPU compared to its parent gpt-oss-120B, while matching or exceeding its accuracy across reasoning effort levels.
DuckDuckGo adds GPT-5 mini and GPT-5.2 reasoning models to Duck.ai privacy chatbot
DuckDuckGo's Duck.ai chatbot platform now includes OpenAI's GPT-5 mini for free users and GPT-5.2 for subscribers, both with reasoning capabilities. The platform continues to anonymize all conversations by default, stripping metadata before routing chats to model providers including Anthropic, Meta, Mistral, and OpenAI.
Bytedance study: reasoning models know when to stop, but sampling methods force continued thinking
A new Bytedance study reveals that large reasoning models actually know when they've reached the correct answer, but common sampling methods prevent them from stopping. The models engage in unnecessary cross-checking and reformulation despite already solving problems correctly.
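One way to read the finding: the stop signal is already present in the model's output distribution, and the decoding loop just needs to listen for it. Below is a minimal sketch under assumed Hugging Face-style interfaces; the `</think>` marker, the threshold, and the single-token stop check are illustrative, not the study's method.

```python
import torch

def sample_with_early_stop(model, tokenizer, input_ids,
                           stop_threshold=0.9, max_new_tokens=4096):
    # Hypothetical decoding loop: stop once the model itself assigns high
    # probability to an end-of-thinking token, instead of letting sampling
    # push it into redundant cross-checking. Threshold and marker token are
    # assumptions for illustration.
    eot_id = tokenizer.convert_tokens_to_ids("</think>")  # assumed marker
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits[:, -1, :]
        probs = torch.softmax(logits, dim=-1)
        if probs[0, eot_id] > stop_threshold:
            break  # the model signals it is done reasoning
        next_id = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids
```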