reasoning

45 articles tagged with reasoning

May 7, 2026
model release

Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens

Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.

model release

Zyphra Releases ZAYA1-8B: 8.4B-Parameter MoE Model With 760M Active Parameters That Matches 80B+ Models on Math Benchmarks

Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.

May 6, 2026
model release

Baidu Launches CoBuddy Code Generation Model with 131K Context Window, Free on OpenRouter

Baidu has released CoBuddy, a code generation model optimized for coding tasks and AI agent workflows. The model features a 131K token context window, up to 65K output tokens, and runs on fp8 quantization with native support for tool calling and reasoning.

May 2, 2026
model release · NVIDIA

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.

April 29, 2026
research · Apple

Apple researchers combine diffusion and autoregressive techniques to improve LLM reasoning accuracy

Apple researchers, alongside UC San Diego, have published LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning, a framework that combines diffusion models with autoregressive generation. The system runs multiple reasoning paths in parallel during inference, each exploring different possibilities before generating a final answer.

model release · Mistral AI

Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that unifies instruction-following, reasoning, and coding capabilities. The model features configurable reasoning effort per request and a vision encoder trained from scratch for variable image sizes.

model release · NVIDIA

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA released Nemotron 3 Nano Omni, a 31B-parameter multimodal model (3B active parameters per token) supporting video, audio, image, and text inputs. The model features a 256K token context window, reasoning mode with chain-of-thought, and tool calling capabilities.

model release · NVIDIA +1

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

NVIDIA has released Nemotron 3 Nano Omni 30B-A3B, a multimodal large language model with 31 billion parameters using a Mamba2-Transformer hybrid Mixture of Experts architecture. The model supports video, audio, image, and text inputs with a 256K token context window and includes a dedicated reasoning mode with chain-of-thought capabilities.

April 28, 2026
model release · NVIDIA

Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter

Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.

model release

Poolside releases Laguna XS.2, free fp8-quantized coding agent with 128K context

Poolside has released Laguna XS.2, the second-generation model in its XS size class for agentic coding workflows. The model offers 128K context window, up to 8K output tokens, and is quantized to fp8 for efficiency, available free via OpenRouter.

April 27, 2026
model release · OpenAI

OpenAI Launches GPT Mini Latest with 400,000 Token Context Window

OpenAI released GPT Mini Latest on April 27, 2026, featuring a 400,000-token context window. The model automatically redirects to the latest version in the OpenAI GPT Mini family, allowing developers to stay current without manual updates.

changelog

Google Launches Gemini Pro Latest Router with 1M+ Context Window

Google has released Gemini Pro Latest on OpenRouter, a dynamic model router that automatically redirects to the most current model in the Gemini Pro family. The router supports a 1,048,576-token context window and includes reasoning capabilities.

changelog

Google Releases Gemini Flash Latest Router with 1M+ Token Context Window

Google released Gemini Flash Latest on April 27, 2026, a dynamic router that automatically redirects to the newest model in the Gemini Flash family. The model supports a 1,048,576-token context window and includes reasoning capabilities.
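The "latest" routers described in these entries are essentially server-side aliases: a request addressed to the alias is resolved to the newest concrete model version before dispatch, so client code never needs updating. A minimal sketch of the concept (the alias and version names here are hypothetical illustrations, not OpenRouter's actual model slugs):

```python
# Hypothetical registry mapping a "latest" alias to concrete versions,
# ordered oldest to newest. Real routers maintain this server-side.
MODEL_REGISTRY = {
    "gemini-flash-latest": ["gemini-flash-3.0", "gemini-flash-3.1"],
}

def resolve(model: str) -> str:
    """Resolve an alias to its newest registered version; pass through otherwise."""
    versions = MODEL_REGISTRY.get(model)
    return versions[-1] if versions else model

print(resolve("gemini-flash-latest"))  # → gemini-flash-3.1
print(resolve("gemini-flash-3.0"))     # → gemini-flash-3.0
```

Because the resolution happens at request time, a registry update re-points every caller of the alias at once, which is the "stay current without manual updates" behavior these entries describe.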

model release

Alibaba Releases Qwen3.6 Max Preview: 1 Trillion Parameter MoE Model With 262K Context Window

Alibaba Cloud has released Qwen3.6 Max Preview, a proprietary frontier model built on sparse mixture-of-experts architecture with approximately 1 trillion total parameters. The model supports a 262,144-token context window and features integrated thinking mode for multi-turn reasoning, priced at $1.30 per million input tokens and $7.80 per million output tokens.

model release

Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing

Alibaba's Qwen Team has released Qwen3.6 27B, a 27-billion parameter multimodal language model with a 262,144-token context window. The model accepts text, image, and video inputs and includes a built-in thinking mode for extended reasoning, with pricing at $0.195 per million input tokens and $1.56 per million output tokens.

April 24, 2026
model release · DeepSeek

DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens

DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per request. The model supports a 1,048,576-token context window and is priced at $0.14 per million input tokens and $0.28 per million output tokens.
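Per-million-token rates like these translate directly into per-request cost. A minimal sketch of that arithmetic (the rates are the ones quoted above; the token counts in the example are hypothetical):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Estimate one request's cost given per-million-token rates in dollars."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# V4 Flash rates quoted above: $0.14 input, $0.28 output per million tokens.
# Example: a 100K-token prompt producing a 2K-token completion.
cost = request_cost(100_000, 2_000, 0.14, 0.28)
print(f"${cost:.4f}")  # → $0.0146
```

The same function applies to any of the models priced in this list by swapping in their rates.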

model release · DeepSeek

DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost

DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T parameters (49B activated). Both models support one million token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs compared to DeepSeek-V3.2 at 1M token context.

model release · DeepSeek

DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context

DeepSeek released two new Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6 trillion parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting one million token context length. The models achieve 27% of inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2 at 1M context through a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention.

April 23, 2026
changelog · Anthropic

Anthropic reverts three system changes that degraded Claude Code performance in March and April

Anthropic confirmed that three separate system changes in March and April degraded Claude Code, Claude Agent SDK, and Claude Cowork performance. The company reduced the default reasoning effort from high to medium on March 4, introduced a caching bug on March 26 that cleared session data with every turn, and added restrictive word limits on April 16 that caused a 3% performance drop.

April 21, 2026
model release · OpenAI

OpenAI releases ChatGPT Images 2.0 with integrated reasoning and text-image composition

OpenAI has released ChatGPT Images 2.0, which integrates reasoning capabilities to generate complex visual compositions combining text and images. The model supports aspect ratios from 3:1 to 1:3 and outputs up to 2K resolution, with advanced features available to Plus, Pro, Business, and Enterprise users.

April 15, 2026
benchmark · OpenAI

OpenAI GPT-5.4 Pro reportedly solves Erdős problem #1196 in 80 minutes, reveals novel mathematical connection

OpenAI's GPT-5.4 Pro model has reportedly solved Erdős open problem #1196 in approximately 80 minutes, with another 30 minutes to format the solution as a LaTeX paper. Mathematician Terence Tao notes the solution reveals a previously undescribed connection between integer anatomy and Markov process theory.

April 12, 2026
model release · Arcee AI

Arcee AI releases Trinity-Large-Thinking, open reasoning model matching Claude Opus on agent tasks

Arcee AI has released Trinity-Large-Thinking, a 400-billion-parameter open-weight reasoning model with a mixture-of-experts architecture that activates only 13 billion parameters per token. The model matches Claude Opus 4.6 on agent benchmarks like Tau2 and PinchBench but lags on general reasoning tasks. The company spent approximately $20 million—roughly half its total venture capital—to train the model on 2,048 Nvidia B300 GPUs over 33 days.

April 8, 2026
model release

Meta launches Muse Spark model with private API preview and 16 integrated tools

Meta announced Muse Spark today, its first model release since Llama 4 a year ago. The hosted model is available in private API preview and on meta.ai with Instant and Thinking modes, and benchmarks competitively against Anthropic's Opus 4.6 and Google's Gemini 3.1 Pro, though it trails on Terminal-Bench 2.0.

model release

Meta releases Muse Spark, first closed-source model from Meta Superintelligence Labs

Meta has released Muse Spark, the first model from Meta Superintelligence Labs, a unit assembled under chief AI officer Alexandr Wang following Meta's $14.3 billion investment in Scale AI. The natively multimodal model features a "Contemplating" reasoning mode with parallel sub-agents and marks Meta's break from its open-source Llama heritage by operating as closed source.

model release

Meta launches Muse Spark, its first frontier model and first closed-weight AI system

Meta Superintelligence Labs has launched Muse Spark, a native multimodal reasoning model that scores 52 on the Artificial Analysis Intelligence Index, placing it in the top 5 frontier models. This marks Meta's first frontier-class model and its first AI system without open weights, representing a strategic shift from its open-source Llama strategy. The model achieves comparable efficiency to Gemini 3.1 Pro while matching Llama 4 Maverick capabilities with over an order of magnitude less compute.

model release · Arcee AI

Arcee AI releases Trinity-Large-Thinking: 398B sparse MoE model with chain-of-thought reasoning

Arcee AI released Trinity-Large-Thinking, a 398B-parameter sparse Mixture-of-Experts model with approximately 13B active parameters per token, post-trained with extended chain-of-thought reasoning for agentic workflows. The model achieves 94.7% on τ²-Bench, 91.9% on PinchBench, and 98.2% on LiveCodeBench, generating explicit reasoning traces in <think>...</think> blocks before producing responses.
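Models that emit their chain of thought in `<think>...</think>` blocks, as described above, are typically post-processed to separate the reasoning trace from the user-facing answer. A minimal sketch of that split (the tag format is from the article; the parsing code itself is an illustrative assumption, not Arcee's implementation):

```python
import re

# Non-greedy match so multiple <think> blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from a raw completion string."""
    traces = THINK_RE.findall(raw)
    answer = THINK_RE.sub("", raw).strip()
    return "\n".join(t.strip() for t in traces), answer

trace, answer = split_reasoning(
    "<think>The user wants 2+2; that is 4.</think>The answer is 4."
)
print(answer)  # → The answer is 4.
```

Completions without a `<think>` block pass through unchanged, with an empty trace.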

model release · Google DeepMind

Google DeepMind releases Gemma 4 with four model sizes, up to 256K context, multimodal support

Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (2.3B to 31B parameters) with context windows up to 256K tokens. All models support text and image input, with audio native to E2B and E4B variants. The Gemma 4 31B dense model scores 85.2% on MMLU Pro, 89.2% on AIME 2026, and 80.0% on LiveCodeBench—significant improvements over Gemma 3.

April 6, 2026
model release · Google DeepMind

Google DeepMind releases Gemma 4 family: multimodal models from 2.3B to 31B parameters with 256K context

Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active parameters), and 31B dense. All models support text and image input with 128K-256K context windows; E2B and E4B add native audio capabilities. Models feature reasoning modes, function calling, and multilingual support across 140+ languages.

April 4, 2026
model release · Tencent

Tencent releases OmniWeaving, open-source video generation model with reasoning and multi-modal composition

Tencent's Hunyuan team released OmniWeaving on April 3, 2026, an open-source video generation model designed to compete with proprietary systems like Seedance-2.0. The model combines multimodal composition with reasoning-informed generation and supports eight video generation tasks, including text-to-video, image-to-video, video editing, and compositional generation.

model release · Google DeepMind

Google DeepMind releases Gemma 4 with multimodal reasoning and up to 256K context window

Google DeepMind released Gemma 4, a multimodal model family supporting text, images, video, and audio with context windows up to 256K tokens. The release includes four sizes (E2B, E4B, 26B A4B, and 31B) designed for deployment from mobile devices to servers. The 31B dense model achieves 85.2% on MMLU Pro and 89.2% on AIME 2026.

April 3, 2026
model release · Google DeepMind

Google DeepMind releases Gemma 4 with four models up to 31B parameters, 256K context window

Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (E2B, E4B, 26B A4B, 31B) with context windows up to 256K tokens and native reasoning capabilities. The 26B A4B variant uses a Mixture-of-Experts architecture with 3.8B active parameters for efficient inference. All models support text and image input and handle 140+ languages with Apache 2.0 licensing.

model release · Google DeepMind

Google DeepMind releases Gemma 4, open multimodal models with 256K context and reasoning

Google DeepMind has released Gemma 4, a family of open-weights multimodal models ranging from 2.3B to 31B parameters with support for text, images, video, and audio. The models feature context windows up to 256K tokens, built-in reasoning modes, and native function calling for agentic workflows.

model release · Google DeepMind

Google DeepMind releases Gemma 4 open models with up to 256K context and multimodal reasoning

Google DeepMind has released Gemma 4, an open-weights model family in four sizes (2.3B to 31B parameters) with multimodal capabilities handling text, images, video, and audio. The 26B A4B variant uses mixture-of-experts to achieve 4B active parameters while supporting 256K token context windows and native reasoning modes.

April 2, 2026
model release · Google DeepMind

Google DeepMind releases Gemma 4 with 4 model sizes, 256K context, and multimodal reasoning

Google DeepMind released Gemma 4, a family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (3.8B active), and 31B (30.7B parameters). All models support text and image input with 128K-256K context windows and reasoning modes across 140+ languages, while E2B and E4B add native audio capabilities.

model release · Google DeepMind

Google DeepMind releases Gemma 4: multimodal models up to 31B parameters with 256K context

Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (25.2B total, 3.8B active), and 31B dense. All models support text and image input with 128K-256K context windows, reasoning modes, and native function calling for agentic workflows.

model release

Google releases Gemma 4 31B with 256K context and configurable reasoning mode

Google DeepMind has released Gemma 4 31B, a 30.7-billion-parameter multimodal model supporting text and image input. The model features a 262,144-token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages under Apache 2.0 license.

model release

Google releases Gemma 4 family with 31B model, 256K context, multimodal capabilities

Google DeepMind released the Gemma 4 family of open-weights models ranging from 2.3B to 31B parameters, featuring up to 256K token context windows and native support for text, image, video, and audio inputs. The flagship 31B model scores 85.2% on MMLU Pro and 89.2% on AIME 2026, with a smaller 26B MoE variant requiring only 3.8B active parameters for faster inference.

March 31, 2026
model release · xAI

xAI releases Grok 4.20 with 2M context window and native reasoning capabilities

xAI released Grok 4.20 on March 31, 2026, its flagship model featuring a 2-million-token context window, pricing of $2 per million input tokens and $6 per million output tokens, and toggleable reasoning capabilities. The model includes web search functionality at $5 per 1,000 queries and claims industry-leading speed with low hallucination rates.

March 27, 2026
model release · Anthropic +1

Anthropic confirms leaked model represents major reasoning advance after security breach

A data breach at Anthropic exposed internal documents detailing an unreleased AI model the company describes as its most powerful to date. Anthropic confirmed it is already testing the model with select customers, claiming significant advances in reasoning, coding, and cybersecurity. The breach resulted from a misconfiguration in Anthropic's content management system that automatically made ~3,000 uploaded files publicly accessible.

March 23, 2026
model release · NVIDIA +1

Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context

Nvidia has released Nemotron 3 Super, a 120-billion parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million token context window, multi-token prediction capabilities, and pricing at $0.10 per million input tokens and $0.50 per million output tokens.

March 17, 2026
model release

Mistral AI releases Mistral Small 4, claims improved performance on reasoning tasks

Mistral AI has released Mistral Small 4, the latest iteration of its small-scale language model. The company claims improvements in reasoning and coding capabilities, though specific benchmark scores and pricing details have not been publicly disclosed.

March 7, 2026
benchmark · OpenAI

Video AI models hit reasoning ceiling despite 1000x larger dataset, researchers find

An international research team released the largest video reasoning dataset to date—roughly 1,000 times larger than previous alternatives. Testing reveals that state-of-the-art models including Sora 2 and Veo 3.1 substantially underperform humans on reasoning tasks, suggesting the limitation isn't data scarcity but architectural constraints.

February 27, 2026
model release · DeepSeek

DeepSeek releases R1 reasoning model with chain-of-thought capabilities

DeepSeek has released DeepSeek-R1, a text generation model featuring reasoning capabilities through chain-of-thought processing. The model was published January 20, 2025 and has accumulated over 830,000 downloads on Hugging Face.

February 24, 2026
model release

Inception's Mercury 2 uses diffusion for language reasoning, claims 5x speed over autoregressive models

Inception has released Mercury 2, positioning it as the first diffusion-based language reasoning model. Rather than generating text sequentially word-by-word like standard language models, Mercury 2 refines entire passages in parallel, according to the company.

February 20, 2026
model release

Google announces Gemini 3.1 Pro for complex problem-solving tasks

Google announced Gemini 3.1 Pro, positioning the model for complex problem-solving tasks requiring deeper reasoning than previous versions. The release follows Gemini 3 Pro (November 2025) and Gemini 3 Flash (December 2025).